ValueError: NLP engine 'transformers' is not available. #1239
transformers doesn't come with the vanilla Presidio installation. Have you installed it with the [transformers] extra?
pip install "presidio_analyzer[transformers]"
pip install presidio_anonymizer
python -m spacy download en_core_web_sm |
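If the error persists after installing, one thing worth ruling out is that the packages landed in a different environment than the one running your code. A minimal sketch of that check follows. The module list is an assumption about what the [transformers] extra pulls in and may differ between Presidio versions; `missing_modules` is a hypothetical helper, not part of Presidio.

```python
import importlib.util

# Assumed module names; the exact set installed by the extra may differ by version.
REQUIRED_MODULES = ["presidio_analyzer", "spacy", "transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the same interpreter that runs the analyzer; anything it lists as not importable still has to be installed there.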
Thanks @omri374. That helped. |
Sorry for reopening the issue; I need one more clarification. When we use a transformers model this way, will Presidio look for entities with both the spaCy and the transformers models? If so, is there any chance of a conflict in the entity names, or is there anything specific I need to do in my code? @omri374 |
The |
I asked because I got a warning and the output was missing a required entity. |
Can you please share a reproducible example? |
Sure.
and my config file is
|
@omri374 Any update on this ? |
Yes I'm on it. Will update soon. |
I think the reason you're missing an entity is not this warning, but the mapping of the model's entity names to Presidio's. The model outputs USERNAME, which isn't in the mapping between the model and the library. To fix it, there are two options. One is to pass your own mapping to the NLP engine:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine, NerModelConfiguration
model_config = [{"lang_code": "en", "model_name": {
"spacy": "en_core_web_sm", # use a small spaCy model for lemmas, tokens etc.
"transformers": "bigcode/starpii"
}
}]
#bigcode/starpii entity mappings:
mapping = dict(
USERNAME="USERNAME",
EMAIL="EMAIL",
KEY="KEY",
PASSWORD="PASSWORD",
IP_ADDRESS="IP_ADDRESS"
)
ner_model_configuration = NerModelConfiguration(model_to_presidio_entity_mapping=mapping)
nlp_engine = TransformersNlpEngine(models=model_config, ner_model_configuration=ner_model_configuration)
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)
The other is to add the requested entities to the transformers recognizer, but it requires a bit of tweaking:
# nlp_engine = ... As defined before, just with the default mappings
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)
transformers_rec = [rec for rec in analyzer_engine.registry.recognizers if rec.name == "TransformersRecognizer"][0]
transformers_rec.supported_entities.append("USERNAME")
results = analyzer_engine.analyze(text=text, language="en", return_decision_process=True)
This behavior (of not returning undefined entities) is a side effect of #1221 (I think). If you have any suggestions on how to improve the behavior here, please let us know! |
Method 1 seems to be working for me. But I would like to know what model_to_presidio_entity_mapping means. Is it the list of all entities in the transformers model? @omri374 |
Yes, it is used to translate the entities the model was trained on to Presidio's. It is needed because there may be different ways to detect the same entity, and this way you can achieve alignment. It is also used to filter entities in or out in a model-agnostic way. For example, you could have translated USERNAME to PERSON to conform with Presidio's built-in entities. |
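To make the translate-and-filter idea concrete, here is a conceptual sketch in plain Python. It is not Presidio's implementation: `translate_entities` and the `(label, start, end)` tuple shape are hypothetical, chosen only to show how a mapping can both rename a model's labels and drop the ones it does not list.

```python
def translate_entities(predictions, mapping):
    """Rename model labels via mapping and drop labels the mapping does not list.

    predictions: iterable of (label, start, end) tuples (hypothetical shape).
    mapping: dict from the model's label to the name reported to the caller.
    """
    translated = []
    for label, start, end in predictions:
        target = mapping.get(label)
        if target is None:
            # Label is absent from the mapping, so it is filtered out.
            continue
        translated.append((target, start, end))
    return translated


# Example: align two model labels with Presidio's built-in entity names.
mapping = {"USERNAME": "PERSON", "EMAIL": "EMAIL_ADDRESS"}
predictions = [("USERNAME", 0, 5), ("EMAIL", 10, 25), ("KEY", 30, 40)]
print(translate_entities(predictions, mapping))
```

Here KEY is dropped because the mapping does not mention it, which mirrors the missing USERNAME described earlier in this thread.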
So if the model was trained on an entity that has no corresponding entity in Presidio, what should the mapping be? |
Like the mapping in my previous example. The supported entities for this model are taken from this mapping. User name, for instance, is not a predefined entity in presidio but with this mapping it is returned. |
OK. Is it possible to give the input text to Presidio as a file? @omri374 |
Do you mean the configuration? Yes, through a yaml file: https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/#creating-a-configuration-file |
Not the configuration. I mean, instead of passing a string to analyze, can we pass a .txt or .json file? @omri374 |
I see. There is some support for json here. It shows some examples of using data frames or json as input. |
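For plain files, one simple approach is to read the file yourself and pass each piece of text to the analyzer. The sketch below uses only the standard library; `iter_text_fields` and the `$.key[0]` location format are hypothetical names for illustration, not Presidio APIs.

```python
import json
from pathlib import Path


def iter_text_fields(path):
    """Yield (location, text) pairs from a .txt or .json file."""
    path = Path(path)
    if path.suffix.lower() == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        yield from _walk(data, "$")
    else:
        # Any other file is treated as one block of plain text.
        yield ("$", path.read_text(encoding="utf-8"))


def _walk(node, location):
    """Recursively collect string values from parsed JSON."""
    if isinstance(node, str):
        yield (location, node)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, f"{location}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _walk(value, f"{location}[{index}]")
    # Numbers, booleans and null carry no free text, so they are skipped.
```

Each yielded text can then go through the same call used earlier in this thread, analyzer_engine.analyze(text=text, language="en"), with the location telling you where a finding came from.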
Thanks @omri374 |
Is there any need for a paid API, or is this fully open source? @omri374 |
@WithIbadKhan there is no paid API for Presidio. Presidio is completely open source. |
And is it possible to build a pipeline that redacts the text of large PDFs? @omri374 |
Please see this as a starting point: https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_pdf_annotation.ipynb |
You are the best. Thanks @omri374 |
And one other question, please |
@WithIbadKhan, this depends on how you want money to be detected. A good place to start is the tutorial for adding recognizers: https://microsoft.github.io/presidio/tutorial/. For example, you can create a regex pattern to detect a numeric value followed by a money sign. |
Hi Omri Mendels,
Hope you are doing well. I have been stuck on one issue for 3 days, so I would be very thankful if you could help me. I am running the Presidio Streamlit app, but it does not detect a money entity, which my requirements call for. Is it possible to add a custom MONEY entity alongside the existing model entities? If so, how do I add it to the analyze function?
My code is below.
And the regex pattern for the money entity is
regex = r"\b(?:\$|US\$|C\$|A\$|£|€|¥|₹)?\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b"
Thank you for your time.
|
@WithIbadKhan please take a look at this tutorial: https://microsoft.github.io/presidio/tutorial/02_regex/ |
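As a rough sketch of what that tutorial leads to, the snippet below defines a money pattern and wraps it in a PatternRecognizer. The regex is an assumption, not a vetted pattern. It makes the currency symbol mandatory, because the pattern posted above treats the symbol as optional (so any plain number matches) and starts with \b, which cannot match directly before a bare symbol such as $ that follows a space. `MONEY_REGEX`, `find_money` and the 0.5 score are illustrative choices. The Presidio import is optional so the regex can be tried on its own.

```python
import re

# Hypothetical pattern: a currency symbol is required, followed by an amount.
MONEY_REGEX = r"(?<!\w)(?:US\$|C\$|A\$|\$|£|€|¥|₹)\s?\d+(?:,\d{3})*(?:\.\d{2})?(?!\d)"


def find_money(text):
    """Return every substring of text that the pattern treats as money."""
    return [match.group(0) for match in re.finditer(MONEY_REGEX, text)]


try:
    from presidio_analyzer import Pattern, PatternRecognizer
except ImportError:
    PatternRecognizer = None  # Presidio is not installed; the regex still works.

if PatternRecognizer is not None:
    money_recognizer = PatternRecognizer(
        supported_entity="MONEY",
        patterns=[Pattern(name="money", regex=MONEY_REGEX, score=0.5)],
    )
    # With an existing AnalyzerEngine (called analyzer_engine earlier in this thread):
    # analyzer_engine.registry.add_recognizer(money_recognizer)
    # results = analyzer_engine.analyze(text=text, language="en")

print(find_money("The invoice total is $1,250.00 and the deposit was €300."))
```

Testing the regex with `find_money` first makes it easier to tune before registering the recognizer with the analyzer used by the Streamlit app.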
I am trying to use transformers-based named entity recognition models with the following configuration, and I am getting the following error:
ValueError: NLP engine 'transformers' is not available. Make sure you have all required packages installed
What else needs to be installed? I have followed the documentation.