ValueError: NLP engine 'transformers' is not available. #1239

ragesh2000 · 2023-12-26T11:18:16Z

I am trying to use transformers-based Named Entity Recognition models with the following configuration, but I am getting the following error:

configuration = {
    "nlp_engine_name": "transformers",
    "models": [
        {
            "lang_code": "en",
            "model_name": {
                "spacy": "en_core_web_sm",
                "transformers": "bigcode/starpii",
            },
        }
    ],
}

ValueError: NLP engine 'transformers' is not available. Make sure you have all required packages installed

What else needs to be installed? I have followed the documentation.


omri374 · 2023-12-26T11:59:31Z

transformers doesn't come with the vanilla Presidio installation. Have you installed it with the [transformers] extra?

pip install "presidio_analyzer[transformers]"
pip install presidio_anonymizer
python -m spacy download en_core_web_sm
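If the error persists after installing the extra, it can help to check which optional packages are actually missing from the active environment. A minimal sketch; the module list is an assumption (transformers, plus spacy_huggingface_pipelines, which appears in the warning path later in this thread) and may not match exactly what your Presidio version requires:

```python
import importlib.util
from typing import Iterable, List

# Assumed set of top-level modules used by Presidio's transformers NLP engine.
# Adjust it to match the requirements of your installed Presidio version.
TRANSFORMERS_ENGINE_MODULES = ["spacy", "transformers", "spacy_huggingface_pipelines"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(TRANSFORMERS_ENGINE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed dependencies are importable.")
```

Run it with the same Python interpreter that runs Presidio; a package installed into a different environment will show up as missing here.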

ragesh2000 · 2023-12-26T12:06:56Z

Thanks @omri374. That helped.

ragesh2000 · 2023-12-27T04:57:46Z

Sorry for reopening the issue; I need one more clarification. When we use a transformers model this way, will Presidio look for entities in both the spaCy and the transformers models? If that's the case, is there any chance of conflict in the entity names? Or is there anything specific I need to do in my code? @omri374

omri374 · 2023-12-27T07:43:42Z

The TransformersNlpEngine replaces the spaCy NER model with a transformers model, so you wouldn't get results from both. If you would like to have them running in parallel, see this issue: #1238. In short, one of them would have to be an NLP engine and the other, a recognizer.

ragesh2000 · 2023-12-27T07:57:38Z

I asked this because I got a warning and the output was missing a required entity:
/home/ragesh/miniconda3/envs/presidio/lib/python3.8/site-packages/spacy_huggingface_pipelines/token_classification.py:138: UserWarning: Skipping annotation, {'entity_group': 'USERNAME', 'score': 0.9650769, 'word': ' rageshkr', 'start': 10, 'end': 19} is overlapping or can't be aligned for doc 'my name is rageshkr and iam going to dubai'
Why is this happening? Is this warning related to the model_to_presidio_entity_mapping parameter in the config? I'm not sure what mapping I need to define here.

omri374 · 2023-12-27T14:09:01Z

Can you please share a reproducible example?

ragesh2000 · 2023-12-27T14:19:15Z

Sure.

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider

conf_file = '/home/ragesh/Documents/presidio/config.yaml'

# Build the NLP engine (spaCy + transformers) from the YAML configuration below
provider = NlpEngineProvider(conf_file=conf_file)
nlp_engine = provider.create_engine()
analyzer = AnalyzerEngine(
    nlp_engine=nlp_engine,
    supported_languages=["en"]
)

results_english = analyzer.analyze(
    text='my name is rageshkr and iam going to dubai',
    language="en",
    return_decision_process=True,
)
print(results_english)

And my config file is:

nlp_engine_name: transformers
models:
  -
    lang_code: en
    model_name:
      spacy: en_core_web_sm
      transformers: bigcode/starpii

ner_model_configuration:
  labels_to_ignore:
  - O
  aggregation_strategy: simple # "simple", "first", "average", "max"
  stride: 16
  alignment_mode: strict # "strict", "contract", "expand"
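The offsets in the warning hint at why strict alignment fails: the span reported for ' rageshkr' (start 10, end 19) begins with a space, so its start offset does not coincide with the start of any token. The sketch below reproduces that check in plain Python, assuming whitespace tokenization as a stand-in for spaCy's tokenizer; whitespace_tokens, is_strictly_aligned and strip_span are hypothetical helpers, not Presidio or spaCy APIs.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def whitespace_tokens(text: str) -> List[Span]:
    """Character offsets (start, end) of whitespace-separated tokens.

    A simplified stand-in for spaCy's tokenizer, used only for illustration.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans


def is_strictly_aligned(text: str, start: int, end: int) -> bool:
    """True if start is a token start and end is a token end."""
    tokens = whitespace_tokens(text)
    starts = {s for s, _ in tokens}
    ends = {e for _, e in tokens}
    return start in starts and end in ends


def strip_span(text: str, start: int, end: int) -> Span:
    """Hypothetical fix-up: drop leading/trailing whitespace from a span."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


if __name__ == "__main__":
    text = "my name is rageshkr and iam going to dubai"
    # Offsets reported in the warning: {'word': ' rageshkr', 'start': 10, 'end': 19}
    print(repr(text[10:19]))                  # ' rageshkr'
    print(is_strictly_aligned(text, 10, 19))  # False
    start, end = strip_span(text, 10, 19)
    print(start, end, is_strictly_aligned(text, start, end))  # 11 19 True
```

In Presidio itself, the alignment_mode option in the config above (strict, contract, expand) controls how such spans are aligned to tokens; whether contract or expand recovers this particular entity is worth testing rather than assuming.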

ragesh2000 · 2023-12-28T11:48:10Z

@omri374 Any update on this?

omri374 · 2023-12-28T14:05:29Z

Yes I'm on it. Will update soon.

omri374 · 2023-12-28T14:17:35Z

I think the reason you're missing an entity is not this warning, but the mapping of the model's entity names to Presidio's. The model outputs USERNAME, which isn't in the mapping between the model and the library.

To fix it, there are two options:

1. Customize the NerModelConfiguration object:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine, NerModelConfiguration


model_config = [{"lang_code": "en", "model_name": {
    "spacy": "en_core_web_sm",  # use a small spaCy model for lemmas, tokens etc.
    "transformers": "bigcode/starpii"
    }
}]

# bigcode/starpii entity mappings (model label -> Presidio entity name):
mapping = dict(
    USERNAME="USERNAME",
    EMAIL="EMAIL",
    KEY="KEY",
    PASSWORD="PASSWORD",
    IP_ADDRESS="IP_ADDRESS",
)

ner_model_configuration = NerModelConfiguration(model_to_presidio_entity_mapping=mapping)

nlp_engine = TransformersNlpEngine(models=model_config, ner_model_configuration=ner_model_configuration)
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)

# USERNAME is now part of the mapping, so it can be returned
results = analyzer_engine.analyze(text="my name is rageshkr and iam going to dubai", language="en")

2. The other option is to add the requested entity to the transformers recognizer's supported entities, but it requires a bit of tweaking:

# nlp_engine = ... As defined before, just with the default mappings

analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)
# Find the recognizer that reads the transformers NER output
transformers_rec = [rec for rec in analyzer_engine.registry.recognizers if rec.name == "TransformersRecognizer"][0]
# Let it also return the model's USERNAME label
transformers_rec.supported_entities.append("USERNAME")

text = "my name is rageshkr and iam going to dubai"
results = analyzer_engine.analyze(text=text, language="en", return_decision_process=True)

This behavior (of not returning undefined entities) is a side effect of #1221 (I think). If you have any suggestions on how to improve the behavior here, please let us know!

ragesh2000 · 2023-12-28T16:41:24Z

Method 1 seems to be working for me. But I would like to know what model_to_presidio_entity_mapping means. Is it the list of all entities in the transformers model? @omri374

omri374 · 2023-12-28T19:17:30Z

Yes. It is used to translate the entities the model was trained on into Presidio's entities. This is needed because different models may detect the same entity under different names, and the mapping aligns them. It also makes it possible to filter entities in or out in a model-agnostic way.

For example, you could have translated USERNAME to PERSON to conform with Presidio's built-in entities.

ragesh2000 · 2023-12-29T03:46:55Z

So if the model was trained on an entity that has no corresponding entity in Presidio, what should the mapping look like?

omri374 · 2023-12-29T09:55:29Z

Like the mapping in my previous example: the supported entities for this model are taken from this mapping. USERNAME, for instance, is not a predefined entity in Presidio, but with this mapping it is returned.
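To make the role of the mapping concrete, here is a plain-Python model of the behavior described in this thread: labels present in the mapping are translated to Presidio entity names, labels absent from it are dropped, and the supported entities are the mapping's values. This is an illustrative sketch under that assumption, not Presidio's implementation; to_presidio and supported_entities are hypothetical names.

```python
from typing import Dict, List, Tuple

# (label, start, end, score), roughly what a token-classification model returns.
Prediction = Tuple[str, int, int, float]


def to_presidio(predictions: List[Prediction], mapping: Dict[str, str]) -> List[Prediction]:
    """Translate model labels to Presidio entity names; drop unmapped labels."""
    results = []
    for label, start, end, score in predictions:
        if label in mapping:
            results.append((mapping[label], start, end, score))
    return results


def supported_entities(mapping: Dict[str, str]) -> List[str]:
    """Entities a recognizer would report, derived from the mapping values."""
    return sorted(set(mapping.values()))


if __name__ == "__main__":
    # The prediction from the warning earlier in the thread, reduced to a tuple.
    predictions = [("USERNAME", 10, 19, 0.965)]
    without_username = {"EMAIL": "EMAIL_ADDRESS"}  # hypothetical mapping lacking USERNAME
    print(to_presidio(predictions, without_username))          # []
    print(to_presidio(predictions, {"USERNAME": "USERNAME"}))  # [('USERNAME', 10, 19, 0.965)]
    print(to_presidio(predictions, {"USERNAME": "PERSON"}))    # [('PERSON', 10, 19, 0.965)]
```

A model label can therefore be kept under its own name (USERNAME to USERNAME) or folded into a built-in entity (USERNAME to PERSON); either way it has to appear in the mapping to be returned.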

ragesh2000 · 2023-12-30T14:18:37Z

OK. Is it possible to give the input text to Presidio as a file? @omri374

omri374 · 2023-12-31T09:40:25Z

Do you mean the configuration? Yes, through a yaml file: https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/#creating-a-configuration-file

ragesh2000 · 2023-12-31T16:26:31Z

Not the configuration. I mean instead of giving a string input to analyse, can we give a .txt or .json file to analyse? @omri374

omri374 · 2023-12-31T19:49:22Z

I see. There is some support for json here. It shows some examples of using data frames or json as input.
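Since analyze takes a string, one way to handle files is to read them yourself and call it on each piece of text. A minimal stdlib sketch; read_text_file and iter_text_values are hypothetical helpers. For dictionaries, Presidio also provides a BatchAnalyzerEngine (with analyze_dict) that may be a better fit, depending on your installed version.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def read_text_file(path: str) -> str:
    """Read a .txt file so its content can be passed to analyzer.analyze(text=...)."""
    return Path(path).read_text(encoding="utf-8")


def iter_text_values(obj: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (key_path, value) for every string found in parsed JSON."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_text_values(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_text_values(value, f"{prefix}[{index}]")
    elif isinstance(obj, str):
        yield prefix, obj
    # Numbers, booleans and null are skipped: they carry no free text to analyze.


if __name__ == "__main__":
    data = json.loads('{"user": {"bio": "my name is rageshkr"}, "tags": ["dubai", 3]}')
    for key_path, text in iter_text_values(data):
        # With Presidio installed, each value could be analyzed here, e.g.
        # analyzer.analyze(text=text, language="en")
        print(key_path, "->", text)
```

Keeping the key path next to each value makes it possible to write anonymized strings back into the same place in the JSON structure afterwards.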

ragesh2000 · 2024-01-01T13:51:50Z

Thanks @omri374

WithIbadKhan · 2024-05-07T09:16:49Z

Is there any need for a paid API, or is this fully open-source? @omri374

omri374 · 2024-05-07T09:29:00Z

@WithIbadKhan there is no paid API for Presidio. Presidio is completely open-source.

WithIbadKhan · 2024-05-07T11:31:32Z

And is it possible to build a pipeline that does redaction on large PDF text? @omri374

omri374 · 2024-05-07T12:46:05Z

Please see this as a starting point: https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_pdf_annotation.ipynb
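For long PDFs, one practical concern is analyzing the extracted text in pieces without losing track of where each entity sits in the full document. A minimal sketch, assuming the PDF text has already been extracted (for example as in the notebook above) and that splitting on newlines is acceptable; chunk_text and to_document_offsets are hypothetical helpers, and the 5000-character default is arbitrary.

```python
from typing import List, Tuple


def chunk_text(text: str, max_chars: int = 5000) -> List[Tuple[int, str]]:
    """Split text into (offset, chunk) pieces of at most max_chars characters.

    Splits prefer the last newline inside the window so that entities are less
    likely to be cut in half; a hard split is used when no newline is found.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            newline = text.rfind("\n", start, end)
            if newline > start:
                end = newline + 1  # keep the newline with the preceding chunk
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def to_document_offsets(offset: int, start: int, end: int) -> Tuple[int, int]:
    """Shift chunk-relative (start, end) positions back to document positions."""
    return offset + start, offset + end


if __name__ == "__main__":
    document = "line one\nline two\nline three"
    for offset, chunk in chunk_text(document, max_chars=12):
        print(offset, repr(chunk))
```

Each chunk could then be passed to analyzer.analyze, with the start and end of every result shifted by to_document_offsets before redacting. Entities that straddle a chunk boundary can still be missed; overlapping windows are one way to reduce that.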

WithIbadKhan · 2024-05-08T06:22:56Z


You are the best, thanks @omri374!

WithIbadKhan · 2024-05-08T06:44:17Z

One other question, please: I want to add a special entity here, for example to also find money. What should I do? Right now it doesn't detect money in the text.

@omri374

omri374 · 2024-05-08T08:36:27Z

@WithIbadKhan, this depends on how you want money to be detected. A good place to start is the tutorial for adding recognizers: https://microsoft.github.io/presidio/tutorial/. For example, you can create a regex pattern to detect a numeric value followed by a money sign.

WithIbadKhan · 2024-05-15T06:50:48Z

Hi Omri Mendels, I hope you are doing well. I have been stuck on one issue for three days, so I would be very thankful for your help. I am running the Presidio Streamlit app, but for my requirements it does not detect the MONEY entity. Is it possible to add a custom MONEY entity alongside the existing model entities? How do I add the MONEY entity to the analyze function? My code is below. The regex pattern for the MONEY entity is r"\b(?:\$|US\$|C\$|A\$|£|€|¥|₹)?\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b". Thank you for your time.


omri374 · 2024-05-22T15:16:55Z

@WithIbadKhan please take a look at this tutorial: https://microsoft.github.io/presidio/tutorial/02_regex/
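Before wiring a regex into Presidio, it can help to test it in isolation with plain re. Run that way, the pattern posted above has two side effects worth knowing about: the leading \b cannot match right before a $ that follows a space (both are non-word characters), so the symbol is left out of the match, and because the currency group is optional, a bare number such as "3" also matches. The stricter variant below is an illustrative assumption, not a pattern from Presidio or its tutorial.

```python
import re
from typing import List

# The pattern posted above, unchanged.
USER_PATTERN = re.compile(
    r"\b(?:\$|US\$|C\$|A\$|£|€|¥|₹)?\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b"
)

# A stricter variant (illustrative assumption):
# - the currency symbol is required, so bare numbers like "3" are not matched
# - (?<!\w) replaces the leading \b, so "$" after a space is part of the match
# - amounts without thousands separators, such as "1000", are also accepted
MONEY_PATTERN = re.compile(
    r"(?<!\w)(?:US\$|C\$|A\$|\$|£|€|¥|₹)\s?"
    r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?\b"
)


def find_money(text: str) -> List[str]:
    """Return every money-like substring matched by MONEY_PATTERN."""
    return MONEY_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "It costs $1,200.50 today"
    print(USER_PATTERN.findall(sample))  # ['1,200.50']
    print(find_money(sample))            # ['$1,200.50']
```

Once the pattern behaves as intended, the tutorial linked above shows how to wrap it in a Pattern and a PatternRecognizer with supported_entity="MONEY" and register it via analyzer.registry.add_recognizer(...); the score assigned to the pattern is a tuning choice.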

ragesh2000 closed this as completed Dec 26, 2023

ragesh2000 reopened this Dec 27, 2023

ragesh2000 closed this as completed Jan 1, 2024

ragesh2000 mentioned this issue Jan 4, 2024: Tensor size conflict from hf-pipeline #1248 (Closed)

omri374 mentioned this issue Mar 12, 2024: Is it possible to redact entities not supported by presidio? #1327 (Closed)
