diff --git a/docs/analyzer/nlp_engines/transformers.md b/docs/analyzer/nlp_engines/transformers.md
index 1beb11121..12f737fda 100644
--- a/docs/analyzer/nlp_engines/transformers.md
+++ b/docs/analyzer/nlp_engines/transformers.md
@@ -208,6 +208,14 @@ The `ner_model_configuration` section contains the following parameters:
 - `low_confidence_score_multiplier`: A multiplier to apply to the score of entities with low confidence.
 - `low_score_entity_names`: A list of entity types to apply the low confidence score multiplier to.
 
+
+!!! note "Defining the entity mapping"
+    Before creating the `model_to_presidio_entity_mapping` dictionary, check which classes the model is able to predict.
+    In some cases these are listed on the model's page on the Hugging Face Hub. In other cases, check the model's `config.json` under `id2label`.
+    For example, for `bert-base-NER-uncased`, it can be found here: https://huggingface.co/dslim/bert-base-NER-uncased/blob/main/config.json.
+    Note that most NER models add a prefix to the class (e.g. `B-PER` for class `PER`). When creating the mapping, do not add the prefix.
+
+
 See more information on parameters on the [spacy-huggingface-pipelines Github repo](https://github.com/explosion/spacy-huggingface-pipelines#token-classification).
 
 Once created, see [the NLP configuration documentation](../customizing_nlp_models.md#Configure-Presidio-to-use-the-new-model) for more information.
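
The check described in the note could be sketched roughly as follows. This is a minimal illustration, not part of the patch: the embedded JSON is a hand-written, CoNLL-style stand-in for a model's `config.json` (the real file for a given model may list different labels), and the helper `model_classes` and the `PRESIDIO_NAMES` table are hypothetical names introduced only for this example.

```python
import json

# Illustrative stand-in for the `id2label` section of a model's config.json.
# Assumption: the real file may contain different labels; check it per model.
CONFIG_JSON = """
{
  "id2label": {
    "0": "O",
    "1": "B-MISC", "2": "I-MISC",
    "3": "B-PER",  "4": "I-PER",
    "5": "B-ORG",  "6": "I-ORG",
    "7": "B-LOC",  "8": "I-LOC"
  }
}
"""

# Common tagging-scheme prefixes (BIO / BILOU / BIOES).
TAG_PREFIXES = {"B", "I", "L", "U", "E", "S"}


def model_classes(config: dict) -> list[str]:
    """Return the distinct class names in `id2label`, without tag prefixes."""
    classes = set()
    for label in config["id2label"].values():
        if label == "O":  # "outside" tag, not an entity class
            continue
        prefix, sep, rest = label.partition("-")
        # Strip the prefix only when it is a known tagging-scheme marker.
        classes.add(rest if sep and prefix in TAG_PREFIXES else label)
    return sorted(classes)


classes = model_classes(json.loads(CONFIG_JSON))

# Hypothetical choice of Presidio entity names for this example.
PRESIDIO_NAMES = {"PER": "PERSON", "LOC": "LOCATION", "ORG": "ORGANIZATION"}

# Keys are the prefix-free model classes, as the note advises.
model_to_presidio_entity_mapping = {
    c: PRESIDIO_NAMES[c] for c in classes if c in PRESIDIO_NAMES
}
unmapped = [c for c in classes if c not in PRESIDIO_NAMES]

print(model_to_presidio_entity_mapping)
print("unmapped classes:", unmapped)
```

Classes left in `unmapped` (here `MISC`) are a prompt to decide explicitly whether to map them or leave them out of the configuration.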