
Configure AnalyzerEngine from file #1338

Open
omri374 opened this issue Mar 20, 2024 · 2 comments


@omri374
Contributor

omri374 commented Mar 20, 2024

Is your feature request related to a problem? Please describe.
In many use cases, especially with the Docker-based option, it is challenging to configure the AnalyzerEngine for a specific scenario. For example, to support multiple languages through the API, the code in app.py has to be changed:

self.engine = AnalyzerEngine()

A way to configure the initial parameters (languages, NLP engine, recognizers, default score threshold, etc.) would allow code-free configuration, both for Docker-based use cases and for a more configurable Python pipeline.

Describe the solution you'd like

  • Have a configuration file holding all the parameters, and a utility that reads it and creates the custom AnalyzerEngine instance (a rough sketch follows this list).
  • Add the ability to read this configuration file in the Dockerfile.
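
As a rough illustration only (the configuration keys and the create_analyzer_engine helper are hypothetical, not an existing Presidio API), such a utility could read a YAML file and build the engine from it:

import yaml

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider


def create_analyzer_engine(conf_file: str) -> AnalyzerEngine:
    """Sketch: build an AnalyzerEngine from a single YAML file.

    The file is assumed to hold the NLP engine configuration (the same
    structure NlpEngineProvider already accepts) under `nlp_configuration`,
    plus optional top-level keys such as `supported_languages` and
    `default_score_threshold`.
    """
    with open(conf_file) as f:
        conf = yaml.safe_load(f)

    # Build the NLP engine from the nested configuration dictionary.
    nlp_engine = NlpEngineProvider(
        nlp_configuration=conf["nlp_configuration"]
    ).create_engine()

    # Pass the remaining parameters straight to the AnalyzerEngine constructor.
    return AnalyzerEngine(
        nlp_engine=nlp_engine,
        supported_languages=conf.get("supported_languages", ["en"]),
        default_score_threshold=conf.get("default_score_threshold", 0),
    )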

Describe alternatives you've considered
An alternative would be to document how to change app.py, but the code would still have to be changed.

Additional context
Presidio already has several conf files that could serve as a reference, e.g. the NLP engine configuration files read by NlpEngineProvider.

@GautierT

GautierT commented Apr 4, 2024

Hey! Thanks for this PR.

Can I use it to use another transformer model? Like this one: https://huggingface.co/Jean-Baptiste/camembert-ner

I was thinking about using a YAML conf file like this:

nlp_engine_name: transformers
models:
  -
    lang_code: fr
    model_name:
      spacy: fr_core_news_lg
      transformers: Jean-Baptiste/camembert-ner

ner_model_configuration:
  labels_to_ignore:
  - O
  aggregation_strategy: simple # "simple", "first", "average", "max"
  stride: 16
  alignment_mode: strict # "strict", "contract", "expand"
  model_to_presidio_entity_mapping:
    PER: PERSON
    LOC: LOCATION
    ORG: ORGANIZATION
    AGE: AGE
    ID: ID
    EMAIL: EMAIL
    PATIENT: PERSON
    STAFF: PERSON
    HOSP: ORGANIZATION
    PATORG: ORGANIZATION
    DATE: DATE_TIME
    PHONE: PHONE_NUMBER
    HCW: PERSON
    HOSPITAL: ORGANIZATION

  low_confidence_score_multiplier: 0.4
  low_score_entity_names:
  - ID

Can this work? Without your PR, the fr language never seems to be available.

Thanks.

@omri374
Contributor Author

omri374 commented Apr 8, 2024

Hi @GautierT, are you looking to run this through a REST API?
If not, you can configure your model using the standard NlpEngineProvider logic; for an example, see this documentation (a minimal sketch is also included at the end of this comment).
If yes, the only additional change needed is in app.py, to pass the NlpEngine into the AnalyzerEngine. Instead of this:

self.engine = AnalyzerEngine()

Have this:

# At the top of app.py, NlpEngineProvider needs to be imported alongside AnalyzerEngine:
# from presidio_analyzer import AnalyzerEngine
# from presidio_analyzer.nlp_engine import NlpEngineProvider

class Server:
    """HTTP Server for calling Presidio Analyzer."""

    def __init__(self):
        # LOGGING_CONF_FILE and WELCOME_MESSAGE are module-level constants
        # already defined in app.py; PATH_TO_CONF is a placeholder for the
        # path to your NLP engine configuration file.
        fileConfig(Path(Path(__file__).parent, LOGGING_CONF_FILE))
        self.logger = logging.getLogger("presidio-analyzer")
        self.logger.setLevel(os.environ.get("LOG_LEVEL", self.logger.level))
        self.app = Flask(__name__)
        self.logger.info("Starting analyzer engine")

        # Build the NLP engine from the configuration file, then pass it to
        # the AnalyzerEngine together with the supported languages.
        provider = NlpEngineProvider(conf_file=PATH_TO_CONF)
        nlp_engine = provider.create_engine()
        self.engine = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["fr"])
        self.logger.info(WELCOME_MESSAGE)
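
For the non-REST path mentioned above, here is a minimal sketch of the standard NlpEngineProvider flow, assuming the YAML from the previous comment is saved under a path such as conf/transformers_fr.yaml (the path and the sample text are placeholders):

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider

# Load the transformers/fr configuration from the YAML file (example path).
provider = NlpEngineProvider(conf_file="conf/transformers_fr.yaml")
nlp_engine = provider.create_engine()

# Create the analyzer with French as a supported language.
analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["fr"])

# Run analysis on a short French example.
results = analyzer.analyze(text="Jean Dupont habite à Paris.", language="fr")
print(results)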
