Sample presidio image evaluation notebook generates error #1251

Open
mpsampat opened this issue Jan 7, 2024 · 3 comments

Labels: bug, dicom, image-anonymization

Comments


mpsampat commented Jan 7, 2024

Describe the bug
The notebook https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_dicom_redactor_evaluation.ipynb
generates an error and does not provide the evaluation results.
The error is shown below.

To Reproduce
Steps to reproduce the behavior:

  1. Git clone the repo: https://github.com/microsoft/presidio.git
  2. Go to the folder: presidio/docs/samples/python
  3. Run the Jupyter notebook called "example_dicom_redactor_evaluation.ipynb".
  4. The first few cells work fine.
  5. The cell with this code gives the error:
     _, eval_results = dicom_engine.eval_dicom_instance(instance, gt_file_of_interest)
  6. The error I get is shown below:
     `---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    Cell In[9], line 1
    ----> 1 _, eval_results = dicom_engine.eval_dicom_instance(instance, gt_file_of_interest)

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/dicom_image_pii_verify_engine.py:175, in DicomImagePiiVerifyEngine.eval_dicom_instance(self, instance, ground_truth, padding_width, tolerance, display_image, use_metadata, ocr_kwargs, ad_hoc_recognizers, **text_analyzer_kwargs)
165 # Verify detected PHI
166 verify_image, ocr_results, analyzer_results = self.verify_dicom_instance(
167 instance,
168 padding_width,
(...)
173 **text_analyzer_kwargs,
174 )
--> 175 formatted_ocr_results = self.bbox_processor.get_bboxes_from_ocr_results(
176 ocr_results
177 )
178 detected_phi = self.bbox_processor.get_bboxes_from_analyzer_results(
179 analyzer_results
180 )
182 # Remove duplicate entities in results

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
12 """Get bounding boxes on padded image for all detected words from ocr_results.
13
14 :param ocr_results: Raw results from OCR.
15 :return: Bounding box information per word.
16 """
17 bboxes = []
---> 18 print(ocr_results["text"])
19 for i in range(len(ocr_results["text"])):
20 detected_text = ocr_results["text"][i]

TypeError: list indices must be integers or slices, not str`

Expected behavior

  1. I expect to get precision and recall results, as shown in the notebook committed in the repo.

Additional context
Could you please help provide a workaround for this issue? Should I use an older tag of presidio?

Author

mpsampat commented Jan 8, 2024

This issue also exists on other pages, such as the "Creating ground truth files" section:
https://microsoft.github.io/presidio/image-redactor/evaluating_dicom_redaction/#creating-ground-truth-files
The following lines of code generate the error shown below:

    # Format results for more direct comparison
    ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
    analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)

Error observed:

TypeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
2 analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
12 """Get bounding boxes on padded image for all detected words from ocr_results.
13
14 :param ocr_results: Raw results from OCR.
15 :return: Bounding box information per word.
16 """
17 bboxes = []
---> 18 print(ocr_results["text"])
19 for i in range(len(ocr_results["text"])):
20 detected_text = ocr_results["text"][i]
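The TypeError in both tracebacks is the one Python raises when a list is indexed with a string key. A minimal sketch that reproduces it without Presidio is shown here; the data and the `read_text` helper are made up for illustration and are not taken from the library.

```python
# Made-up data for illustration only; this is not real Presidio output.
# A dict of parallel lists supports string keys such as "text".
raw_results = {"text": ["JOHN", "DOE"], "left": [10, 60], "top": [5, 5]}

# A list of per-word dicts does not support string keys on the list itself.
formatted_results = [
    {"label": "JOHN", "left": 10, "top": 5},
    {"label": "DOE", "left": 60, "top": 5},
]


def read_text(ocr_results):
    """Hypothetical helper using the same access pattern as the failing line."""
    return ocr_results["text"]


print(read_text(raw_results))  # works: the input is a dict

try:
    read_text(formatted_results)  # fails: the input is a list
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If the value passed to `get_bboxes_from_ocr_results` is a list rather than a dict, the access on line 18 of `bbox.py` would fail in the same way.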

Contributor

omri374 commented Jan 28, 2024

Thank you @mpsampat, and apologies for the delayed response. We'll look into this.

omri374 added the bug, image-anonymization, and dicom labels on Jan 28, 2024
@gianni-di-noia

The method verify_dicom_instance already returns formatted ocr_results, so they do not need to be passed through get_bboxes_from_ocr_results again. The variable is called ocr_bboxes in the codebase.
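
Until this is fixed upstream, one possible workaround is to format the OCR results only when they are still raw. The sketch below is an assumption-laden illustration, not Presidio's implementation: `to_bboxes` is a hypothetical helper, and the raw layout (a dict of parallel lists keyed by "text", "left", "top", "width", "height", "conf") and the output keys are assumed for the example.

```python
from typing import Any, Dict, List, Union

RawOcr = Dict[str, List[Any]]      # assumed raw layout: dict of parallel lists
Bboxes = List[Dict[str, Any]]      # assumed formatted layout: one dict per word


def to_bboxes(ocr_results: Union[RawOcr, Bboxes]) -> Bboxes:
    """Hypothetical helper: return per-word boxes, formatting only raw input."""
    if isinstance(ocr_results, list):
        # Already formatted, so return it unchanged instead of indexing by "text".
        return ocr_results

    bboxes = []
    for i, text in enumerate(ocr_results["text"]):
        if not str(text).strip():
            continue  # skip empty detections
        bboxes.append(
            {
                "left": ocr_results["left"][i],
                "top": ocr_results["top"][i],
                "width": ocr_results["width"][i],
                "height": ocr_results["height"][i],
                "conf": float(ocr_results["conf"][i]),
                "label": text,
            }
        )
    return bboxes


# Made-up raw input with one empty and one real detection.
raw = {
    "text": ["", "JOHN"],
    "left": [0, 10],
    "top": [0, 20],
    "width": [0, 30],
    "height": [0, 40],
    "conf": ["-1", "96.5"],
}
once = to_bboxes(raw)
twice = to_bboxes(once)  # a second call is a no-op rather than a TypeError
print(once == twice)
```

Checking the type first keeps the helper safe to call on either shape, which is the situation described in the comment above.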
