OCR fails if metadata extractor is running #7544

Open
gabriel-piles opened this issue Dec 12, 2024 · 2 comments

Comments

@gabriel-piles
Member

When Uwazi receives OCR results with metadata extraction enabled, the following error occurs:

TypeError: Cannot read properties of undefined (reading 'template')
at /uwazi/app/api/suggestions/eventListeners.ts:138:10
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Promise.all (index 0)
at async EventsBus.emit (uwazi/app/api/eventsbus/EventsBus.ts:19:7)
at async Object.save (uwazi/app/api/files/files.ts:49:7)
at async processPDF (uwazi/app/api/files/processDocument.ts:14:18)
at async processFiles (uwazi/app/api/services/ocr/OcrManager.ts:107:22)
at async uwazi/app/api/services/ocr/OcrManager.ts:138:7
at async Object.processResults (uwazi/app/api/services/ocr/OcrManager.ts:126:3)
at async TaskManager.checkForResults (uwazi/app/api/services/tasksmanager/TaskManager.ts:119:9)
at async Repeater.start (uwazi/app/api/utils/Repeater.js:24:7)
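
The trace suggests that a suggestions listener reacts to the file save triggered by the OCR result and reads `template` from an entity that could not be resolved. The following is only a minimal sketch of that failure mode and a defensive guard. `FileRecord`, `Entity`, `templateForFile`, and the `Map` lookup are hypothetical names for illustration, not the actual code in `eventListeners.ts`.

```typescript
// Hypothetical shapes: the real Uwazi types and listener logic may differ.
interface Entity {
  sharedId: string;
  template?: string;
}

interface FileRecord {
  filename: string;
  entity?: string; // sharedId of the owning entity, if any
}

// Returns the template id for the file's entity, or null when the file has
// no entity or the entity cannot be found.
function templateForFile(
  file: FileRecord,
  entities: Map<string, Entity>
): string | null {
  const entity = file.entity ? entities.get(file.entity) : undefined;
  // An unguarded `entity.template` would throw
  // "Cannot read properties of undefined (reading 'template')" here.
  return entity?.template ?? null;
}

const entities = new Map<string, Entity>([
  ['abc', { sharedId: 'abc', template: 'tpl1' }],
]);

console.log(templateForFile({ filename: 'ocr_doc.pdf', entity: 'abc' }, entities));
console.log(templateForFile({ filename: 'ocr_doc.pdf', entity: 'missing' }, entities));
```

With a guard like this, a listener could skip suggestion updates when no template is available instead of rejecting the whole `EventsBus.emit` call. Whether skipping is the right behavior for OCR-generated files is an open question for the maintainers.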

To Reproduce

  1. Start Uwazi with:
     yarn blank-state
     export EXTERNAL_SERVICES=true
     yarn hot
  2. Activate the metadata extractor and OCR flags:
     "ocrServiceEnabled": true,
     "features": {
       "ocr": {
         "url": "http://localhost:5051"
       },
       "metadataExtraction": {
         "url": "http://localhost:5056"
       },
       "metadata-extraction": true
     },
  3. Start the OCR service:
     git clone https://github.com/huridocs/pdf-document-layout-analysis-async.git
     cd pdf-document-layout-analysis-async
     make start
  4. Send a PDF to OCR.

@RafaPolit
Member

@gabriel-piles is this prior to the incorporation of OCR into the other services? Or is this still an issue?

@gabriel-piles
Member Author

@RafaPolit yes, it is prior to the OCR server change.

And, it is still an issue.
