Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workspace.save_image_file: set dpi on image.save, #343 #611

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

kba
Copy link
Member

@kba kba commented Sep 21, 2020

No description provided.

@kba kba requested a review from bertsky September 21, 2020 16:16
ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this forward.

Looking up any existing image for DPI info (without making sure it's the original image) will work, as long as images only ever enter via this function (or some other DPI-preserving method).

Regarding (in)efficiency, we should immediately take action on the processor side, so the dpi kwarg is used everywhere.

(BTW, if we split this into the opt-in / dpi-kwarg and fallback / find_files part, we will get a window where we can adapt processors without paying the costs already.)

dpi = (dpi, dpi)
elif not dpi:
# TODO - brittle, will this always find the original image?
orig_img_file = self.mets.find_files(pageId=page_id, mimetype='//^image')[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must adapt for the generator now, I believe.

Suggested change
orig_img_file = self.mets.find_files(pageId=page_id, mimetype='//^image')[0]
orig_img_file = next(self.mets.find_files(pageId=page_id, mimetype='//^image', local_only=True))

@bertsky
Copy link
Collaborator

bertsky commented Feb 17, 2023

Another idea I had: couldn't we at least try to force PIL.Image objects we ourselves produce (Workspace.image_from_page and image_from_segment) to retain or piggy-back the DPI info? Or even store that info in the workspace instance (e.g. by attaching the OcrdExif to the OcrdFile of the PAGE-XML, which itself could be cached in the OcrdMets; this would require rewriting Processor.input_files, but allow reusing these iterations)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants