Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processor resource discovery #559

Merged
merged 74 commits into from
Jan 26, 2021
Merged

Processor resource discovery #559

merged 74 commits into from
Jan 26, 2021

Conversation

kba
Copy link
Member

@kba kba commented Aug 10, 2020

Implementing OCR-D/spec#163

  • processors have a new method resolve_resource that gets passed a parameter value and tries to find it in one of the locations defined in ocrd_tool: file parameter relative filename resolution order, #160 spec#163
  • core now has a builtin list (extensible via user list) of known resources for processors, e.g. the gt4histocr models for tesseract and calamari
  • new CLI ocrd resmgr that allows browsing the builtin list, listing installed resources and downloading new resources
  • resources that are listed in the builtin list can be referenced by URL or name

@kba kba marked this pull request as draft August 10, 2020 16:31
@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2020

Codecov Report

Merging #559 into master will decrease coverage by 1.39%.
The diff coverage is 27.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #559      +/-   ##
==========================================
- Coverage   84.14%   82.75%   -1.40%     
==========================================
  Files          49       49              
  Lines        2776     2847      +71     
  Branches      541      561      +20     
==========================================
+ Hits         2336     2356      +20     
- Misses        342      393      +51     
  Partials       98       98              
Impacted Files Coverage Δ
ocrd_utils/ocrd_utils/__init__.py 100.00% <ø> (ø)
ocrd_utils/ocrd_utils/os.py 46.42% <12.50%> (-45.24%) ⬇️
ocrd/ocrd/processor/base.py 67.44% <29.72%> (-28.72%) ⬇️
ocrd_utils/ocrd_utils/constants.py 100.00% <100.00%> (ø)
ocrd/ocrd/resolver.py 96.55% <0.00%> (+3.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e41ba75...6108b64. Read the comment docs.

@kba kba mentioned this pull request Aug 22, 2020
@kba kba force-pushed the resolve-files branch 2 times, most recently from 67c5633 to 260b168 Compare October 13, 2020 13:51
@codecov-io
Copy link

codecov-io commented Oct 27, 2020

Codecov Report

Merging #559 (2b3cb64) into master (2167c4c) will decrease coverage by 5.40%.
The diff coverage is 30.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #559      +/-   ##
==========================================
- Coverage   83.98%   78.57%   -5.41%     
==========================================
  Files          52       55       +3     
  Lines        3109     3440     +331     
  Branches      623      701      +78     
==========================================
+ Hits         2611     2703      +92     
- Misses        364      597     +233     
- Partials      134      140       +6     
Impacted Files Coverage Δ
ocrd/ocrd/decorators/ocrd_cli_options.py 100.00% <ø> (ø)
ocrd/ocrd/processor/helpers.py 80.76% <ø> (ø)
ocrd_utils/ocrd_utils/__init__.py 100.00% <ø> (ø)
ocrd/ocrd/resource_manager.py 17.83% <17.83%> (ø)
ocrd/ocrd/cli/resmgr.py 29.76% <29.76%> (ø)
ocrd/ocrd/processor/base.py 64.92% <30.00%> (-15.49%) ⬇️
ocrd_utils/ocrd_utils/os.py 70.88% <40.00%> (-24.86%) ⬇️
ocrd_utils/ocrd_utils/constants.py 90.47% <75.00%> (-9.53%) ⬇️
ocrd/ocrd/cli/__init__.py 100.00% <100.00%> (ø)
ocrd/ocrd/constants.py 100.00% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2167c4c...2b3cb64. Read the comment docs.

@bertsky bertsky self-requested a review January 21, 2021 16:54
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I could make another wish: It would be useful if list-installed would show the location type registered ressources were found in (data, config, cache, virtualenv), and likewise for unregistered resources (instead of the full path after Found at).

@kba
Copy link
Member Author

kba commented Jan 21, 2021

If I could make another wish: It would be useful if list-installed would show the location type registered ressources were found in (data, config, cache, virtualenv), and likewise for unregistered resources (instead of the full path after Found at).

Sure, that makes sense.

@kba
Copy link
Member Author

kba commented Jan 21, 2021

If I could make another wish: It would be useful if list-installed would show the location type registered ressources were found in (data, config, cache, virtualenv), and likewise for unregistered resources (instead of the full path after Found at).

a3cff9e

$ ocrd resmgr download -n ocrd-tesserocr-recognize https://github.com/tesseract-ocr/tessdata/raw/master/fin.traineddata

$ cat $HOME/.config/ocrd/resources.yml
# OCR-D private resource list (consider sending a PR with your own resources to OCR-D/core)
ocrd-tesserocr-recognize:
- description: Found at virtualenv on 2021-01-21 19:18:30.631387
  name: fin.traineddata
  size: 21140513
  url: https://github.com/tesseract-ocr/tessdata/raw/master/fin.traineddata
  version_range: ???

$ ocrd resmgr list-installed -e ocrd-tesserocr-recognize                                                               
ocrd-tesserocr-recognize
- fin.traineddata @ virtualenv (https://github.com/tesseract-ocr/tessdata/raw/master/fin.traineddata)
  Found at virtualenv on 2021-01-21 19:18:30.63138

@kba
Copy link
Member Author

kba commented Jan 22, 2021

OK, so adapted to what @bersky and I discussed and up-to-date with OCR-D/spec#178 🤞

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also import and expose OcrdResourceManager directly from ocrd, so use-cases like ocrd_tesserocr are more accessible.

ocrd/ocrd/resource_manager.py Outdated Show resolved Hide resolved
@kba
Copy link
Member Author

kba commented Jan 25, 2021

Maybe we could also import and expose OcrdResourceManager directly from ocrd, so use-cases like ocrd_tesserocr are more accessible.

a5858ec

I also added a flag --allow-uninstalled to control whether to allow downloading resources for processors that are not installed. Without it ocrd resmgr ocrd-foo /vmlinuz will now cause an error (unless you have ocrd-foo in your $PATH).

Also added ocrd resmgr download '*' which will download all resources for all installed processors (or all resources in the registry if --allow-uninstalled).

I really hope we've got it right this time but I will test some more before merging just to be sure.

@bertsky bertsky self-requested a review January 26, 2021 11:54
@kba kba merged commit 1af2d00 into master Jan 26, 2021
@kba kba deleted the resolve-files branch January 26, 2021 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants