
migrate to core v3 #117 (Draft)

wants to merge 10 commits into base: master
Conversation

@bertsky (Contributor) commented Sep 16, 2024

Still a draft as long as v3 is in beta/RC, but we can already use the CI and discuss the changes (esp. to the tests).

@kba this closely resembles the tests in OCR-D/ocrd_kraken#44 (covering variants with METS Caching and/or METS Server and/or parallel pages).

@bertsky (Contributor, Author) commented Sep 16, 2024

Oh, and this is based on #116, since I often cannot even run Calamari without that.

@bertsky (Contributor, Author) commented Sep 18, 2024

I wrote a simple script to measure and plot the GPU utilisation.

[Plot: GPU utilisation over time, ocrd-calamari-cuda-b32]
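For reference, a minimal sketch of what such a measurement script can look like (this is not the actual script used for the plots above; the sampling interval and the nvidia-smi query are assumptions):

```python
#!/usr/bin/env python3
# Minimal GPU utilisation sampler: poll nvidia-smi until interrupted,
# then plot utilisation over wall time.
import subprocess
import time

import matplotlib.pyplot as plt

samples = []
t0 = time.time()
try:
    while True:
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=utilization.gpu',
             '--format=csv,noheader,nounits'], text=True)
        samples.append((time.time() - t0, int(out.strip().splitlines()[0])))
        time.sleep(0.5)  # sampling interval (an assumption)
except KeyboardInterrupt:
    pass

xs, ys = zip(*samples)
plt.plot(xs, ys)
plt.xlabel('wall time [s]')
plt.ylabel('GPU utilisation [%]')
plt.savefig('gpu-utilisation.png')
```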

The rather simple modification 9611e2c (which I will cherry-pick into #116 for core v2) helps in two ways: it utilises the GPU better (because it avoids overly small batches when regions have only a few lines), and thus also allows increasing the batch size without causing OOM.
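Schematically, the change from region-level to page-level batching looks like this (a sketch – `predict_raw` is the raw prediction entry point mentioned below, but `regions`, `predictor` and `region_line_images` are hypothetical stand-ins for the actual code):

```python
# Before 9611e2c: one predict_raw call per region – regions with only
# a few lines produce tiny batches that underutilise the GPU.
results = []
for region in regions:
    results.extend(predictor.predict_raw(region_line_images(region)))

# After 9611e2c: collect all line images of the page first, then
# predict once – the batch actually fills up, so the batch size can
# be raised without running OOM.
line_images = [image
               for region in regions
               for image in region_line_images(region)]
results = predictor.predict_raw(line_images)
```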

Unfortunately, fb2a680 does not accomplish what I expected – reducing the peaky GPU utilisation behaviour caused by the GPU waiting for the CPU and vice versa.

Here's a log for batch_size=64 without parallel page threads (but with METS Server) – i.e. before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p1]
And the same with 3 parallel page threads – still before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3]
Now, after adding fb2a680, with a ThreadPoolExecutor computing the predict_raw batches concurrently (shared across parallel page threads):
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3-bg]
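In outline, that background threading looks like this (a sketch under assumptions; the actual future handling in fb2a680 may differ in detail):

```python
from concurrent.futures import ThreadPoolExecutor

# A single executor shared across all parallel page threads, so that
# only one predict_raw call occupies the GPU at a time while the page
# threads continue with CPU-side pre- and post-processing.
gpu_executor = ThreadPoolExecutor(max_workers=1)

def process_page(line_images):
    # Hand the GPU-bound prediction off to the shared executor; this
    # page thread blocks on the result, but other page threads can
    # meanwhile prepare their own batches on the CPU.
    future = gpu_executor.submit(predictor.predict_raw, line_images)
    return future.result()
```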

Thus, surprisingly, the timeline still shows low average utilisation with lots of waiting time. This is also reflected in the wall-time and CPU-time measurements (see the table below).

It gets a little better if I split up the batches for the background thread myself (instead of having Calamari v1 do the batching), though:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3-bgbatched]
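That is, something along these lines (continuing the sketch above; the chunking helper is hypothetical):

```python
def chunked(seq, size):
    # Split the page's line images into fixed-size batches ourselves,
    # instead of passing everything to Calamari in one call.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def process_page(line_images, batch_size=64):
    futures = [gpu_executor.submit(predictor.predict_raw, batch)
               for batch in chunked(line_images, batch_size)]
    # Collect results in submission order.
    return [result for future in futures for result in future.result()]
```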

Also, it helps to increase the number of parallel page threads from 3 to 6 – but only in itself, not in combination with background threading. Here's with 6 threads before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6]
And this is with 6 threads after fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg]

I also tried more than 1 background thread (i.e. number of workers in the shared ThreadPoolExecutor), but that does not do much better either – the above with 2 "GPU" threads:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg2]
And the same with 4 background threads:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg4]

Increasing the number of parallel page threads to 12 or 24 becomes still more inefficient.

Figures for a book with 180 pages of Fraktur:

| commit | OCRD_MAX_PARALLEL_PAGES | wall time | CPU time |
|--------|-------------------------|-----------|----------|
| bf755a3 (region-level batches) | 1 | 1148 s | 1082 s |
| bf755a3 (region-level batches) | 3 | 744 s | 1188 s |
| 9611e2c (page-level batches) | 1 | 1113 s | 1042 s |
| 9611e2c (page-level batches) | 3 | 698 s | 1105 s |
| fb2a680 (1 background thread) | 3 | 709 s | 1122 s |
| fb2a680 (1 background thread) | 6 | 665 s | 1178 s |
| fb2a680 (1 background thread) | 12 | 693 s | 1205 s |
| fb2a680 (2 background threads) | 6 | 660 s | 1169 s |
| fb2a680 (4 background threads) | 6 | 653 s | 1160 s |

Perhaps we must go for Calamari 2 with its efficient tfaip pipelining...

@codecov-commenter commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 85.38462% with 19 lines in your changes missing coverage. Please review.

Project coverage is 68.93%. Comparing base (4adf09f) to head (e68ce5f).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ocrd_calamari/recognize.py | 85.38% | 9 Missing and 10 Partials ⚠️ |
```diff
@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   71.07%   68.93%   -2.15%
==========================================
  Files           5        4       -1
  Lines         204      206       +2
  Branches       50       55       +5
==========================================
- Hits          145      142       -3
- Misses         48       51       +3
- Partials       11       13       +2
```


@bertsky bertsky mentioned this pull request Sep 23, 2024
@mikegerber mikegerber self-assigned this Oct 7, 2024
@mikegerber (Collaborator) commented:
When will OCR-D 3 be released? (Roughly)

@bertsky (Contributor, Author) commented Oct 8, 2024

> When will OCR-D 3 be released? (Roughly)

Hard to tell. It hinges on a couple of open design decisions yet to be made in OCR-D/core#1240. Plus (likely) a switch from ThreadPoolExecutor to ProcessPoolExecutor for page-parallel. Plus bertsky/core#21.

I did quite a bit of testing against various workflows with processors (both v3-migrated and old), but we should formalise this into an integration test (probably Quiver)...

Something to be discussed in next Tech Call?

```python
process = Process(target=_start_mets_server,
                  kwargs={'workspace': workspace, 'url': 'mets.sock'})
process.start()
sleep(1)
```
Collaborator:

Is there no better way than this sleep?

Contributor Author:

It's just for the test, you know – 1 s does not hurt considering how long the tests run.
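(For the record, the fixed sleep could be replaced by polling for the socket – a sketch, assuming the METS Server creates mets.sock on startup:)

```python
import os
import time

process.start()
# Instead of a fixed sleep, wait until the METS Server has created its
# Unix socket, with a timeout so a failing server does not hang the test.
deadline = time.time() + 10
while not os.path.exists('mets.sock'):
    if time.time() > deadline:
        raise RuntimeError('METS Server did not come up in time')
    time.sleep(0.1)
```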

```python
    '//page:TextLine/page:TextEquiv[1]/page:Unicode/text()', namespaces=NS)
assert len(text1_out) == len(text1), "not all lines have been recognized"
assert "verſchuldeten" in "\n".join(text1_out), "result for first page is inaccurate"
assert "\n".join(text1_out) != "\n".join(text1), "result is suspiciously identical to GT"
```
Collaborator:

Why not remove the GT text in the fixture? If that were done in a robust way, it seems like the cleanest way to make sure we're actually testing for OCR text, not the original GT text.

Contributor Author:

Because then we could not compare against it, and we would not get the warnings in the log which we are asserting above.


```python
CONFIGS = ['', 'pageparallel', 'metscache', 'pageparallel+metscache']

@pytest.fixture(params=CONFIGS)
```
Collaborator:

I was a bit confused by those configs being tied into the workspace fixture, but they seem to be connected in the OCR-D API (mets_server_url in the Workspace class), so this is probably ok?

Contributor Author:

Exactly – they are factually tied, because the workspace references the METS Server if present.

```python
process.start()
sleep(1)
workspace = Workspace(resolver, directory, mets_server_url='mets.sock')
yield {'workspace': workspace, 'mets_server_url': 'mets.sock'}
```
Collaborator:

Is mets_server_url redundant here? The Workspace seems to have it too.

Contributor Author:

Indeed – run_processor only needs mets_server_url if there is no workspace. So we could actually simplify here.
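(A sketch of that simplification, based on the hunk above – assuming run_processor takes the METS Server URL from the workspace when given one:)

```python
process.start()
sleep(1)
# The Workspace already references the METS Server, so the fixture no
# longer needs to yield mets_server_url separately.
workspace = Workspace(resolver, directory, mets_server_url='mets.sock')
yield {'workspace': workspace}
```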
