
migrate to core v3 #117 (Draft)

wants to merge 10 commits into base: master
Conversation

@bertsky (Contributor) commented Sep 16, 2024

Still a draft as long as v3 is in beta/RC, but we can already use the CI and discuss the changes (esp. to the tests).

@kba this closely resembles the tests in OCR-D/ocrd_kraken#44 (covering variants with METS Caching and/or METS Server and/or parallel pages).

@bertsky (Contributor, Author) commented Sep 16, 2024

Oh, and this is based on #116, since I often cannot even run Calamari without that.

@bertsky (Contributor, Author) commented Sep 18, 2024

I wrote a simple script to measure and plot the GPU utilisation.

[Plot: GPU utilisation over time, ocrd-calamari-cuda-b32]
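For reference, a minimal sketch of what such a measurement script can look like (this is not the actual script used for the plots above; the sampling interval and the nvidia-smi query are assumptions):

```python
#!/usr/bin/env python3
# Minimal GPU utilisation sampler: poll nvidia-smi until interrupted,
# then plot utilisation over wall time.
import subprocess
import time

import matplotlib.pyplot as plt

samples = []
t0 = time.time()
try:
    while True:
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=utilization.gpu',
             '--format=csv,noheader,nounits'], text=True)
        samples.append((time.time() - t0, int(out.strip().splitlines()[0])))
        time.sleep(0.5)  # sampling interval (an assumption)
except KeyboardInterrupt:
    pass

xs, ys = zip(*samples)
plt.plot(xs, ys)
plt.xlabel('wall time [s]')
plt.ylabel('GPU utilisation [%]')
plt.savefig('gpu-utilisation.png')
```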

The rather simple modification 9611e2c (which I will cherry-pick into #116 for core v2) helps in two ways: it utilises the GPU better (because it avoids overly small batches when regions have only a few lines), and thus also allows increasing the batch size without causing OOM.
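Schematically, the change from region-level to page-level batching looks like this (a sketch – `predict_raw` is the raw prediction entry point mentioned below, but `regions`, `predictor` and `region_line_images` are hypothetical stand-ins for the actual code):

```python
# Before 9611e2c: one predict_raw call per region – regions with only
# a few lines produce tiny batches that underutilise the GPU.
results = []
for region in regions:
    results.extend(predictor.predict_raw(region_line_images(region)))

# After 9611e2c: collect all line images of the page first, then
# predict once – the batch actually fills up, so the batch size can
# be raised without running OOM.
line_images = [image
               for region in regions
               for image in region_line_images(region)]
results = predictor.predict_raw(line_images)
```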

Unfortunately, fb2a680 does not accomplish what I expected – reducing the peaky GPU utilisation behaviour caused by the GPU waiting for the CPU and vice versa.

Here's a log for batch_size=64 without parallel page threads (but with METS Server) – i.e. before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p1]
And the same with 3 parallel page threads – still before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3]
Now, after adding fb2a680, with a ThreadPoolExecutor computing the predict_raw batches concurrently (shared across parallel page threads):
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3-bg]
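In outline, that background threading looks like this (a sketch under assumptions; the actual future handling in fb2a680 may differ in detail):

```python
from concurrent.futures import ThreadPoolExecutor

# A single executor shared across all parallel page threads, so that
# only one predict_raw call occupies the GPU at a time while the page
# threads continue with CPU-side pre- and post-processing.
gpu_executor = ThreadPoolExecutor(max_workers=1)

def process_page(line_images):
    # Hand the GPU-bound prediction off to the shared executor; this
    # page thread blocks on the result, but other page threads can
    # meanwhile prepare their own batches on the CPU.
    future = gpu_executor.submit(predictor.predict_raw, line_images)
    return future.result()
```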

Thus, surprisingly, the timeline still shows low average utilisation with lots of waiting time. This is also reflected in the wall-time and CPU-time measurements (see the table below).

It gets a little better if I split up the batches for the background thread myself (instead of having Calamari v1 do the batching), though:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p3-bgbatched]
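That is, something along these lines (continuing the sketch above; the chunking helper is hypothetical):

```python
def chunked(seq, size):
    # Split the page's line images into fixed-size batches ourselves,
    # instead of passing everything to Calamari in one call.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def process_page(line_images, batch_size=64):
    futures = [gpu_executor.submit(predictor.predict_raw, batch)
               for batch in chunked(line_images, batch_size)]
    # Collect results in submission order.
    return [result for future in futures for result in future.result()]
```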

Also, it helps to increase the number of parallel page threads from 3 to 6 – but only in itself, not in combination with background threading. Here's with 6 threads before fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6]
And this is with 6 threads after fb2a680:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg]

I also tried more than 1 background thread (i.e. number of workers in the shared ThreadPoolExecutor), but that does not do much better either – the above with 2 "GPU" threads:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg2]
And the same with 4 background threads:
[Plot: ocrd-calamari-cuda-b64-B64-v3-p6-bg4]

Increasing the number of parallel page threads to 12 or 24 becomes still more inefficient.

Figures for a book with 180 pages of Fraktur:

| commit | OCRD_MAX_PARALLEL_PAGES | wall time | CPU time |
|--------|-------------------------|-----------|----------|
| bf755a3 (region-level batches) | 1 | 1148 s | 1082 s |
| bf755a3 (region-level batches) | 3 | 744 s | 1188 s |
| 9611e2c (page-level batches) | 1 | 1113 s | 1042 s |
| 9611e2c (page-level batches) | 3 | 698 s | 1105 s |
| fb2a680 (1 background thread) | 3 | 709 s | 1122 s |
| fb2a680 (1 background thread) | 6 | 665 s | 1178 s |
| fb2a680 (1 background thread) | 12 | 693 s | 1205 s |
| fb2a680 (2 background threads) | 6 | 660 s | 1169 s |
| fb2a680 (4 background threads) | 6 | 653 s | 1160 s |

Perhaps we must go for Calamari 2 with its efficient tfaip pipelining...

@codecov-commenter commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 85.38462% with 19 lines in your changes missing coverage. Please review.

Project coverage is 68.93%. Comparing base (4adf09f) to head (e68ce5f).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ocrd_calamari/recognize.py | 85.38% | 9 Missing and 10 Partials ⚠️ |
```diff
@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   71.07%   68.93%   -2.15%
==========================================
  Files           5        4       -1
  Lines         204      206       +2
  Branches       50       55       +5
==========================================
- Hits          145      142       -3
- Misses         48       51       +3
- Partials       11       13       +2
```


@bertsky bertsky mentioned this pull request Sep 23, 2024
@mikegerber mikegerber self-assigned this Oct 7, 2024
@mikegerber (Collaborator) commented:
When will OCR-D 3 be released? (Roughly)

@bertsky (Contributor, Author) commented Oct 8, 2024

> When will OCR-D 3 be released? (Roughly)

Hard to tell. It hinges on a couple of open design decisions yet to be made in OCR-D/core#1240. Plus (likely) a switch from ThreadPoolExecutor to ProcessPoolExecutor for page-parallel. Plus bertsky/core#21.

I did quite a bit of testing against various workflows with processors (both v3-migrated and old), but we should formalise this into an integration test (probably Quiver)...

Something to be discussed in next Tech Call?

```python
process = Process(target=_start_mets_server,
                  kwargs={'workspace': workspace, 'url': 'mets.sock'})
process.start()
sleep(1)
```
Collaborator:

Is there no better way than this sleep?

Contributor Author:

It's just for the test, you know – 1 s does not hurt considering how long the tests run.
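(For the record, the fixed sleep could be replaced by polling for the socket – a sketch, assuming the METS Server creates mets.sock on startup:)

```python
import os
import time

process.start()
# Instead of a fixed sleep, wait until the METS Server has created its
# Unix socket, with a timeout so a failing server does not hang the test.
deadline = time.time() + 10
while not os.path.exists('mets.sock'):
    if time.time() > deadline:
        raise RuntimeError('METS Server did not come up in time')
    time.sleep(0.1)
```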

```python
    '//page:TextLine/page:TextEquiv[1]/page:Unicode/text()', namespaces=NS)
assert len(text1_out) == len(text1), "not all lines have been recognized"
assert "verſchuldeten" in "\n".join(text1_out), "result for first page is inaccurate"
assert "\n".join(text1_out) != "\n".join(text1), "result is suspiciously identical to GT"
```
Collaborator:

Why not remove the GT text in the fixture? If that were done in a robust way, it seems like the cleanest way to make sure we're actually testing for OCR text, not the original GT text.

Contributor Author:

Because then we could not compare against it, and we would not get the warnings in the log which we are asserting above.


```python
CONFIGS = ['', 'pageparallel', 'metscache', 'pageparallel+metscache']

@pytest.fixture(params=CONFIGS)
```
Collaborator:

I was a bit confused by those configs being tied into the workspace fixture, but they seem to be connected in the OCR-D API (mets_server_url in the Workspace class), so this is probably ok?

Contributor Author:

Exactly – they are factually tied, because the workspace references the METS Server if present.

```python
process.start()
sleep(1)
workspace = Workspace(resolver, directory, mets_server_url='mets.sock')
yield {'workspace': workspace, 'mets_server_url': 'mets.sock'}
```
Collaborator:

Is mets_server_url redundant here? The Workspace seems to have it too.

Contributor Author:

Indeed – run_processor only needs mets_server_url if there is no workspace. So we could actually simplify here.
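(A sketch of that simplification, based on the hunk above – assuming run_processor takes the METS Server URL from the workspace when given one:)

```python
process.start()
sleep(1)
# The Workspace already references the METS Server, so the fixture no
# longer needs to yield mets_server_url separately.
workspace = Workspace(resolver, directory, mets_server_url='mets.sock')
yield {'workspace': workspace}
```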
