Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makefile-test for ocrd-network #407

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
Draft

Makefile-test for ocrd-network #407

wants to merge 8 commits into from

Conversation

joschrew
Copy link

@joschrew joschrew commented Feb 8, 2024

This adds a test for the ocrd-network with ocrd_all. Inspired by this merged pr.
The tests are run with a makefile-task. To run this tests submodules are needed. Currently it is also based on multiple PR's because the Dockerfiles for the processors have to be modified.

Run the tests:

  • I used a vm with ubuntu 22.04, 8 CPU and 16 GB Mem for making sure it works. Nearly fresh vm, no preinstalled images etc.

clone this PR and fetch necessary submodules
- git clone -b docker-tests https://github.com/OCR-D/ocrd_all.git
- cd ocrd_all
- git submodule update --recursive --init core/ ocrd_olena/ ocrd_anybaseocr/ ocrd_cis/

build the used images the first time:
- docker-compose --file core/tests/network/docker-compose.yml --file tests/network/docker-compose.yml build ocrd-olena-binarize ocrd-anybaseocr-crop ocrd-cis-ocropy-denoise ocrd-cis-ocropy-clip ocrd-cis-ocropy-segment ocrd-cis-ocropy-dewarp
- this should only be necessary once to build the used images
- this takes on my machine about 40 Minutes

run the tests:
make integration-test DOCKER_COMPOSE=docker-compose (or else: make integration-test)

Open questions/tasks:

  • Submodule-PR's should probably be merged before this PR
  • docker-compose could be changed to download processor-images (if made available with publishing images for the processors) instead of building with path to Dockerfiles
  • improve test invocation
  • docker-compose currently uses for the core-test-container, which is used to run the tests, a self created Image because the test are not yet available in the published image (just a PR atm)

@bertsky
Copy link
Collaborator

bertsky commented Feb 9, 2024

  • this has to be done in an external ocrd-core copy (of the PR), it does not work as a submodule IIUC
  • Reason: when building, make assets tries to sync the submodule which does not work when core itself is a submodule
  • this can be skipped when the core-PR is merged and included / available in core image (see open
    tasks below)

you can set or pass NO_UPDATE=1, as documented in the readme and setup guide, to prevent any git submodule actions during make recipes.

@bertsky
Copy link
Collaborator

bertsky commented Feb 9, 2024

  • docker-compose could be changed to download processor-images (if made available with publishing images for the processors) instead of building with path to Dockerfiles

I agree, but ideally the compose file offers both ways, so the user can decide whether to re-build (docker compose build or docker compose up --build) or to download (docker compose pull or docker compose up --pull always). We should finally make all modules publish under Dockerhub or GHCR, as originally proposed here ...

@bertsky
Copy link
Collaborator

bertsky commented Feb 9, 2024

I believe we should try to generate the docker-compose files per module from their respective ocrd-tool.json (or the ocrd-all-tool.json) and then include all sub-compose files into a single top-level one, with some shared definitions, networks and volumes.

Generating could be done via another makefile rule. (But for the registry image names, we would need some additional field in the tool jsons, since it's unlikely these will become fully homogeneous and predictable.)

@bertsky
Copy link
Collaborator

bertsky commented Feb 9, 2024

  • docker-compose currently uses for the core-test-container, which is used to run the tests, a self created Image because the test are not yet available in the published image (just a PR atm)

you mean the ocrd_core_test stage of the core/Dockerfile? I don't think we should publish this as a prebuilt image – we can just clone the repo and build here. Same could be done for test stages of other modules.

@joschrew
Copy link
Author

Thanks for your help/input.

May last commit updated the submodules (-PR's) and now the core image used is build with core/Dockerfile instead of building separately

tests/network/docker-compose.yml Outdated Show resolved Hide resolved
tests/network/docker-compose.yml Outdated Show resolved Hide resolved
tests/network/docker-compose.yml Outdated Show resolved Hide resolved
context: ../../../core
args:
BASE_IMAGE: ubuntu:20.04
SKIP_ASSETS: 1
volumes:
- "${PWD}/tests/network/test-workflow-1.txt:/ocrd-data/assets/ocrd_all-test-workflow.txt"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it seems like this mountpoint is the only reason to have a definition of ocrd_network_core_test.

If you followed my proposal to integrate the non-dummy workflow directly into core (but skip that test normally), then that could simply be done in the original compose file (i.e. the mountpoint for the non-dummy workflow next to dummy-workflow.txt).

Copy link

@MehmedGIT MehmedGIT Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it seems like this mountpoint is the only reason to have a definition of ocrd_network_core_test.

This should now be possible since we decided to switch from CircleCI to GitHub Actions. Volume mapping was not possible in CircleCI. Check here. Hence, we had to use other hacks to achieve what we wanted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MehmedGIT ok I think I understand – but why didn't you just switch to named volumes on CircleCI (and docker cp the required data into the volume from the host VM)?

Also, my point was merely that this config can be done in core alone (instead of a mix of compose files in core and ocrd_all). If the volume works there for the dummy workflow (however the VM is set up), then it should also work for the multimodule workflow (when additionally installing all the other module containers).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why didn't you just switch to named volumes on CircleCI (and docker cp the required data into the volume from the host VM)?

Potentially that could have worked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I want to move the workflow to core as proposed, but didn't had the time yet. And I think I will try to volume mount in core too, as proposed

tests/network/docker-compose.yml Outdated Show resolved Hide resolved
@bertsky
Copy link
Collaborator

bertsky commented Feb 14, 2024

I believe we should try to generate the docker-compose files per module from their respective ocrd-tool.json (or the ocrd-all-tool.json) and then include all sub-compose files into a single top-level one, with some shared definitions, networks and volumes.

Generating could be done via another makefile rule. (But for the registry image names, we would need some additional field in the tool jsons, since it's unlikely these will become fully homogeneous and predictable.)

Another problem to address is persistent model storage. In #378 we already established that we need named volumes. These could also be shared via compose file, so all module containers contribute to a single ocrd-resources/ on the host. But (apart from the respective definitions in the ocrd_all/docker-compose.yml) we need to have the respective volume definitions (and XDG_DATA_HOME set) in the Dockerfiles of the modules. So far, there is no convention for that. I suggest we use /models everywhere (by symlinking $XDG_DATA_HOME/ocrd-resources to /models as was recently done in ocrd_all).

@MehmedGIT
Copy link

Another problem to address is persistent model storage. In #378 we already established that we need named volumes. These could also be shared via compose file, so all module containers contribute to a single ocrd-resources/ on the host. But (apart from the respective definitions in the ocrd_all/docker-compose.yml) we need to have the respective volume definitions (and XDG_DATA_HOME set) in the Dockerfiles of the modules. So far, there is no convention for that. I suggest we use /models everywhere (by symlinking $XDG_DATA_HOME/ocrd-resources to /models as was recently done in ocrd_all).

I also find this important since core.ocrd_network and Operandi project heavily depend on a general way to reference ocrd processor models via volume mapping.

.gitmodules Outdated Show resolved Hide resolved
Makefile Outdated
# a mechanism to test if all queues are there but that is not available yet. So sleeping
# here for now
sleep 10
-$(INTEGRATION_TEST_IN_DOCKER) pytest -k 'test_ocrd_all_workflow' -v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-$(INTEGRATION_TEST_IN_DOCKER) pytest -k 'test_ocrd_all_workflow' -v
$(INTEGRATION_TEST_IN_DOCKER) pytest -k 'test_ocrd_all_workflow' -v

We want failure of the test command in failure of the make integration-test target, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the CI/CD pipeline, yes. However, when testing locally, failed tests leave leftovers. Potentially, we need both versions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for that use-case we used to have the convention of passing a variable PYTEST_ARGS where the user could insert things like --continue-on-collection-errors, right?

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more re-usable Dockerhub deployments now...

tests/network/docker-compose.yml Show resolved Hide resolved
tests/network/docker-compose.yml Show resolved Hide resolved
tests/network/docker-compose.yml Show resolved Hide resolved
tests/network/docker-compose.yml Show resolved Hide resolved
tests/network/docker-compose.yml Show resolved Hide resolved
No longer depends on / extends core's docker-compose. dry vs kiss
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants