Workflow format #35

Open
kba opened this issue Jan 31, 2022 · 1 comment

@kba
Member

kba commented Jan 31, 2022

Current situation

We ship the simplistic ocrd process tool for sequential workflows, with minimal validation of inputs, outputs and parameters. For more complex workflows, and in workspaces with many files, this approach does not scale:

  • No error handling, graceful or otherwise. A single failure of a single processor on a single image breaks the workflow and leaves an inconsistent state behind.
  • No support for dynamic runtime behavior, apart from simple mappings based on XPath or similar.
  • Inefficient: it does not make full or smart use of the available computing resources.

So we need a proper workflow engine as a backend, which is being worked on in different contexts. Regardless of the implementation, we should specify a common syntax for OCR-D workflows.

How it should be

OCR-D users should be able to model even complex, dynamic workflows with an easy-to-understand and well-defined syntax. It should be easy to share workflows and to validate them for consistency with OCR-D tooling.
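To make "validate them for consistency" concrete for the simple sequential case, here is a minimal Python sketch, not actual OCR-D tooling. check_workflow is a hypothetical helper, and it assumes task strings in the "processor -I INPUT -O OUTPUT" style used by ocrd process, with comma-separated file groups. It only checks that every input file group exists before a step reads it. Parameters, processor availability and output conflicts are not checked.

```python
import shlex


def check_workflow(tasks, initial_groups):
    """Report steps whose input file groups are not available yet.

    Each task is assumed to look like "olena-binarize -I OCR-D-IMG -O OCR-D-BIN",
    with -I/-O taking comma-separated file group names and -P taking a
    key/value pair (assumed syntax). Other options are skipped one token at a time.
    """
    available = set(initial_groups)
    problems = []
    for step, task in enumerate(tasks, start=1):
        tokens = shlex.split(task)
        name = tokens[0] if tokens else "<empty>"
        inputs, outputs = [], []
        i = 1
        while i < len(tokens):
            option = tokens[i]
            if option in ("-I", "-O") and i + 1 < len(tokens):
                groups = [g for g in tokens[i + 1].split(",") if g]
                (inputs if option == "-I" else outputs).extend(groups)
                i += 2
            elif option == "-P":
                i += 3  # skip the parameter name and its value
            else:
                i += 1
        if not inputs or not outputs:
            problems.append(f"step {step} ({name}): missing -I or -O")
        for group in inputs:
            if group not in available:
                problems.append(f"step {step} ({name}): input {group} not produced yet")
        # Later steps may consume what this step writes.
        available.update(outputs)
    return problems


steps = [
    "olena-binarize -I OCR-D-IMG -O OCR-D-BIN",
    "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP",
    "tesserocr-recognize -I OCR-D-SEG -O OCR-D-OCR",  # OCR-D-SEG is never produced
]
print(check_workflow(steps, ["OCR-D-IMG"]))
# → ['step 3 (tesserocr-recognize): input OCR-D-SEG not produced yet']
```

A real validator would also have to handle dynamic, non-sequential workflows, which is exactly what a simple check like this cannot cover.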

Requirements list

https://pad.gwdg.de/AosGiphcQoKKIqoRYBqK-A

@kba kba added the Epic label Jan 31, 2022
@kba kba transferred this issue from OCR-D/core Jan 31, 2022
@kba kba added Epic and removed Epic labels Feb 1, 2022
@krvoigt

krvoigt commented May 10, 2022

These are the minutes of our meeting. In short, we agreed that:

  • All implementation projects agree to use Nextflow.
  • Therefore, the Nextflow scripting language (Groovy) can be used as the exchange format.
  • OCR-D/core will provide a tool to validate workflows.
  • Validation means checking that a workflow contains no script tasks, only calls to ocrd-* processors.
  • We need to check OCR-D/core#652 and decide what to do with it.
  • OCR-D/core needs to provide a mechanism by which all processors can expose REST endpoints.
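The validation rule agreed above could be sketched roughly as follows, in Python rather than Groovy. validate_nextflow is a hypothetical helper, not the promised OCR-D/core tool. It uses regular expressions instead of a real Groovy parser, treats triple-quoted strings as process script bodies, rejects exec: sections (native Groovy code), and requires every shell command to start with ocrd-. Single-line script strings and commands assembled dynamically in Groovy would slip through, which is one reason a real validator would need a proper parser.

```python
import re

# Triple-quoted Groovy strings, which Nextflow processes typically use for script bodies.
SCRIPT_BODY = re.compile(r'("{3}|\'{3})(.*?)\1', re.DOTALL)
# An exec: section runs native Groovy code and cannot be restricted to processor calls.
EXEC_SECTION = re.compile(r"^\s*exec\s*:", re.MULTILINE)


def commands(body):
    """Yield the shell commands in a script body, joining backslash continuations."""
    current = ""
    for raw in body.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            current += line[:-1] + " "
            continue
        current += line
        # Skip blank lines and comments (including a shebang line).
        if current and not current.startswith("#"):
            yield current.strip()
        current = ""
    if current.strip():
        yield current.strip()


def validate_nextflow(source):
    """Return a list of violations of the 'only ocrd-* processor calls' rule."""
    problems = []
    if EXEC_SECTION.search(source):
        problems.append("exec: sections (native Groovy code) are not allowed")
    for match in SCRIPT_BODY.finditer(source):
        for command in commands(match.group(2)):
            if not command.startswith("ocrd-"):
                problems.append(f"not an ocrd-* processor call: {command}")
    return problems


workflow = '''
process binarize {
    script:
    """
    ocrd-olena-binarize -I OCR-D-IMG \\
        -O OCR-D-BIN
    """
}
process cleanup {
    script:
    """
    rm -rf /tmp/scratch
    """
}
'''
print(validate_nextflow(workflow))
# → ['not an ocrd-* processor call: rm -rf /tmp/scratch']
```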
