Current situation
We ship with the simplistic ocrd process tool, which runs sequential workflows with minimal validation of inputs, outputs and parameters. For more complex workflows and for workspaces with many files, this approach does not scale:
No error handling, graceful or otherwise. A single failure of a single processor on a single image breaks the workflow and leaves an inconsistent state behind.
Inefficient: it does not make full or smart use of the available computing resources.
So we need a proper workflow engine as a backend, which is being worked on in different contexts. Whatever the implementation, we should specify a common syntax for OCR-D workflows.
How it should be
OCR-D users should be able to model even complex, dynamic workflows with an easy-to-understand and well-defined syntax. It should be easy to share workflows and to validate them for consistency with OCR-D tooling.
These are the minutes of our meeting. In short, we agreed that:
All implementation projects agree to use Nextflow.
Therefore, the Nextflow scripting language (Groovy) can be used as an exchange format.
OCR-D/core will provide a tool to validate the workflow.
Validating means checking that workflows do not contain script tasks, only calls to ocrd-* processors.
We need to check OCR-D/core#652 and decide what to do with it.
OCR-D/core needs to provide a mechanism so that all processors can expose their REST endpoints.
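To make the validation rule above concrete, here is a minimal sketch of what such a check could look like. It is not the tool OCR-D/core will provide. It assumes that each Nextflow process keeps its commands in a triple-quoted string, and it uses a rough regular expression instead of a real Groovy parser. The function names, the example workflow and the file group names are hypothetical and chosen only for illustration.

```python
import re

# Assumption: each Nextflow process keeps its shell commands in a
# triple-quoted string. This regex is a rough heuristic, not a Groovy parser.
SCRIPT_BLOCK = re.compile(r"(\"\"\"|''')(.*?)\1", re.DOTALL)


def find_non_ocrd_commands(workflow_text):
    """Return every command line that does not start with an ocrd-* call."""
    offending = []
    for match in SCRIPT_BLOCK.finditer(workflow_text):
        body = match.group(2)
        # Join backslash-continued lines so each command is checked once.
        body = re.sub(r"\\\s*\n", " ", body)
        for line in body.splitlines():
            command = line.strip()
            if not command or command.startswith("#"):
                continue  # skip blank lines and shell comments
            if not command.startswith("ocrd-"):
                offending.append(command)
    return offending


def is_valid_workflow(workflow_text):
    """A workflow passes if none of its script blocks has an offending line."""
    return not find_non_ocrd_commands(workflow_text)


# Hypothetical workflow: the first process is fine, the second is a script task.
EXAMPLE = '''
process binarize {
    script:
    """
    ocrd-cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN
    """
}

process cleanup {
    script:
    """
    rm -rf /tmp/scratch
    """
}
'''

if __name__ == "__main__":
    for command in find_non_ocrd_commands(EXAMPLE):
        print("not an ocrd-* call:", command)
```

This only inspects the first word of each command line, so a line such as an ocrd-* call piped into another program would still pass. A real validator would probably need to parse the Nextflow script properly and also check the processor parameters.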
Requirements list
https://pad.gwdg.de/AosGiphcQoKKIqoRYBqK-A