
How to get ocrd-tool.json if processors not installed in processing server? #1034

Open
kba opened this issue Mar 29, 2023 · 9 comments

@kba (Member) commented Mar 29, 2023


@tdoan2010:

This implementation requires that all supported processors be installed on the same machine as the Processing Server, which might not be the case. Maybe after integrating #884, we can send requests to each processor to ask for its information instead.

@bertsky:

I concur – see earlier discussion above.

@MehmedGIT:

Maybe after integrating #884, we can send requests to each processor to ask for its information instead.

The Processing Worker is no longer a server, so there is nothing to send requests to. I still have no clear idea how to achieve that. The best idea I have found so far is to store the ocrd-tool.json files in the DB so the Processing Server can retrieve the information from there.
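
For illustration only (not the actual ocrd_network implementation): if the tool JSONs were cached in MongoDB, storing and retrieving them could look roughly like the following sketch. The database name, collection name and helper functions are made up.

```python
# Minimal sketch, assuming a hypothetical "ocrd_tools" collection in a
# local MongoDB; not the actual OCR-D data model.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
tools = client["ocrd"]["ocrd_tools"]

def store_tool_json(processor_name: str, ocrd_tool: dict) -> None:
    """Upsert the ocrd-tool.json of one processor."""
    tools.replace_one(
        {"_id": processor_name},
        {"_id": processor_name, "ocrd_tool": ocrd_tool},
        upsert=True,
    )

def get_tool_json(processor_name: str) -> dict | None:
    """Return the cached ocrd-tool.json, or None if the processor is unknown."""
    doc = tools.find_one({"_id": processor_name})
    return doc["ocrd_tool"] if doc else None
```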

@bertsky (Collaborator) commented Mar 29, 2023

Discussion continued as follows:

  • @kba suggested using the static ocrd-all-tool.json files as a stop-gap.
  • @MehmedGIT added Extension to the Processing Server (#974) #1028 to utilise that in the Processing Server
  • @bertsky provided add rule for ocrd-tool-all.json, reduce image size, fix+update modules, fix CUDA ocrd_all#362 as a path to a more automatic generation of ocrd-all-tool.json via CI (as a GitHub build artifact)
  • @bertsky commented

    A central static tool JSON list is of course a solution for now, but generally IMO we must overcome this. The tools actually deployed might change – even dynamically – and new tools may arrive (cf. discussion on fallback queues). We could have the workers respond with their tool JSON immediately after startup, couldn't we?

  • @MehmedGIT replied

    The Processing Server will then only know the tool JSON of the started Processing Workers. In case another worker is started manually and the queue for it is created manually, the Processing Server will not be able to validate its parameters or return its ocrd tool when requested.

  • @bertsky agreed

    Oh, right. So for unmanaged queues, we would still need a mechanism to advertise/register the tool.

  • @MehmedGIT added

    This is not a problem when the Processor Servers are available. The Processing Server can then just request the ocrd_tool from each specific Processor Server and cache it.

  • @bertsky cautioned

    You mean Processor Servers besides Processor Workers? ... and in the Processing Server model we don't use Processor Servers, do we?

  • @MehmedGIT elaborated

    No, we don't. That's why this tool thing is complicated. There is no way for the Processing Server to know or receive the tool directly from the Processing Worker ...
    Another route, a better one, would be to provide a "register worker" endpoint on the Processing Server and pass the ocrd tool there. Once the deployer is separated, it will deploy the Processing Workers and register them with the Processing Server together with their ocrd-tool.json

  • @bertsky acceded

    Yes, explicit worker registration might be the better cooperation model anyway.
    So I'd say the Processing Worker should respond with its tool JSON when created. If the creator is the Processing Server, then it can update its internal tool JSON cache. If it is some external actor, they must send that tool JSON along with the tool name within some (to be defined) worker registration endpoint. So ultimately, the Processing Server can be extended dynamically, but only via explicit registration.
    (And if we do that, we can as well have that endpoint create the queue itself.)

So we seem to agree that all workers ( / processor queues) should be registered ( / created) centrally on the Processing Server (via endpoint or from configuration at startup), and that new Processing Workers should output their ocrd-tool.json immediately, so that it can be used by the registration to store all JSONs in a tool cache dynamically.
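
As an illustration of that registration idea only (not an agreed-upon or existing API): a FastAPI-based Processing Server could accept the worker's tool JSON on a registration endpoint and keep it in a tool cache. The endpoint paths, the request model and the in-memory cache below are assumptions.

```python
# Hypothetical sketch of a worker registration endpoint; paths, model
# and the in-memory cache are assumptions, not the actual OCR-D API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
tool_cache: dict[str, dict] = {}  # processor name -> ocrd-tool.json

class WorkerRegistration(BaseModel):
    processor_name: str
    ocrd_tool: dict                # the worker's ocrd-tool.json
    queue_name: str | None = None  # unmanaged queues could be named here

@app.post("/processing_workers")
def register_worker(reg: WorkerRegistration):
    # Cache the tool JSON so the server can validate job parameters
    # and answer ocrd-tool requests centrally.
    tool_cache[reg.processor_name] = reg.ocrd_tool
    # The same endpoint could also create the worker's queue, as suggested above.
    return {"status": "registered", "processor": reg.processor_name}

@app.get("/processor/{processor_name}")
def get_ocrd_tool(processor_name: str):
    return tool_cache.get(processor_name, {})
```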

@bertsky (Collaborator) commented Mar 29, 2023

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

@MehmedGIT (Contributor) commented:

So we seem to agree that all workers ( / processor queues) should be registered ( / created) centrally on the Processing Server (via endpoint or from configuration at startup)

This will come after #1030; #1030 will already be big enough without handling that change inside it as well.
What I currently have in mind for the near future is:

  • the Deployer as a separate network agent
  • the ProcessingServerConfig will be renamed to DeployerConfig
  • the Processing Server will no longer know anything about configurations
  • the DeployerConfig will potentially be extended to also deploy Workflow Servers and Workspace Servers (in the reference WebAPI impl)
  • the Deployer agent will deploy the RabbitMQ Server, MongoDB, and Processing Server as a 1st step
  • the Deployer will deploy Processing Workers and Processor Servers as a 2nd step
  • the Deployer will register the agents deployed in step 2 with the Processing Server through an endpoint (a separate endpoint for each)
  • the Processing Server will then create the Process Queues (i.e., RabbitMQ queues) based on the registered Processing Workers
  • if a Process Queue with the same name as the registered worker's processor already exists, no new queue will be created

Any other suggestions/modifications?
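
Regarding the last two steps, per-worker queue creation could be as simple as the following sketch (assuming pika, a local RabbitMQ and queue names equal to processor names; not the actual ocrd_network code). RabbitMQ's queue_declare is idempotent, so an already existing queue with identical arguments is simply left untouched.

```python
# Sketch only: declare one durable queue per registered Processing Worker.
import pika

def ensure_queues(registered_processors: list[str]) -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    for processor_name in registered_processors:
        # No-op if a queue with the same name and arguments already exists.
        channel.queue_declare(queue=processor_name, durable=True)
    connection.close()

ensure_queues(["ocrd-cis-ocropy-binarize", "ocrd-tesserocr-recognize"])
```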

@MehmedGIT (Contributor) commented:

Ideas for a bit later in time (not even sure for when):

  • to increase the robustness of the entire network, an Observer agent could be introduced to observe the live status of the deployed agents by pinging them every now and then.
  • in case something goes down, try to apply different strategies:
  1. try to redeploy,
  2. inform other network agents so they can block certain endpoints (to not fill the storage with unprocessable requests),
  3. send an e-mail notification
    ...
  • problem 1: the Processing Workers are not servers. However, maybe there is an easy way to find their live status through the RabbitMQ Server, since they are registered there as consumers (see the sketch below).
  • problem 2: Processing Workers and Processor Servers registered with the Processing Server after the deployment stage may be more complicated to register with the Observer.

Disclaimer: this will potentially be too time-consuming to implement and error-prone without good automated testing of the entire network and of the agents working together.
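
For problem 1, one possible direction (just a sketch, assuming the RabbitMQ management plugin is enabled, placeholder credentials, and queue names equal to processor names) is to count the consumers on a worker's queue via the management HTTP API:

```python
# Sketch: a worker counts as alive if its queue has at least one consumer.
import requests

def worker_is_alive(processor_name: str, vhost: str = "%2F") -> bool:
    url = f"http://localhost:15672/api/queues/{vhost}/{processor_name}"
    resp = requests.get(url, auth=("guest", "guest"))
    if resp.status_code != 200:
        return False  # queue unknown or management API unreachable
    return resp.json().get("consumers", 0) > 0

print(worker_is_alive("ocrd-cis-ocropy-binarize"))
```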

@MehmedGIT (Contributor) commented:

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

True. We still need to think about how exactly this should happen - i.e., which network agent takes responsibility to handle the central registration. This is currently the Processing Server.

@bertsky (Collaborator) commented Mar 31, 2023

BTW I believe for the full Web API including /discovery, we would need central worker registration anyway.

True. We still need to think about how exactly this should happen - i.e., which network agent takes responsibility to handle the central registration. This is currently the Processing Server.

Yes, it makes most sense there, because the Processing Server is the one that needs to know who to talk to anyway. So via registration it has the ultimate truth on processor_list etc. and could provide its own /discovery, which can be delegated to by the Workflow Server's /discovery.

Deployments should also be backed by the database BTW, in case the PS crashes...
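
To illustrate both points (purely a sketch, with made-up route, database and collection names): if worker registrations are persisted in MongoDB, the Processing Server's processor discovery can be served straight from the DB and survives a crash.

```python
# Sketch of a DB-backed discovery route; not the actual Web API spec.
from fastapi import FastAPI
from pymongo import MongoClient

app = FastAPI()
tools = MongoClient("mongodb://localhost:27017")["ocrd"]["ocrd_tools"]

@app.get("/discovery/processors")
def list_processors() -> list[dict]:
    # Registrations persisted in MongoDB outlive a Processing Server restart.
    return [doc["ocrd_tool"] for doc in tools.find({})]
```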

@MehmedGIT (Contributor) commented:

Yes, it makes most sense there, because the Processing Server is the one that needs to know who to talk to anyway.

For processing, yes. What if the /discovery needs to be extended? Say the client wants to discover available Workspace/Workflow servers. Then the Deployer has the central knowledge of where things were deployed.

Deployments should also be backed by the database BTW, in case the PS crashes...

Agree.

@MehmedGIT (Contributor) commented:

the DeployerConfig will potentially be extended to also deploy Workflow Servers and Workspace Servers (in the reference WebAPI impl)

the Deployer agent will deploy the RabbitMQ Server, MongoDB ...

These are no longer valid... The RabbitMQ Server, MongoDB, Workflow Server, and Workspace Server will be deployed with docker-compose.


@tdoan2010 (Contributor) commented:

I don't know how the discussion drifted to this topic, which is not relevant to the title of this issue at all. But yes, the Processing Server will only be responsible for Processor Servers. The rest must be managed in another way, outside the Processing Server.

The final goal is to have a docker-compose file, which can be used to start up all necessary components.
