Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Echo translation engine doesn't work for Paratext projects #396

Open
pmachapman opened this issue May 30, 2024 · 9 comments · May be fixed by #512
Open

Echo translation engine doesn't work for Paratext projects #396

pmachapman opened this issue May 30, 2024 · 9 comments · May be fixed by #512
Assignees
Labels
bug Something isn't working

Comments

@pmachapman
Copy link
Collaborator

This is likely low priority.

The echo translation engine only works with text files. As Scripture Forge now uploads Paratext zip files, we are no longer able to use Echo for testing purposes.

In addition, the language checks 'https://qa.serval-api.org/api/v1/translation/engine-types/echo/languages/en always return isNative: false for echo. Should these always return true, as echo supports any language?

@pmachapman pmachapman added the bug Something isn't working label May 30, 2024
@johnml1135 johnml1135 assigned johnml1135 and Enkidu93 and unassigned johnml1135 Jun 3, 2024
@Enkidu93
Copy link
Collaborator

The most straightforward way to accomplish this is to abstract the alignment and inclusion/exclusion of data functionality from NmtPreprocessBuildJob into a separate class and share that with the Echo engine (or have both inherit/implement the same pipeline). This is something that would be more broadly useful, I think. This way the Echo engine could properly reflect the logic in NMT jobs and would be generally more useful for testing. Should we wait on this until the Machine ASP.NET code is moved over to the Serval repo? @ddaspit @johnml1135 What do you guys think? How important is this?

The switch to a default of true is trivial and I think that would make more sense.

@ddaspit
Copy link
Contributor

ddaspit commented Jun 10, 2024

My plan was that once we move the Machine engine to the Serval repo, we could create a library for shared engine functionality. So, yes, we should wait until we move the Machine engine to the Serval repo.

@Enkidu93
Copy link
Collaborator

@johnml1135 Doesn't look like the machine engine has been moved over yet; can we move this back to another release?

@Enkidu93
Copy link
Collaborator

Waiting on #451

@Enkidu93
Copy link
Collaborator

@ddaspit, do you think it would be appropriate to move some of the logic from PreprocessBuildJob into the service toolkit so that the Echo engine can use it? I'm just not sure how to make that work in a way that maintains a separation between the toolkit and Serval/Machine - or is that degree of separation not a priority? I can just duplicate code, but if we want the Echo engine to be a usable testing engine, we'll need to be conscientious about updating the Echo engine as we make updates to preprocessing particularly. It would be nice if they could just share code so we're always certain that we're testing something as realistic as possible.

@ddaspit
Copy link
Contributor

ddaspit commented Oct 14, 2024

Yes, it would be good to move the preprocessing logic to the service toolkit. In what way would it be difficult to main a separation between the toolkit and Serval?

@Enkidu93
Copy link
Collaborator

Yes, it would be good to move the preprocessing logic to the service toolkit. In what way would it be difficult to main a separation between the toolkit and Serval?

OK. I guess more the machine engine than Serval - I'm thinking that it'd be difficult in regard to the machine engine corpus data models. Is it OK to have those data models be shared and if so, where should they live? I could also just move as much logic as possible to the toolkit without sharing those data models (e.g., things like the AlignScripture functions), but that would still leave a good bit of logic for the Echo engine to have to duplicate.

@ddaspit
Copy link
Contributor

ddaspit commented Oct 14, 2024

I believe that those models are not database models. Is that correct? If I remember correctly, they are basically models that are used to pass the configuration to the preprocessing logic, so it would make sense to move them to the toolkit as config models.

@Enkidu93
Copy link
Collaborator

I believe that those models are not database models. Is that correct? If I remember correctly, they are basically models that are used to pass the configuration to the preprocessing logic, so it would make sense to move them to the toolkit as config models.

No, they're not; that's correct. OK. And then we can have the machine engine import those models 👍 .

@Enkidu93 Enkidu93 linked a pull request Oct 16, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: 🔖 Ready
Development

Successfully merging a pull request may close this issue.

4 participants