-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter key terms by book/chapter #508
base: main
Are you sure you want to change the base?
Conversation
There is no top level test to confirm this right now. Can there be at least one test to make sure that the filtering runs all the way through? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added a test.
Reviewable status: 3 of 6 files reviewed, all discussions resolved (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
Everything looks good - except that the tests are failing and that it is out of date ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to say changes needed...
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right haha. They pass locally with the machine updates.
Comment to say changes needed...
By this do you mean that you want me to add a comment to the filtering lines in preprocessing with a TODO?
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we perform mixing on the source-side and fallback on the target-side like we do with verses?
Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Can we finish this? Machine has been released with the changes (if I am correct). |
I'm sorry - I totally forgot about this PR. I think I saw that Machine had been merged and didn't realize Serval hadn't.
We definitely can - although it would add some more complexity to the preprocess step. I wasn't sure if that's something we should do though. For example, if you're truly training on two proper sources (not a BT + source) and they both have key terms, would the noise from mixing those be for any reason particularly unhelpful? It seems to me that when it comes to proper nouns, the key terms really just represent transliterations (unless you have a translation that actually translates the meanings of the names - which could happen). The way that versions transliterate names could be systematically very different. (Those Chinese key terms come to mind). Maybe this is no different than the complexity introduced by mixing in the first place. Also, I wasn't sure how often such a thing would really be useful given that the main use-case is BT + source and I imagine BTs won't have project-specific key terms, right? In any case, I wonder if it makes more sense to close this PR and then add in this work once the preprocess to service toolkit logic PR is through. What do you all think? |
I'm fine waiting for the preprocess changes to go through first. |
Addresses #476
Note: This solution is potentially problematic because it only pulls key terms from the first file in the source and target. Previously this wasn't as much of a concern given, but now that there might be multiple sources and targets, we should give more thought to an approach.
This change is