Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter key terms by book/chapter #508

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Oct 11, 2024

Addresses #476

Note: This solution is potentially problematic because it only pulls key terms from the first file in the source and target. Previously this wasn't as much of a concern given, but now that there might be multiple sources and targets, we should give more thought to an approach.


This change is Reviewable

@johnml1135
Copy link
Collaborator

There is no top level test to confirm this right now. Can there be at least one test to make sure that the filtering runs all the way through?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added a test.

Reviewable status: 3 of 6 files reviewed, all discussions resolved (waiting on @ddaspit)

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135
Copy link
Collaborator

Everything looks good - except that the tests are failing and that it is out of date ...

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment to say changes needed...

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right haha. They pass locally with the machine updates.

Comment to say changes needed...

By this do you mean that you want me to add a comment to the filtering lines in preprocessing with a TODO?

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perform mixing on the source-side and fallback on the target-side like we do with verses?

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@johnml1135
Copy link
Collaborator

Can we finish this? Machine has been released with the changes (if I am correct).

@Enkidu93
Copy link
Collaborator Author

I'm sorry - I totally forgot about this PR. I think I saw that Machine had been merged and didn't realize Serval hadn't.

Could we perform mixing on the source-side and fallback on the target-side like we do with verses?

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

We definitely can - although it would add some more complexity to the preprocess step. I wasn't sure if that's something we should do though. For example, if you're truly training on two proper sources (not a BT + source) and they both have key terms, would the noise from mixing those be for any reason particularly unhelpful? It seems to me that when it comes to proper nouns, the key terms really just represent transliterations (unless you have a translation that actually translates the meanings of the names - which could happen). The way that versions transliterate names could be systematically very different. (Those Chinese key terms come to mind). Maybe this is no different than the complexity introduced by mixing in the first place. Also, I wasn't sure how often such a thing would really be useful given that the main use-case is BT + source and I imagine BTs won't have project-specific key terms, right?

In any case, I wonder if it makes more sense to close this PR and then add in this work once the preprocess to service toolkit logic PR is through. What do you all think?

@johnml1135
Copy link
Collaborator

I'm fine waiting for the preprocess changes to go through first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants