Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Create separate files for each deduplication class (#409)
* add changes from #389

Signed-off-by: Sarah Yurick <[email protected]>

* add scripts files

Signed-off-by: Sarah Yurick <[email protected]>

* add changes from #326

Signed-off-by: Sarah Yurick <[email protected]>

* run black

Signed-off-by: Sarah Yurick <[email protected]>

* re add ParallelScoreFilter

Signed-off-by: Sarah Yurick <[email protected]>

* remove _MapBuckets and _Shuffle from nemo_curator path

Signed-off-by: Sarah Yurick <[email protected]>

* update api doc

Signed-off-by: Sarah Yurick <[email protected]>

* add changes from #445

Signed-off-by: Sarah Yurick <[email protected]>

* Add changes from #478

Signed-off-by: Sarah Yurick <[email protected]>

* final nits

Signed-off-by: Sarah Yurick <[email protected]>

---------

Signed-off-by: Sarah Yurick <[email protected]>
1 parent 7a49ebb, commit d1f3842
Showing 28 changed files with 2,814 additions and 2,538 deletions.