From e386213a227a927f7b09dc9c13c2cbcbfb2014cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6khan=20Ercan?=
Date: Sat, 11 Jan 2025 23:44:28 +0300
Subject: [PATCH] .

---
 readme.md | 6 +++---
 setup.py  | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/readme.md b/readme.md
index bd5c174..e8afe20 100644
--- a/readme.md
+++ b/readme.md
@@ -1,6 +1,6 @@
 # OSimUnr-Generator
 
-This repository provides tools used to automatically generate new instances of **OSimUnr dataset** ([see the paper of the study](#cite)), which contains orthographically similar but semantically unrelated word-pairs.
+This repository provides tools used to automatically generate new instances of **OSimUnr dataset** ([see the paper of the study](#cite)), which contains *orthographically similar but semantically unrelated* (OSimUnr) word-pairs.
 
 Here are some word-pair examples from the [dataset repository](https://github.com/gokhanercan/OSimUnr):
 
@@ -23,14 +23,14 @@ This repository uses English as the default language, but the codebase is design
 
 ## Core Components
 
-- **Genuine OSimUnr Methods**: Includes methods from the study such as the dataset generation pipeline, relatedness classifier, shallow affixation, semantic blacklisting, and root detectors.
+- **Genuine OSimUnr Methods**: Includes methods from the study, such as the dataset generation pipeline, relatedness classifier, shallow affixation, semantic blacklisting, and root detectors.
 - **Handles Pairing of Words**: Manages the associations and pairing of words.
 - **General Utility Functions**: Comprises Logger, Progress, String manipulation, and other utilities.
 - **General NLP Functions**: Covers essential NLP functionalities like Dataset, Language, POS (Part of Speech), Tokenizer, and Preprocessor.
 - **Subword Level Representations**: Features components like Ngram, Root, SegmentedWord, and Affixes for detailed linguistic analysis.
 - **Orthographic Similarity Tools**: Tools to compute various edit distances and overlapping coefficients for assessing text similarity.
 - **MorphoLex Shallow Parser**: A specialized parser for morphological analysis.
-- **WordNet Wrapper**: Used as a relatedness approximation tool, includes Word-pool, Semantic graph, and root detector.
+- **WordNet Wrapper**: Used as a relatedness approximation tool; includes Word-pool, Semantic graph, and root detector.
 - **Supports Building Your Own Language-Specific Pipeline**: Encourages the development of customized language pipelines, as detailed in `PipelineProviderBase`.
 
 ## Compatibility
diff --git a/setup.py b/setup.py
index bc64260..573508b 100644
--- a/setup.py
+++ b/setup.py
@@ -1,2 +1,3 @@
 import nltk
-nltk.download('wordnet')
\ No newline at end of file
+nltk.download('wordnet')
+nltk.download('omw')
\ No newline at end of file
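
Note on the `setup.py` hunk: the patched script calls `nltk.download` twice and ignores the return values, so a failed download of `wordnet` or `omw` would go unnoticed until the pipeline runs. The sketch below is one possible way to make that step report failures. It is not part of this patch; the helper name `ensure_corpora` and its `download` parameter are hypothetical and introduced only for illustration. The only NLTK call used is `nltk.download(name, quiet=True)`, which returns a boolean indicating success.

```python
from typing import Callable, Dict, Iterable, Optional


def ensure_corpora(
    names: Iterable[str],
    download: Optional[Callable[[str], bool]] = None,
) -> Dict[str, bool]:
    """Try to download each named NLTK resource and report success per name.

    `download` is a hypothetical injection point so the helper can be
    exercised without network access; when omitted, it wraps nltk.download.
    """
    if download is None:
        import nltk  # imported lazily so the helper itself has no hard dependency

        def download(name: str) -> bool:
            # nltk.download returns True on success and False on failure.
            return bool(nltk.download(name, quiet=True))

    # Dict preserves insertion order, so results follow the order of `names`.
    return {name: download(name) for name in names}
```

Under these assumptions, `setup.py` could call `ensure_corpora(["wordnet", "omw"])` and abort with a clear message if any value in the returned mapping is `False`.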