Commit

gokhanercan committed Jan 11, 2025
1 parent bceaf0e commit e386213
Showing 2 changed files with 5 additions and 4 deletions.
6 changes: 3 additions & 3 deletions readme.md
@@ -1,6 +1,6 @@
# OSimUnr-Generator

- This repository provides tools used to automatically generate new instances of **OSimUnr dataset** ([see the paper of the study](#cite)), which contains orthographically similar but semantically unrelated word-pairs.
+ This repository provides tools used to automatically generate new instances of **OSimUnr dataset** ([see the paper of the study](#cite)), which contains *orthographically similar but semantically unrelated* (OSimUnr) word-pairs.

Here are some word-pair examples from the [dataset repository](https://github.com/gokhanercan/OSimUnr):

@@ -23,14 +23,14 @@ This repository uses English as the default language, but the codebase is design

## Core Components

- - **Genuine OSimUnr Methods**: Includes methods from the study such as the dataset generation pipeline, relatedness classifier, shallow affixation, semantic blacklisting, and root detectors.
+ - **Genuine OSimUnr Methods**: Includes methods from the study, such as the dataset generation pipeline, relatedness classifier, shallow affixation, semantic blacklisting, and root detectors.
- **Handles Pairing of Words**: Manages the associations and pairing of words.
- **General Utility Functions**: Comprises Logger, Progress, String manipulation, and other utilities.
- **General NLP Functions**: Covers essential NLP functionalities like Dataset, Language, POS (Part of Speech), Tokenizer, and Preprocessor.
- **Subword Level Representations**: Features components like Ngram, Root, SegmentedWord, and Affixes for detailed linguistic analysis.
- **Orthographic Similarity Tools**: Tools to compute various edit distances and overlapping coefficients for assessing text similarity.
- **MorphoLex Shallow Parser**: A specialized parser for morphological analysis.
- - **WordNet Wrapper**: Used as a relatedness approximation tool, includes Word-pool, Semantic graph, and root detector.
+ - **WordNet Wrapper**: Used as a relatedness approximation tool; includes Word-pool, Semantic graph, and root detector.
- **Supports Building Your Own Language-Specific Pipeline**: Encourages the development of customized language pipelines, as detailed in `PipelineProviderBase`.
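
As a rough illustration of what the orthographic similarity tools listed above compute, the following is a minimal sketch of one edit distance (Levenshtein) and one overlap coefficient over character n-grams. The function names `levenshtein`, `char_ngrams`, and `overlap_coefficient` are hypothetical and chosen for this example only; they are not taken from the repository, whose own implementations may differ.

```python
from typing import Set


def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between two strings."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def overlap_coefficient(a: str, b: str, n: int = 2) -> float:
    """Return |A ∩ B| / min(|A|, |B|) over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x or not y:
        # Words shorter than n have no n-grams; treat as no overlap.
        return 0.0
    return len(x & y) / min(len(x), len(y))
```

A lower edit distance or a higher overlap coefficient indicates that two words are spelled more alike; neither measure says anything about whether the words are semantically related.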

## Compatibility
3 changes: 2 additions & 1 deletion setup.py
@@ -1,2 +1,3 @@
import nltk
- nltk.download('wordnet')
+ nltk.download('wordnet')
+ nltk.download('omw')
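
The script above downloads both corpora every time it runs. A minimal sketch of a variant that first checks whether each corpus is already installed is shown below. The helper name `ensure_corpora` and its injectable `find` and `download` parameters are assumptions made for this example and are not part of the commit. It relies on `nltk.data.find` raising `LookupError` for a missing resource and on `nltk.download` accepting `quiet=True`; the `corpora/<name>` resource path is likewise an assumption about where these packages are installed.

```python
from typing import Callable, Iterable, List, Optional


def ensure_corpora(
    names: Iterable[str],
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Download each named NLTK corpus only if it is not found locally.

    Returns the list of corpus names that had to be downloaded.
    `find` and `download` are hypothetical hooks, mainly useful for testing.
    """
    if find is None or download is None:
        # Import lazily so the helper can be exercised without NLTK installed.
        import nltk

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))

    fetched: List[str] = []
    for name in names:
        try:
            # Assumed resource path layout: corpora/<name>
            find(f"corpora/{name}")
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

With NLTK installed, `ensure_corpora(["wordnet", "omw"])` would fetch only the corpora that are missing, which avoids repeated network access when the setup script is run more than once.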
