Alignment: Usage

Alignment experiment folders only require `src.txt` and `trg.txt` files to be run. A config file will be generated automatically for the experiment, but one can still be created manually to customize the alignment.
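
As a minimal sketch, the Python below creates such an experiment folder under `SIL_NLP_DATA_PATH/Alignment/experiments/<name>` and checks that `src.txt` and `trg.txt` have the same number of lines. It assumes the two files are line-aligned parallel text (line *n* of one corresponds to line *n* of the other); the helper names and the experiment name `my_experiment` are hypothetical, and a temporary directory stands in for the real data path.

```python
import tempfile
from pathlib import Path


def create_experiment(data_path: Path, name: str, src_lines: list, trg_lines: list) -> Path:
    """Create a hypothetical experiment folder containing src.txt and trg.txt."""
    exp_dir = data_path / "Alignment" / "experiments" / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    # One segment per line in each file (assumed line-aligned parallel text).
    (exp_dir / "src.txt").write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    (exp_dir / "trg.txt").write_text("\n".join(trg_lines) + "\n", encoding="utf-8")
    return exp_dir


def count_parallel_lines(exp_dir: Path) -> int:
    """Return the segment count, raising ValueError if the two files differ in length."""
    src = (exp_dir / "src.txt").read_text(encoding="utf-8").splitlines()
    trg = (exp_dir / "trg.txt").read_text(encoding="utf-8").splitlines()
    if len(src) != len(trg):
        raise ValueError(f"src.txt has {len(src)} lines but trg.txt has {len(trg)}")
    return len(src)


if __name__ == "__main__":
    # A temporary directory stands in for the real SIL_NLP_DATA_PATH.
    data_path = Path(tempfile.mkdtemp())
    exp = create_experiment(
        data_path,
        "my_experiment",
        ["In the beginning God created the heavens and the earth."],
        ["Au commencement, Dieu créa les cieux et la terre."],
    )
    print(exp, count_parallel_lines(exp))
```
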

Aligns the parallel corpora for the designated experiments.

`usage: python -m silnlp.alignment.align [-h] [--aligners [aligner [aligner ...]]] [--skip-align] [--skip-extract-lexicon] experiments`

Arguments:

Argument | Purpose | Description |
---|---|---|
experiments | Experiment pattern | The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the SIL_NLP_DATA_PATH > Alignment > experiments folder. |
--aligners [aligner [aligner ...]] | List of aligners | List of aligners to use to align each corpus. |
--skip-align | Skip aligning corpora | Skip aligning corpora. |
--skip-extract-lexicon | Skip extracting lexicons | Skip extracting lexicons. |

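
The flags above can be combined into a single invocation. The helper below is a hypothetical sketch, not part of silnlp, that assembles the argument list for `silnlp.alignment.align` using only the documented options. It assumes the script parses its arguments with Python's argparse, as the usage line format suggests, and places the positional experiment pattern first so the variable-length `--aligners` list cannot swallow it. `fast_align` is the only aligner name mentioned on this page; other names are assumptions.

```python
import sys


def build_align_command(experiments, aligners=None, skip_align=False, skip_extract_lexicon=False):
    """Assemble an argv list for `python -m silnlp.alignment.align` (hypothetical helper)."""
    # Put the positional experiment pattern first: assuming --aligners is an argparse
    # nargs="*" option, a pattern placed after it would be consumed as another aligner name.
    cmd = [sys.executable, "-m", "silnlp.alignment.align", experiments]
    if aligners:
        cmd += ["--aligners", *aligners]
    if skip_align:
        cmd.append("--skip-align")
    if skip_extract_lexicon:
        cmd.append("--skip-extract-lexicon")
    return cmd


if __name__ == "__main__":
    # Pass the result to subprocess.run(..., check=True) to actually launch the alignment.
    print(build_align_command("my_experiment", aligners=["fast_align"], skip_extract_lexicon=True))
```
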

Aligns a source Bible to a defined set of Bibles.

`usage: python -m silnlp.alignment.bulk_align [-h] src_path trg_dir output_dir [--aligner ALIGNER] [--multiprocess]`

Arguments:

Argument | Purpose | Description |
---|---|---|
src_path | Path to source Bible text | Path to source Bible text. |
trg_dir | Folder of Bibles to align to | Folder of Bibles to align to. |
output_dir | Folder to contain Bible alignments | Folder to contain Bible alignments. |
--aligner ALIGNER | Aligner to use | Aligner to use for extraction. Default is "fast_align". |
--multiprocess | Use multiple processes | Use multiple processes if the chosen alignment algorithm does not already do so. |

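
To show how the positional and optional arguments fit together, here is a hypothetical argparse parser that mirrors the documented `bulk_align` interface. It is a reconstruction from this page, not silnlp's source; only the argument names and the `fast_align` default come from the table above.

```python
import argparse


def make_bulk_align_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented bulk_align interface (not silnlp's source)."""
    parser = argparse.ArgumentParser(
        prog="python -m silnlp.alignment.bulk_align",
        description="Aligns a source Bible to a defined set of Bibles.",
    )
    parser.add_argument("src_path", help="Path to source Bible text")
    parser.add_argument("trg_dir", help="Folder of Bibles to align to")
    parser.add_argument("output_dir", help="Folder to contain Bible alignments")
    parser.add_argument("--aligner", default="fast_align", help="Aligner to use for extraction")
    parser.add_argument("--multiprocess", action="store_true", help="Use multiple processes")
    return parser


if __name__ == "__main__":
    parser = make_bulk_align_parser()
    # Options may appear before or after the three positional paths.
    args = parser.parse_args(["--multiprocess", "src.txt", "bibles", "out"])
    print(args)
```

Because `--aligner` takes exactly one value and `--multiprocess` takes none, neither can absorb the positional paths, so their order on the command line does not matter in this sketch.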

Tests generated alignments against gold standard alignments.

`usage: python -m silnlp.alignment.test [-h] [--combine-pattern PATTERN] [--test-size SIZE] [--books [book [book ...]]] [--by-book] experiments`

Arguments:

Argument | Purpose | Description |
---|---|---|
experiment | Experiment name | The name of the experiment to test. The experiment name must correspond to a subfolder in the SIL_NLP_DATA_PATH > Alignment > experiments folder. |
--combine-pattern PATTERN | Combine pattern | Combine pattern. |
--test-size SIZE | Test size | Sets the number of verse alignments to test. If the test size is smaller than the total number of verses, the verses to test are selected randomly. |
--books [book [book ...]] | Books to score | Specifies one or more books to be scored. When this option is used, the test tool generates predictions for the entire target-language test set but provides a score only for the specified book(s). Books must be specified using the three-character book codes from the USFM 3.0 standard (e.g., "GEN" for Genesis). |
--by-book | Score individual books | In addition to providing an overall score for all the books in the test set, provides individual scores for each book in the test set. If this option is used in combination with the --books option, individual scores are provided for each of the specified books. |

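
This page does not say which metrics the test tool reports. As a hedged illustration, the sketch below computes precision, recall, and alignment error rate (AER), standard measures for comparing word alignments with a gold standard, over the whole test set and, when requested, per book, mirroring what `--books` and `--by-book` describe. Alignments are represented as sets of `(source_index, target_index)` links keyed by hypothetical verse references such as `"GEN 1:1"`. The test-set helper treats every gold link as a "sure" link, in which case AER equals 1 minus the F1 score.

```python
from collections import defaultdict


def score(pred, sure, possible=None):
    """Precision, recall, and AER of predicted links against gold sure/possible links."""
    possible = set(possible or ()) | sure  # every sure link also counts as possible
    a_s = len(pred & sure)
    a_p = len(pred & possible)
    precision = a_p / len(pred) if pred else 0.0
    recall = a_s / len(sure) if sure else 0.0
    total = len(pred) + len(sure)
    aer = 1.0 - (a_s + a_p) / total if total else 0.0
    return {"precision": precision, "recall": recall, "aer": aer}


def pool(alignments, refs):
    """Merge per-verse link sets, tagging each link with its verse so verses stay distinct."""
    return {(ref, link) for ref in refs for link in alignments.get(ref, ())}


def score_test_set(pred_by_verse, gold_by_verse, books=None, by_book=False):
    """Score every gold verse (or only verses in `books`), optionally also per book."""
    # The book code is the first token of a reference such as "GEN 1:1".
    refs = [ref for ref in gold_by_verse if books is None or ref.split()[0] in books]
    results = {"ALL": score(pool(pred_by_verse, refs), pool(gold_by_verse, refs))}
    if by_book:
        refs_by_book = defaultdict(list)
        for ref in refs:
            refs_by_book[ref.split()[0]].append(ref)
        for book, book_refs in refs_by_book.items():
            results[book] = score(pool(pred_by_verse, book_refs), pool(gold_by_verse, book_refs))
    return results


if __name__ == "__main__":
    gold = {"GEN 1:1": {(0, 0), (1, 1), (2, 2)}, "EXO 1:1": {(0, 1), (1, 0)}}
    pred = {"GEN 1:1": {(0, 0), (1, 1), (2, 3)}, "EXO 1:1": {(0, 1), (1, 0)}}
    print(score_test_set(pred, gold, by_book=True))
```
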

Preprocesses Clear gold standard alignments.

`usage: python -m silnlp.alignment.preprocess [-h] experiments`

Arguments:

Argument | Purpose | Description |
---|---|---|
experiments | Experiment pattern | The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the SIL_NLP_DATA_PATH > Alignment > experiments folder. |


Generates a translation model for Clear from an alignment model.

`usage: python -m silnlp.alignment.preprocess [-h] --aligner ALIGNER --output PATH experiments`

Arguments:

Argument | Purpose | Description |
---|---|---|
experiments | Experiment pattern | The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the SIL_NLP_DATA_PATH > Alignment > experiments folder. |
--aligner ALIGNER | Aligner | Aligner to use. |
--output PATH | Output directory | Output directory. |

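
This page does not describe the format of the Clear translation model. As a rough illustration of what a translation model derived from word alignments can contain, the sketch below estimates lexical translation probabilities by relative frequency of aligned word pairs, p(t | s) = count(s, t) / Σ count(s, t'), summing over all target words t' linked to s. This is a generic technique and a hypothetical helper, not necessarily how silnlp builds the Clear model.

```python
from collections import Counter, defaultdict


def translation_table(corpus):
    """Estimate p(t | s) by relative frequency of aligned word pairs.

    `corpus` holds (src_tokens, trg_tokens, links) triples, where each link is a
    (source_index, target_index) pair produced by some aligner.
    """
    pair_counts = Counter()
    for src, trg, links in corpus:
        for i, j in links:
            pair_counts[(src[i], trg[j])] += 1
    # Total number of links for each source word.
    source_totals = Counter()
    for (s, _), count in pair_counts.items():
        source_totals[s] += count
    table = defaultdict(dict)
    for (s, t), count in pair_counts.items():
        table[s][t] = count / source_totals[s]
    return dict(table)


if __name__ == "__main__":
    corpus = [
        (["in", "the", "beginning"], ["au", "commencement"], [(0, 0), (2, 1)]),
        (["the", "beginning"], ["le", "commencement"], [(0, 0), (1, 1)]),
    ]
    for source, translations in translation_table(corpus).items():
        print(source, translations)
```
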

Finds the optimal size for a gold standard.

`usage: python -m silnlp.alignment.test_size [-h] [--threshold THRESHOLD] [--test-size SIZE] [--books [book [book ...]]] experiments`

Arguments:

Argument | Purpose | Description |
---|---|---|
experiments | Experiment pattern | The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the SIL_NLP_DATA_PATH > Alignment > experiments folder. |
--threshold THRESHOLD | Similarity threshold | Similarity threshold. |
--test-size SIZE | Test size | Sets the number of verse alignments to test. If the test size is smaller than the total number of verses, the verses to test are selected randomly. |
--books [book [book ...]] | Books to score | Specifies one or more books to be scored. When this option is used, the test tool generates predictions for the entire target-language test set but provides a score only for the specified book(s). Books must be specified using the three-character book codes from the USFM 3.0 standard (e.g., "GEN" for Genesis). |


Visualizes the similarity of languages or projects.

`usage: python -m silnlp.alignment.visualize_similarity [-h] --corpus PATH --metadata PATH --scores PATH [--image PATH] [--country COUNTRY] [--family FAMILY] [--aligner ALIGNER] [--recompute] [--graph-type TYPE] [--data-type TYPE] [--threshold THRESHOLD]`

Arguments:

Argument | Purpose | Description |
---|---|---|
--corpus PATH | The corpus folder | The corpus folder. |
--metadata PATH | The metadata file | The metadata file. |
--scores PATH | The similarity scores file | The similarity scores file. |
--image PATH | The image file | The image file. |
--country COUNTRY | The country to include | The country to include. |
--family FAMILY | The language family to include | The language family to include. |
--aligner ALIGNER | The alignment model | The alignment model. |
--recompute | Recompute similarity scores | Recompute similarity scores. |
--graph-type TYPE | Type of graph | Type of graph. Can be "tree" or "network". |
--data-type TYPE | Type of data | Type of data. Can be "language" or "project". |
--threshold THRESHOLD | Similarity threshold | Similarity threshold. |

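
As a hedged illustration of how `--threshold` might interact with `--graph-type network`, the sketch below keeps only the language or project pairs whose similarity score meets the threshold and turns them into an adjacency list, the kind of structure a network graph is drawn from. The input format (a dict of pairwise scores keyed by example ISO 639-3 codes), the symmetric treatment of scores, and the use of `>=` rather than `>` are all assumptions; the real format of the scores file is not described on this page.

```python
from collections import defaultdict


def similarity_network(scores, threshold):
    """Return an adjacency list linking every pair whose similarity is >= threshold."""
    graph = defaultdict(set)
    for (a, b), similarity in scores.items():
        if a != b and similarity >= threshold:
            # Similarity is treated as symmetric, so add the edge in both directions.
            graph[a].add(b)
            graph[b].add(a)
    return {node: sorted(neighbors) for node, neighbors in graph.items()}


if __name__ == "__main__":
    # Illustrative scores only; real values would come from the --scores file.
    scores = {("spa", "por"): 0.8, ("spa", "eng"): 0.3, ("por", "glg"): 0.9}
    print(similarity_network(scores, threshold=0.5))
```

Nodes with no pair above the threshold (here `eng`) drop out of the network entirely, which is one reason a lower threshold produces a denser, more connected graph.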