Details of arguments

Sina Majidian edited this page Oct 19, 2023 · 2 revisions

To see the details of the arguments of the read2tree package, run read2tree -h.

usage: read2tree [-h] [--version] [--output_path OUTPUT_PATH]
                 --standalone_path STANDALONE_PATH [--reads READS [READS ...]]
                 [--read_type READ_TYPE] [--threads THREADS] [--split_reads]
                 [--split_len SPLIT_LEN] [--split_overlap SPLIT_OVERLAP]
                 [--split_min_read_len SPLIT_MIN_READ_LEN] [--sample_reads]
                 [--genome_len GENOME_LEN] [--coverage COVERAGE]
                 [--min_cons_coverage MIN_CONS_COVERAGE]
                 [--dna_reference DNA_REFERENCE] [--sc_threshold SC_THRESHOLD]
                 [--ngmlr_parameters NGMLR_PARAMETERS] [--check_mate_pairing]
                 [--debug] [--sequence_selection_mode SEQUENCE_SELECTION_MODE]
                 [-s SPECIES_NAME] [--tree] [--merge_all_mappings] [-r]
                 [--min_species MIN_SPECIES] [--single_mapping SINGLE_MAPPING]
                 [--ref_folder REF_FOLDER]
                 [--remove_species_mapping REMOVE_SPECIES_MAPPING]
                 [--remove_species_ogs REMOVE_SPECIES_OGS] [--keep_all_ogs]
                 [--ignore_species IGNORE_SPECIES]

read2tree is a pipeline that uses read data in combination with the output of
an OMA standalone run to produce high-quality trees.

optional arguments:
  -h, --help            show this help message and exit
  --version             Show programme's version number and exit.
  --output_path OUTPUT_PATH
                        [Default is current directory] Path to output
                        directory.
  --standalone_path STANDALONE_PATH
                        [Default is current directory] Path to the folder where marker genes
                        (i.e. reference orthologous groups) in fasta format are located.
  --reads READS [READS ...]
                        [Default is none] Reads to be mapped to reference. For
                        paired-end reads, give both files separated by a space.
  --read_type READ_TYPE
                        [Default is "short"] Type of reads to use for mapping,
                        either "short" or "long". ngm is used for short reads
                        and ngmlr for long reads.
  --threads THREADS     [Default is 1] Number of threads for the mapping using
                        ngm / ngmlr!
  --split_reads         [Default is off] Splits reads as defined by split_len
                        (200) and split_overlap (0) parameters.
  --split_len SPLIT_LEN
                        [Default is 200] Parameter for selection of read split
                        length; can only be used in combination with the long
                        read option.
  --split_overlap SPLIT_OVERLAP
                        [Default is 0] Reads are split with an overlap defined
                        by this argument.
  --split_min_read_len SPLIT_MIN_READ_LEN
                        [Default is 200] Reads longer than this value are cut
                        into smaller pieces as defined by --split_len.
  --sample_reads        [Default is off] Subsamples reads to the coverage set
                        by --coverage, given the genome size in --genome_len.
  --genome_len GENOME_LEN
                        [Default is 2000000] Genome size in bp.
  --coverage COVERAGE   [Default is 10] Coverage in X. Only considered if
                        --sample_reads is selected.
  --min_cons_coverage MIN_CONS_COVERAGE
                        [Default is 1] Minimum number of nucleotides at a
                        column.
  --dna_reference DNA_REFERENCE
                        [Default is None] Reference file that contains
                        nucleotide sequences (fasta, hdf5) with `.fa`
                        extension. If not given, it will use the REST API and
                        retrieve sequences from http://omabrowser.org
                        directly. NOTE: internet connection required!
  --sc_threshold SC_THRESHOLD
                        [Default is 0.25; Range 0-1] Parameter for selection
                        of sequences from mapping by completeness compared to
                        its reference sequence (number of ACGT basepairs vs
                        length of sequence). By default, all sequences are
                        selected.
  --ngmlr_parameters NGMLR_PARAMETERS
                        [Default is none] If these parameters need to be
                        changed, all 3 values have to be given
                        [x,subread-length,R]. The standard is: ont,256,0.25.
                        Possible values for these parameters can be found in
                        the original documentation of ngmlr.
  --check_mate_pairing  Check whether paired-end reads have consistent mate
                        pairing. Setting this option automatically selects
                        the overlapping reads and does not consider single
                        reads.
  --debug               [Default is false] Changes to debug mode: * bam files
                        are saved! * reads are saved by mapping to OG
  --sequence_selection_mode SEQUENCE_SELECTION_MODE
                        [Default is sc] Possibilities are cov and cov_sc for
                        mapped sequence.
  -s SPECIES_NAME, --species_name SPECIES_NAME
                        [Default is name of the first read file] Name of
                        species for mapped sequence.
  --tree                [Default is false] Compute tree, otherwise just output
                        concatenated alignment!
  --merge_all_mappings  [Default is off] In case multiple species were mapped
                        to the same reference, this allows merging these
                        mappings and building a tree with all included
                        species!
  -r, --reference       [Default is off] Just generate the reference dataset
                        for mapping.
  --min_species MIN_SPECIES
                        Min number of species in selected orthologous groups.
                        If not selected it will be estimated such that around
                        1000 OGs are available.
  --single_mapping SINGLE_MAPPING
                        [Default is none] Single species file allowing to map
                        in a job array.
  --ref_folder REF_FOLDER
                        [Default is none] Folder containing reference files
                        with sequences sorted by species.
  --remove_species_mapping REMOVE_SPECIES_MAPPING
                        [Default is none] Remove species present in data set
                        after mapping step completed and only do analysis on
                        subset. Input is comma separated list without spaces,
                        e.g. XXX,YYY,AAA.
  --remove_species_ogs REMOVE_SPECIES_OGS
                        [Default is none] Remove species present in data set
                        after mapping step completed to build OGs. Input is
                        comma separated list without spaces, e.g. XXX,YYY,AAA.
  --keep_all_ogs        [Default is on] Keep all orthologs after addition of
                        mapped seq, which means also the OGs that have no
                        mapped sequence. Otherwise only OGs are used that have
                        the mapped sequence for alignment and tree inference.
  --ignore_species IGNORE_SPECIES
                        [Default is none] Ignores species part of the OMA
                        standalone pipeline. Input is comma separated list
                        without spaces, e.g. XXX,YYY,AAA.

read2tree (C) 2017-2022 David Dylus
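
As a rough illustration of how the arguments above fit together, the following Python sketch assembles a read2tree command line using only flags that appear in the help text. The helper name `build_read2tree_command` and the paths `marker_genes/`, `output/`, `sample_1.fastq` and `sample_2.fastq` are hypothetical and chosen only for this example; the sketch builds and prints the command, it does not run read2tree.

```python
import shlex


def build_read2tree_command(standalone_path, output_path, reads,
                            read_type="short", threads=1, tree=False):
    """Assemble a read2tree argument list from flags listed in the help text."""
    if read_type not in ("short", "long"):
        raise ValueError('read_type must be "short" or "long"')
    if not reads:
        raise ValueError("at least one read file is required")
    cmd = [
        "read2tree",
        "--standalone_path", standalone_path,
        "--output_path", output_path,
        "--reads", *reads,  # paired-end files are passed as separate items
        "--read_type", read_type,
        "--threads", str(threads),
    ]
    if tree:
        cmd.append("--tree")  # without it, only the concatenated alignment is output
    return cmd


# Hypothetical paths and file names, used only for illustration.
cmd = build_read2tree_command(
    "marker_genes/", "output/",
    ["sample_1.fastq", "sample_2.fastq"],
    threads=4, tree=True,
)
print(shlex.join(cmd))
```

If read2tree is installed and on the PATH, the resulting list could be handed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with file names.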
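
The --sc_threshold help text describes completeness as the number of ACGT basepairs relative to the length of the sequence. The sketch below shows one plausible reading of that definition. The function names are hypothetical, and details such as the `>=` comparison and counting every character towards the length are assumptions rather than read2tree's actual implementation.

```python
def completeness(seq):
    """Fraction of positions that are A, C, G or T (case-insensitive)."""
    if not seq:
        return 0.0
    called = sum(1 for base in seq.upper() if base in "ACGT")
    return called / len(seq)


def passes_sc_threshold(seq, sc_threshold=0.25):
    """Assumed rule: keep a mapped sequence if its completeness reaches the threshold."""
    if not 0.0 <= sc_threshold <= 1.0:
        raise ValueError("sc_threshold must be in the range 0-1")
    return completeness(seq) >= sc_threshold


# Made-up sequences, used only for illustration.
half_called = "ACGTNNNN"
uncalled = "NNNNNNNN"
print(completeness(half_called), passes_sc_threshold(half_called))
print(completeness(uncalled), passes_sc_threshold(uncalled))
```

Under this reading, raising --sc_threshold towards 1 keeps only mapped sequences that are nearly fully reconstructed, while lower values admit sequences with many uncalled positions.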