Refactoring, arg list handling, add two parameters #11

Merged 18 commits on Mar 1, 2024
Commits
d16b725
(1) Fix handling of ks_peaks.tsv; (2) shorten 19_pan_aug_leftover_mer…
Feb 21, 2024
5ae31a7
MANY changes in the course of moving alignment & modeling steps to -c…
Feb 23, 2024
b8ababa
Various fixes in alignment and modeling steps, after moving this code…
StevenCannon-USDA Feb 23, 2024
403dd1d
Handle argument list too long errors around augment_cluster_sets.awk
StevenCannon-USDA Feb 26, 2024
2d4500c
Add Wm82.gnm6 annotation to Glycine pangene config & make file
StevenCannon-USDA Feb 26, 2024
38840b2
Remove 04_dag at start of filter, to avoid problems of leftover files…
StevenCannon-USDA Feb 26, 2024
947861f
Minor cosmetic change in stats/ks_histplots.tsv
StevenCannon-USDA Feb 27, 2024
1b2337b
Add two params, min_align_count min_annots_in_align, to the reporting…
StevenCannon-USDA Feb 27, 2024
d592aa0
Fix listing of pandagma_conf_params reported in summarize step, and o…
StevenCannon-USDA Feb 28, 2024
ecc31ff
Add two alignment parameters to config files; and minor changes to pa…
StevenCannon-USDA Feb 28, 2024
17df08d
Changes in step xfr_aligns_trees to handle output from fsup
StevenCannon-USDA Feb 29, 2024
438b78f
Remove optional steps align_cds, align_protein, model_and_trim, calc_…
StevenCannon-USDA Feb 29, 2024
4437eb4
Bug fix & documentation tweaks in pandagma-fsup.sh
StevenCannon-USDA Feb 29, 2024
215ff4c
Minor change to handling of directory/file transfer in pandagma-fsup.sh
StevenCannon-USDA Feb 29, 2024
8797872
Minor change in stats reporting
StevenCannon-USDA Feb 29, 2024
fdf5414
Restructure ks_filter step so that ks_block_wgd_cutoff and max_pair_k…
StevenCannon-USDA Feb 29, 2024
0d10f5d
Handle transfer of 19_pan_aug_leftover_merged_prot in -common rather …
StevenCannon-USDA Feb 29, 2024
cef3b01
Tweak output of counts-per-accession header in stats.txt file
StevenCannon-USDA Feb 29, 2024
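
Commit 403dd1d refers to "argument list too long" errors around augment_cluster_sets.awk. The actual fix is not visible in this excerpt. As a general illustration only, the sketch below shows one common way to avoid that error in bash: streaming file names through `find` and `xargs` instead of expanding a large glob onto a single command line. The helper `count_lines_in_dir`, the directory, and the file names are hypothetical and are not taken from pandagma.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from pandagma): total the lines across every file
# in a directory without placing all file names on one command line.
count_lines_in_dir() {
  local dir=$1
  # find emits NUL-delimited names; xargs -0 splits them into batches so that
  # each cat invocation stays under the system's argument-size limit.
  find "$dir" -maxdepth 1 -type f -print0 | xargs -0 cat | wc -l | tr -d ' '
}

# Demo on a throwaway directory holding many one-line files.
demo_dir=$(mktemp -d)
for i in $(seq 1 500); do
  printf 'gene_%s\n' "$i" > "$demo_dir/cluster_$i.txt"
done
count_lines_in_dir "$demo_dir"
rm -rf "$demo_dir"
```

The same pattern applies when the consumer is an awk script rather than `cat`: the file names arrive in batches, so no single invocation receives an oversized argument list.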
11 changes: 6 additions & 5 deletions README.md
@@ -185,7 +185,8 @@ Subcommands for the **pangene** workflow, `pandagma pan`, in order they are usua
Subcommands for the **gene family** workflow, `pandagma fam`, in order they are usually run:

```
- Run these first (if using ks_calc)
+ Run these first (if using the ks_peaks.tsv file; otherwise, run all main steps and
+ ks filtering will be done using parameters ks_block_wgd_cutoff and max_pair_ks)
all - All of the steps below, except for ks_filter and clean
(Or equivalently: omit the -s flag; \"all\" is default).
ingest - Prepare the assembly and annotation files for analysis.
@@ -195,7 +196,7 @@ Subcommands for the **gene family** workflow, `pandagma fam`, in order they are
ks_calc - Calculation of Ks values on gene pairs from DAGchainer output.

Evaluate the stats/ks_histplots.tsv and stats/ks_peaks_auto.tsv files and
- put ks_peaks.tsv into the work directory, then run the following commands:
+ put ks_peaks.tsv into the \${WORK_DIR}/stats directory, then run the following commands:
ks_filter - Filtering based on provided ks_peaks.tsv file (assumes prior ks_calc step)
mcl - Derive clusters, with Markov clustering.
consense - Calculate a consensus sequences from each pan-gene set,
@@ -369,11 +370,11 @@ ks_block_wgd_cutoff - Fallback, if a ks_peaks.tsv file is not provided. [1.75]
remaining steps (see **8** below).

An intermediate output file, `stats/ks_peaks_auto.tsv`, is written to the work directory
- This should be examined for biological plausibility (look at Ks peak values in column 3),
- along with the other Ks results (histograms) in the work_pandagma/stats subdirectory.
+ This should be examined for biological plausibility, along with the other
+ Ks results (histograms) in the work_pandagma/stats subdirectory.
The `ks_peaks_auto.tsv` file can be examined and used to create a file named `ks_peaks.tsv`
with changes relative to `ks_peaks_auto.tsv` if necessary to reflect known or suspected WGD histories.
- If stats/ks_histplots.tsv is not provided, then Ks filtering will be done using values provided
+ If stats/ks_peaks.tsv is not provided, then Ks filtering will be done using values provided
in the config file for ks_block_wgd_cutoff and max_pair_ks.

4. Run steps `ks_filter` through `summarize`.
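
The README change above describes two filtering modes: use `ks_peaks.tsv` from the work directory's `stats` subdirectory if it is there, otherwise fall back to the config values `ks_block_wgd_cutoff` and `max_pair_ks`. A minimal sketch of that decision follows. It is not pandagma's code: the function `choose_ks_mode` and its return strings are hypothetical, and testing for plain file existence is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the documented fallback; not pandagma's code.
choose_ks_mode() {
  local work_dir=$1
  # Assumption: the presence of stats/ks_peaks.tsv is what selects the mode.
  if [[ -f "${work_dir}/stats/ks_peaks.tsv" ]]; then
    echo "ks_peaks"          # filter using the curated peaks file
  else
    echo "config_cutoffs"    # filter using ks_block_wgd_cutoff and max_pair_ks
  fi
}

# Demo: the mode changes once the peaks file has been placed in stats/.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/stats"
choose_ks_mode "$work_dir"
touch "$work_dir/stats/ks_peaks.tsv"
choose_ks_mode "$work_dir"
rm -rf "$work_dir"
```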
2 changes: 1 addition & 1 deletion batch_fam_example_singularity.sh
@@ -39,7 +39,7 @@ singularity exec $IMAGE pandagma fam -c $CONFIG
#singularity exec $IMAGE pandagma fam -c $CONFIG -s consense
#singularity exec $IMAGE pandagma fam -c $CONFIG -s cluster_rest
#singularity exec $IMAGE pandagma fam -c $CONFIG -s add_extra
- #singularity exec $IMAGE pandagma fam -c $CONFIG -s align
+ #singularity exec $IMAGE pandagma fam -c $CONFIG -s align_protein
#singularity exec $IMAGE pandagma fam -c $CONFIG -s model_and_trim
#singularity exec $IMAGE pandagma fam -c $CONFIG -s calc_trees
#singularity exec $IMAGE pandagma fam -c $CONFIG -s summarize
75 changes: 75 additions & 0 deletions batch_fam_prod.sh
@@ -0,0 +1,75 @@
#!/bin/bash
#SBATCH -A m4440
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -n 30 # number of cores/tasks in this job
#SBATCH -t 23:00:00
#SBATCH -C cpu
#SBATCH -J pand-fam2
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err

set -o errexit
set -o nounset
set -o xtrace

date # print timestamp

# If using conda environment for dependencies:
module load conda
conda activate pandagma

PDGPATH=$PWD
CONFIG=$PWD/config/family3_22_3.conf

echo "Config: $CONFIG"

export PATH=$PATH:$PDGPATH/bin
echo "PATH: $PATH"

##########
# Test PATH
which pandagma
which calc_ks_from_dag.pl

##########
## Fetch relevant data files; e.g.
#mkdir -p data
#make -C data -f $PWD/get_data/family3_22_3.mk

##########
## Filter transposable elements
#pandagma TEfilter -c $CONFIG

##########
## Run all main steps, assuming input data files exist in ./data
## Work directory will be ./work_pandagma
## Output will go to ./out_pandagma
#pandagma fam -c $CONFIG -d data_TEfilter

##########
## Run individual steps
#pandagma fam -c $CONFIG -s ingest -d data_TEfilter
#pandagma fam -c $CONFIG -s mmseqs
#pandagma fam -c $CONFIG -s filter
#pandagma fam -c $CONFIG -s dagchainer
#pandagma fam -c $CONFIG -s ks_calc
#pandagma fam -c $CONFIG -s ks_filter
#pandagma fam -c $CONFIG -s mcl
#pandagma fam -c $CONFIG -s consense
#pandagma fam -c $CONFIG -s cluster_rest
#pandagma fam -c $CONFIG -s add_extra
#pandagma fam -c $CONFIG -s tabularize
#pandagma fam -c $CONFIG -s align_protein
#pandagma fam -c $CONFIG -s model_and_trim
#pandagma fam -c $CONFIG -s calc_trees
pandagma fam -c $CONFIG -s xfr_aligns_trees
pandagma fam -c $CONFIG -s summarize

##########
## Optional work-directory cleanup steps
#pandagma fam -c $CONFIG -s clean
#rm -rf ./work_pandagma

date # print timestamp
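
The batch script selects steps by commenting lines in and out. A possible alternative, sketched below, drives the same invocations from a list of step names. Only the command form `pandagma fam -c CONFIG -s STEP` and the step names come from the script; the `run_steps` function and the `DRY_RUN` variable are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the named steps in order, stopping at the first
# failure. With DRY_RUN=1 the commands are printed instead of executed.
run_steps() {
  local config=$1
  shift
  local step
  for step in "$@"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "pandagma fam -c $config -s $step"
    else
      pandagma fam -c "$config" -s "$step" || return 1
    fi
  done
}

# Preview the two steps that are active in the batch script.
DRY_RUN=1 run_steps config/family3_22_3.conf xfr_aligns_trees summarize
```

Under `set -o errexit`, as in the batch script, a failing step would already abort the job; the explicit `|| return 1` keeps the function's behavior the same when it is used without that option.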

3 changes: 2 additions & 1 deletion batch_pan_example_conda.sh
@@ -49,7 +49,8 @@ which calc_ks_from_dag.pl

##########
# Optional alignment and tree-construction steps
- #pandagma pan -c $CONFIG -s align
+ #pandagma pan -c $CONFIG -s align_cds
+ #pandagma pan -c $CONFIG -s align_protein
#pandagma pan -c $CONFIG -s model_and_trim
#pandagma pan -c $CONFIG -s calc_trees
pandagma pan -c $CONFIG -s xfr_aligns_trees
7 changes: 5 additions & 2 deletions bin/fetch-datastore.sh
@@ -8,11 +8,13 @@ readonly DATAFILE=${1}

# adjust URL for collections that are located in the annex
case ${DATAFILE} in
acacr.Acra3RX.gnm1.ann1.6C0V.*|\
arahy.Tifrunner.gnm1.ann2.TN8K.*|\
arath.Col0.gnm9.ann11.KH24.*|\
bauva.BV-YZ2020.gnm2.ann1.RJ1G.*|\
chafa.ISC494698.gnm1.ann1.G7XW.*|\
dalod.SKLTGB.gnm1.ann1.R67B.*|\
phach.longxuteng.gnm1.ann1.KGX9.*|\
prupe.Lovell.gnm2.ann1.S2ZZ.*|\
quisa.S10.gnm1.ann1.RQ4J.*|\
sento.Myeongyun.gnm1.ann1.5WXB.*|\
@@ -30,6 +32,7 @@ collection_type=annotations

case ${genspa} in
[A-Z]*) genus=${genspa} species=GENUS collection_type=pangenes collection=${1%.*.*.*} ;;
acacr) genus=Acacia species=crassicarpa ;;
aesev) genus=Aeschynomene species=evenia ;;
aradu) genus=Arachis species=duranensis ;;
arahy) genus=Arachis species=hypogaea ;;
@@ -52,13 +55,15 @@ case ${genspa} in
glyso) genus=Glycine species=soja ;;
glyst) genus=Glycine species=stenophita ;;
glysy) genus=Glycine species=syndetika ;;
labpu) genus=Lablab species=purpureus ;;
legume) genus=LEGUMES species=Fabaceae ;;
lencu) genus=Lens species=culinaris ;;
lotja) genus=Lotus species=japonicus ;;
lupal) genus=Lupinus species=albus ;;
medsa) genus=Medicago species=sativa ;;
medtr) genus=Medicago species=truncatula ;;
phaac) genus=Phaseolus species=acutifolius ;;
phach) genus=Phanera species=championii ;;
phalu) genus=Phaseolus species=lunatus ;;
phavu) genus=Phaseolus species=vulgaris ;;
pissa) genus=Pisum species=sativum ;;
@@ -88,6 +93,4 @@ if [[ "$collection" == *"XinJiangDaYe"* ]]; then
collection="XinJiangDaYe.gnm1.ann1.RKB9"
fi

- #echo "${DATASTORE}/${genus}/${species}/${collection_type}/${collection}/${DATAFILE}"
-
curl --no-progress-meter --fail "${DATASTORE}/${genus}/${species}/${collection_type}/${collection}/${DATAFILE}"
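
To see what URL the `curl` call ends up requesting for one of the newly added species, the path assembly can be sketched in isolation. This is a simplified illustration, not the script itself. The URL layout and the two `case` entries are taken from the diff; the `DATASTORE` value is a placeholder, the function `datastore_url` is hypothetical, the derivations of `genspa` and `collection` from the file name are assumptions (those lines are not shown in the diff), and the annex adjustment and pangene branch are omitted.

```shell
#!/usr/bin/env bash
# Placeholder base URL; the real DATASTORE value is not shown in this diff.
DATASTORE="https://example.org/datastore"

# Hypothetical helper that prints the URL instead of fetching it.
datastore_url() {
  local datafile=$1
  # Assumption: the species prefix is the first dot-separated field.
  local genspa=${datafile%%.*}
  # Assumption: an annotation collection is fields 2-5 of the file name.
  local collection
  collection=$(cut -d. -f2-5 <<< "$datafile")
  local collection_type=annotations
  local genus species
  case ${genspa} in
    acacr) genus=Acacia species=crassicarpa ;;
    phach) genus=Phanera species=championii ;;
    *) echo "unknown prefix: ${genspa}" >&2; return 1 ;;
  esac
  echo "${DATASTORE}/${genus}/${species}/${collection_type}/${collection}/${datafile}"
}

# The trailing "protein.faa.gz" is a made-up file suffix for the demo.
datastore_url phach.longxuteng.gnm1.ann1.KGX9.protein.faa.gz
```

Printing the URL this way serves the same debugging purpose as the commented-out `echo` line that this PR removes.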