reduce distributed search config files
mschwoer committed Nov 13, 2024
1 parent 75e700b commit c800ee3
Showing 3 changed files with 14 additions and 147 deletions.
15 changes: 5 additions & 10 deletions misc/distributed_search/dist_search_setup.md
@@ -2,7 +2,7 @@ Distributed AlphaDIA search on HPCL
=================================================

This guide deals with setting up a distributed search in AlphaDIA, with the following prerequisites:
- A Linux (Ubuntu) HPCL system with Slurm Workload Manager installed. All resource management is handled by Slurm; AlphaDIA does not select, manage, or monitor worker nodes.
- The distributed search requires the absolute path of each raw file, saved in the second column of a two-column .csv document. Simpler schemes that, e.g., process all files in a given directory are disfavored because large cohorts frequently consist of raw files spread across a number of subfolders.
- An Anaconda environment called "alphadia" with _mono_ and _alphadia_ installed (for installing _mono_, see https://github.com/MannLabs/alpharaw#installation)

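The two-column .csv described above could be produced along the following lines. This is a minimal sketch and not the repository's **discover_project_files.py**: the function name `write_rawfile_csv`, the header names `project` and `path`, and the `.raw` suffix are assumptions made for illustration. Only the requirement that the absolute path sits in the second column comes from the guide.

```python
import csv
from pathlib import Path


def write_rawfile_csv(root, out_csv, suffix=".raw"):
    """Recursively collect raw files below `root` and write a two-column CSV.

    Column 1 is a free label (here the parent folder name, an assumption);
    column 2 is the absolute path, as the guide requires.
    """
    root = Path(root).resolve()
    # rglob descends into subfolders, so cohorts spread over many directories are found
    paths = sorted(root.rglob(f"*{suffix}"))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["project", "path"])  # hypothetical header names
        for p in paths:
            writer.writerow([p.parent.name, str(p)])
    return len(paths)
```

Because the search root is resolved before the walk, every path written to the second column is absolute regardless of the directory the script is started from.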
@@ -18,21 +18,22 @@ Steps to set up a search

1. Set up an empty search directory on your HPCL partition. One directory corresponds to one study, i.e. one set of raw files, fasta/library and search configuration.
2. Copy all files from alphadia/misc/distributed_search into the search directory
3. If no .csv file with rawfile paths exists yet, create one by running **discover_project_files.py** from the search directory.
4. Set the first and second search configurations in **first_config.yaml** and **second_config.yaml**. For example, the number of precursor candidates, the inference strategy, and the mass tolerances may differ between first and second search.
Leave all the predefined settings in the two .yaml files as they are.
5. Set the search parameters in **outer.sh**. While these can also be provided as command line arguments, it is convenient to set them in **outer.sh** itself. This file requires the following settings:
- input_directory: the search directory
- input_filename: the .csv file containing rawfile paths
- target_directory: the directory where intermediate and final outputs are written (mind that slow read/write speeds to this location may slow down your search)
- library_path (optional, will be reannotated if fasta_path is provided and predict_library is set to 1): absolute path to a .hdf spectral library
- fasta_path (optional if library_path is provided and predict_library is set to 0): absolute path to .fasta file
- first_search_config_filename: name of the .yaml file for the first search
- second_search_config_filename: name of the .yaml file for building the MBR library, second search and LFQ
6. Run **outer.sh** with the following search settings:
- --nnodes (int): specifies how many nodes can be occupied. Rawfile search will be distributed across these nodes. If there are 5 nodes and 50 raw files, the search will take place on 5 nodes in chunks of 10 rawfiles each.
- --ntasks_per_node (int): defaults to 1; some HPCL systems allow multiple tasks to run on one node
- --cpus (int): defaults to 12; specifies how many CPUs are used per task
- --mem (str): defaults to '250G'; specifies the RAM requirement for each task.
**HPCL systems may be set to restrict user resources to certain limits. Make sure the above parameters comply with your HPCL setup.**
- --predict_library (1/0): defaults to 1; whether to predict a spectral library from a given fasta
- --first_search (1/0): defaults to 1; whether to search all files with the initial spectral library
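The --nnodes example above (50 raw files on 5 nodes, searched in chunks of 10) can be illustrated with a small sketch. The helper name `split_into_chunks` is hypothetical, and rounding the chunk size up for uneven divisions is an assumption; the guide does not state how AlphaDIA handles a remainder.

```python
import math


def split_into_chunks(rawfiles, nnodes):
    """Split a list of raw-file paths into at most `nnodes` contiguous chunks."""
    if nnodes < 1:
        raise ValueError("nnodes must be at least 1")
    if not rawfiles:
        return []
    # Round up so that every file is assigned even when the division is uneven
    # (assumption: the guide only describes the evenly divisible case).
    chunk_size = math.ceil(len(rawfiles) / nnodes)
    return [rawfiles[i:i + chunk_size] for i in range(0, len(rawfiles), chunk_size)]


chunks = split_into_chunks([f"/data/file_{i}.raw" for i in range(50)], nnodes=5)
print(len(chunks), len(chunks[0]))  # prints "5 10"
```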
@@ -47,9 +48,3 @@ Running the search creates five subdirectories in the target folder:
- _mbr_library_: Contains one chunk, since the library is built from all first search results.
- _second_search_: Analogous to _first_search_, one subdirectory is created for each chunk of rawfiles that are searched with the mbr_library. Precursor and fragment datasets from these searches are saved into the _lfq_ folder.
- _lfq_: Analogous to _mbr_library_, contains one chunk which runs label free quantification (LFQ) on each output from the second search. After all search steps are completed, the final precursor and protein tables are saved here.

73 changes: 4 additions & 69 deletions misc/distributed_search/first_config.yaml
@@ -1,69 +1,4 @@
-name: PeptideCentric.v1
-general:
-  thread_count: 10
-  reuse_calibration: false
-  reuse_quant: false
-  use_gpu: false
-  astral_ms1: false
-  log_level: INFO
-library_prediction:
-  predict: false
-  enzyme: trypsin
-  fixed_modifications: Carbamidomethyl@C
-  variable_modifications: Oxidation@M;Acetyl@Protein N-term
-  max_var_mod_num: 1
-  missed_cleavages: 1
-  precursor_len:
-  - 7
-  - 35
-  precursor_charge:
-  - 2
-  - 4
-  precursor_mz:
-  - 400
-  - 1200
-  fragment_mz:
-  - 200
-  - 2000
-  fragment_types: b;y
-  max_fragment_charge: 2
-  nce: 25
-  instrument: Fusion
-search:
-  channel_filter: ''
-  exclude_shared_ions: true
-  compete_for_fragments: true
-  target_num_candidates: 2
-  target_ms1_tolerance: 5
-  target_ms2_tolerance: 10
-  target_mobility_tolerance: 0.04
-  target_rt_tolerance: 100
-  quant_window: 3
-  quant_all: false
-fdr:
-  fdr: 0.01
-  group_level: proteins
-  inference_strategy: heuristic
-  competetive_scoring: true
-  channel_wise_fdr: false
-  keep_decoys: false
-search_initial:
-  initial_num_candidates: 2
-  initial_ms1_tolerance: 10
-  initial_ms2_tolerance: 10
-  initial_mobility_tolerance: 0.08
-  initial_rt_tolerance: 100
-multiplexing:
-  multiplexed_quant: false
-  target_channels: 4,8
-  decoy_channel: 12
-  reference_channel: 0
-  competetive_scoring: true
-search_output:
-  min_k_fragments: 12
-  min_correlation: 0.9
-  num_samples_quadratic: 50
-  min_nonnan: 3
-  normalize_lfq: true
-  peptide_level_lfq: false
-  precursor_level_lfq: false
+# config template for "first search"
+# (Default settings are suitable for the first search.)
+
+# adapt the rest of the config to your use case ...
73 changes: 5 additions & 68 deletions misc/distributed_search/second_config.yaml
@@ -1,69 +1,6 @@
-name: PeptideCentric.v1
-general:
-  thread_count: 10
-  reuse_calibration: false
-  reuse_quant: false
-  use_gpu: false
-  astral_ms1: false
-  log_level: INFO
-library_prediction:
-  predict: false
-  enzyme: trypsin
-  fixed_modifications: Carbamidomethyl@C
-  variable_modifications: Oxidation@M;Acetyl@Protein N-term
-  max_var_mod_num: 1
-  missed_cleavages: 1
-  precursor_len:
-  - 7
-  - 35
-  precursor_charge:
-  - 2
-  - 4
-  precursor_mz:
-  - 400
-  - 1200
-  fragment_mz:
-  - 200
-  - 2000
-  fragment_types: b;y
-  max_fragment_charge: 2
-  nce: 25
-  instrument: Fusion
-search:
-  channel_filter: ''
-  exclude_shared_ions: true
-  compete_for_fragments: true
-  target_num_candidates: 2
-  target_ms1_tolerance: 5
-  target_ms2_tolerance: 10
-  target_mobility_tolerance: 0.04
-  target_rt_tolerance: 100
-  quant_window: 3
-  quant_all: false
+# config template for "second search"
 fdr:
-  fdr: 0.01
-  group_level: proteins
-  inference_strategy: library
-  competetive_scoring: true
-  channel_wise_fdr: false
-  keep_decoys: false
-search_initial:
-  initial_num_candidates: 5
-  initial_ms1_tolerance: 10
-  initial_ms2_tolerance: 10
-  initial_mobility_tolerance: 0.08
-  initial_rt_tolerance: 100
-multiplexing:
-  multiplexed_quant: false
-  target_channels: 4,8
-  decoy_channel: 12
-  reference_channel: 0
-  competetive_scoring: true
-search_output:
-  min_k_fragments: 12
-  min_correlation: 0.9
-  num_samples_quadratic: 50
-  min_nonnan: 3
-  normalize_lfq: true
-  peptide_level_lfq: false
-  precursor_level_lfq: false
+  inference_strategy: library # do not change for "second search"
+search:
+  target_num_candidates: 5 # do not change for "second search"
+# adapt the rest of the config to your use case ...
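Since the second-search template marks two settings as "do not change", a pre-flight check along these lines might catch accidental edits before jobs are submitted. This is a hedged sketch rather than part of AlphaDIA: the names `REQUIRED_SECOND_SEARCH` and `check_second_search_config` are hypothetical, and the config is assumed to have already been loaded into a nested dict by a YAML parser of your choice (not shown here).

```python
# Settings the second-search template marks as "do not change" (values from the diff above).
REQUIRED_SECOND_SEARCH = {
    ("fdr", "inference_strategy"): "library",
    ("search", "target_num_candidates"): 5,
}


def check_second_search_config(config):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for (section, key), expected in REQUIRED_SECOND_SEARCH.items():
        # `or {}` also covers a section that is present but empty (parsed as None)
        actual = (config.get(section) or {}).get(key)
        if actual != expected:
            problems.append(f"{section}.{key} should be {expected!r}, found {actual!r}")
    return problems


# Hypothetical example of a config in which the inference strategy was changed by mistake.
edited = {"fdr": {"inference_strategy": "heuristic"}, "search": {"target_num_candidates": 5}}
for problem in check_second_search_config(edited):
    print(problem)
```

Returning a list instead of raising on the first mismatch lets all deviations be reported in one pass.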
