Merge pull request #91 from daichengxin/main
Improved README.md
ypriverol authored Oct 30, 2023
2 parents ac8f922 + 765346e commit 5f3e3e5
Showing 1 changed file with 19 additions and 2 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -1,32 +1,38 @@
# pmultiqc

[![Python application](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/bigbio/pmultiqc/actions/workflows/python-app.yml)
[![Upload Python Package](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml/badge.svg)](https://github.com/bigbio/pmultiqc/actions/workflows/python-publish.yml)

A library for generating proteomics QC reports, built on the MultiQC framework. The library generates a QC report for the [quantms pipeline](https://github.com/nf-core/quantms). It reads the quantms pipeline files from the specified analysis directory, which is expected to have the following structure:

- analysis_dir : Final results of the pipeline
- experimental_design.tsv : experimental design file in two-table format
- out.mzTab : mzTab with results of the identification
- *.mzTab : mzTab with results of the identification
- *msstats\*csv : MSstats/MSstatsTMT input file
- *.mzML : mzML spectra files
- *ms_info.tsv : MS quality control information extracted from raw files.
- *.idXML : Identification results from search + percolator
- *.yml : summary software information and parameters of quantms pipeline (optional)
- diann_report.tsv : DIA-NN main report file. Only for DIA analysis.
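
Before running the report, it can help to check which of the files listed above are actually present. The following is a minimal sketch, not part of pmultiqc: the `find_inputs` helper and the dictionary labels are hypothetical, and the recursive search is an assumption that may differ from how pmultiqc itself locates files.

```python
from pathlib import Path

# Glob patterns copied from the list above; the labels are illustrative only.
EXPECTED_PATTERNS = {
    "experimental design": "experimental_design.tsv",
    "mzTab": "*.mzTab",
    "MSstats input": "*msstats*csv",
    "mzML spectra": "*.mzML",
    "MS info": "*ms_info.tsv",
    "idXML": "*.idXML",
    "pipeline summary (optional)": "*.yml",
    "DIA-NN report (DIA only)": "diann_report.tsv",
}


def find_inputs(analysis_dir):
    """Return {label: sorted list of matching paths}.

    Assumption: files may sit in subdirectories, so the search is recursive.
    """
    root = Path(analysis_dir)
    return {
        label: sorted(root.rglob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }


if __name__ == "__main__":
    for label, paths in find_inputs("resources/LFQ").items():
        print(f"{label}: {len(paths)} file(s)")
```

An empty list for a label simply means no file matched that pattern; whether that is a problem depends on the analysis type (for example, `diann_report.tsv` is only expected for DIA runs).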

## Usage

```multiqc {analysis_dir} -o {output dir}```

example: ```multiqc resources/LFQ -o ./```

### Parameters

- --raw: Keep filenames in experimental design output as raw when exp_design file is provided
- --condition: Create conditions from provided (e.g., factor) columns when exp_design file is provided
- --remove_decoy: Whether to remove the decoy peptides when counting
- --decoy_affix: Pre- or suffix of decoy proteins in their accession
- --contaminant_affix: Prefix or suffix that marks contaminant proteins in their accession
- --affix_type: Location of the decoy marker string in the fasta accession. Before (prefix) or after (suffix)
- --disable_plugin: Disable the pmultiqc plugin

- --quantification_method: The quantification method for LFQ experiment (default: `feature_intensity`)
- --disable_table: Disable protein/peptide table plots for large datasets
- --ignored_idxml: Ignore idXML files to speed up the run
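
To illustrate how the decoy options above interact, here is a minimal sketch of affix-based decoy filtering. It is not pmultiqc's implementation: the function names are hypothetical, and the `DECOY_` default is an assumption chosen only for the example.

```python
def has_affix(accession, affix, affix_type="prefix"):
    """Check whether an accession carries the marker string at the given position."""
    if affix_type == "prefix":
        return accession.startswith(affix)
    if affix_type == "suffix":
        return accession.endswith(affix)
    raise ValueError(f"affix_type must be 'prefix' or 'suffix', got {affix_type!r}")


def count_entries(accessions, decoy_affix="DECOY_", affix_type="prefix",
                  remove_decoy=True):
    """Count accessions, optionally skipping those marked as decoys.

    'DECOY_' is an assumed example value, not a documented default.
    """
    if not remove_decoy:
        return len(accessions)
    return sum(
        1 for acc in accessions
        if not has_affix(acc, decoy_affix, affix_type)
    )


if __name__ == "__main__":
    proteins = ["sp|P12345|ABC_HUMAN", "DECOY_sp|P12345|ABC_HUMAN"]
    print(count_entries(proteins))                      # decoys skipped
    print(count_entries(proteins, remove_decoy=False))  # decoys kept
```

The same check would apply to a contaminant marker by passing the contaminant affix instead of the decoy affix.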

An example report can be found in [multiqc_report.html](http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html)

@@ -49,6 +55,8 @@ Most of the metrics are compute based on the `out.mzTab` and the `*.idXML` which

- **Summary Table**: shows the number of spectra, % of identified spectra, total peptide count, and total identified proteins (including protein groups: if two proteins are identified by the same peptide, both proteins are counted) http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#proteomicslfq_summary_table

- **MS1 Information**: shows the quality control metrics on MS1 level, including total ion chromatograms (TIC), base peaks count (BPC), number of MS1 peaks, and general stats.

- **Pipeline Results Statistics**: shows the final results of the quantms pipeline, such as the total number of identified peptides and identified proteins (the data come from the mzTab and the experimental design file).

- **Number of peptides per Protein**: Includes a histogram of the number of peptides per protein http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#num_of_pep_per_prot
@@ -68,6 +76,15 @@ A table called [Spectra Tracking](http://bigbio.xyz/pmultiqc/shared-peptides-sta
- Final result of Spectra: Final number of PSMs reported in the mzTab
- Final result of Peptides: Final number of Peptides identified in the mzTab

### Summary of Search Engine Scores

This section shows the search scores and PEP counts for each search engine in each file. If two or more search engines are used, it also includes a summary of the consensus PSMs.

- SpecEvalue: Spectral E-value, the search score of MS-GF+. The value used for plotting is -lg(SpecEvalue).
- xcorr: Cross-correlation score, the search score of Comet. The value used for plotting is xcorr.
- hyperscore: Hyperscore, the search score of Sage. The value used for plotting is hyperscore.
- Consensus support: A measure of agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. The consensus support is the fraction (between 0 and 1) of the remaining search runs that "supported" a peptide identification that was kept.
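
The plotted values described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `plot_value` and `consensus_support` are hypothetical helpers, `lg` is read as the base-10 logarithm, and the consensus support formula is one reading of "fraction of the remaining search runs".

```python
import math


def plot_value(score_name, raw_score):
    """Map a raw search-engine score to the value used for plotting."""
    if score_name == "SpecEvalue":
        # Assumption: lg denotes log base 10, so smaller E-values plot higher.
        return -math.log10(raw_score)
    if score_name in ("xcorr", "hyperscore"):
        # These scores are plotted unchanged.
        return raw_score
    raise ValueError(f"unknown score: {score_name!r}")


def consensus_support(n_supporting_runs, n_runs_total):
    """Fraction of the *other* search runs that support an identification.

    Assumption: one run produced the identification, so the remaining
    runs number n_runs_total - 1.
    """
    remaining = n_runs_total - 1
    if remaining <= 0:
        raise ValueError("consensus support needs at least two search runs")
    return n_supporting_runs / remaining


if __name__ == "__main__":
    print(plot_value("SpecEvalue", 1e-10))
    print(consensus_support(1, 3))
```

Under this reading, a peptide found by all three of three engines has a support of 1, and a peptide found by only one engine has a support of 0.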

### Precursor Charges Distribution

The [Precursor Charges Distribution](http://bigbio.xyz/pmultiqc/shared-peptides-star-align-stricter-pep-protein-FDR/multiqc_report.html#Distribution_of_precursor_charges) shows the distribution of precursor ion charges across the whole experiment, as well as separately for identified and unidentified spectra. This information can be used to spot potential ionization problems, such as many 1+ charges from an ESI ionization source or an otherwise unexpected distribution of charges. MALDI experiments are expected to contain almost exclusively 1+ charged ions. An unexpected charge distribution may also be caused by specific search engine parameter settings, such as limiting the search to specific ion charges.
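
As a rough sketch of how such a distribution could be tallied, the example below counts charges for all, identified, and unidentified spectra. The input format (pairs of charge and an identified flag) and both function names are assumptions made for illustration; pmultiqc derives this information from the pipeline files instead.

```python
from collections import Counter


def charge_distribution(spectra):
    """Tally precursor charges overall and split by identification status.

    spectra: iterable of (charge, identified) pairs (hypothetical input format).
    """
    dist = {"all": Counter(), "identified": Counter(), "unidentified": Counter()}
    for charge, identified in spectra:
        dist["all"][charge] += 1
        dist["identified" if identified else "unidentified"][charge] += 1
    return dist


def fraction_singly_charged(counter):
    """Share of 1+ precursors; a high value on an ESI source may merit a closer look."""
    total = sum(counter.values())
    return counter[1] / total if total else 0.0


if __name__ == "__main__":
    example = [(2, True), (2, True), (3, True), (1, False), (2, False)]
    dist = charge_distribution(example)
    print(dict(dist["all"]))
    print(fraction_singly_charged(dist["all"]))
```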