Errors in MultiQC ("quantms" module) #186

Open
hendrikweisser opened this issue Mar 28, 2022 · 2 comments

Comments

@hendrikweisser

I've encountered two errors in the "quantms" module for MultiQC during the "pmultiqc" step:

  1. If the experimental design file ("--input" parameter) uses the one-table OpenMS format (see https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/classOpenMS_1_1ExperimentalDesign.html#details), I get the error below. The reason seems to be that the "Sample" column in the experimental design table is expected by "quantms", but is not used in the one-table format. (If I use the two-tables format, the error goes away.)
  Parsing out csv file...
  ╭──────────────── Oops! The 'quantms' MultiQC module broke... ─────────────────╮
  │ Please copy this log and report it at                                        │
  │ https://github.com/ewels/MultiQC/issues                                      │
  │ Please attach a file that triggers the error. The last file found was:       │
  │ ./proteomicslfq/out.mzTab                                                    │
  │                                                                              │
  │ Traceback (most recent call last):                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     return self._engine.get_loc(casted_key)                                  │
  │   File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine │
  │   File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine │
  │   File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs │
  │   File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs │
  │ KeyError: 'Sample'                                                           │
  │                                                                              │
  │ The above exception was the direct cause of the following exception:         │
  │                                                                              │
  │ Traceback (most recent call last):                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     output = mod()                                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     self.parse_out_csv()                                                     │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     Sample = list(exp_data[exp_data['Spectra_Filepath'] == i]['Sample'])[0]  │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     indexer = self.columns.get_loc(key)                                      │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     raise KeyError(key) from err                                             │
  │ KeyError: 'Sample'                                                           │
  │                                                                              │
  ╰──────────────────────────────────────────────────────────────────────────────╯
  2. If both Comet and MS-GF+ are used as search engines ("--search_engines comet,msgf"), with results combined using ConsensusID, I get the error below. The reason seems to be that "quantms" checks for the presence of "msgf" or "comet" in the names of input idXML files, but in my case the files are named "..._consensus_fdr_filter.idXML". As a consequence the mzML_name variable is not initialised in the Python code (see https://github.com/bigbio/pmultiqc/blob/main/pmultiqc/modules/quantms/quantms.py#L1154-L1175).
  Parsing 20220223d_JR_METTL1_SILAC_SST_01_consensus_fdr_filter.idXML...
  ╭──────────────── Oops! The 'quantms' MultiQC module broke... ─────────────────╮
  │ Please copy this log and report it at                                        │
  │ https://github.com/ewels/MultiQC/issues                                      │
  │ Please attach a file that triggers the error. The last file found was:       │
  │ ./proteomicslfq/out.mzTab                                                    │
  │                                                                              │
  │ Traceback (most recent call last):                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     output = mod()                                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     self.parse_mzml_idx()                                                    │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     mzml_table[mzML_name]['Final result of spectra'] = self.mL_spec_ident_fi │
  │ UnboundLocalError: local variable 'mzML_name' referenced before assignment   │
  │                                                                              │
  ╰──────────────────────────────────────────────────────────────────────────────╯
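
Both failures above come down to a lookup that assumes a particular input shape. The following is a minimal sketch of more defensive lookups, not the actual pmultiqc code. `sample_for_file`, `mzml_name_from_idxml` and the `KNOWN_MARKERS` list are hypothetical names, and the markers are guesses based on the file name in the log above. The design rows are modelled as plain dictionaries (as `csv.DictReader` would yield) rather than as a pandas DataFrame.

```python
from pathlib import PurePath

# Hypothetical suffix markers; the real pipeline naming may differ.
KNOWN_MARKERS = ("_msgf", "_comet", "_consensus")


def sample_for_file(rows, spectra_path):
    """Return the 'Sample' value for a spectra file, or None if unavailable.

    `rows` is a list of dicts, one per line of the experimental design.
    A missing 'Sample' column yields None instead of a KeyError.
    """
    for row in rows:
        if row.get("Spectra_Filepath") == spectra_path:
            return row.get("Sample")
    return None


def mzml_name_from_idxml(filename):
    """Derive a run name from an idXML file name.

    Cuts the stem at the first known marker; if none matches, the full
    stem is returned, so the result is always defined.
    """
    stem = PurePath(filename).stem
    for marker in KNOWN_MARKERS:
        idx = stem.find(marker)
        if idx > 0:
            return stem[:idx]
    return stem


if __name__ == "__main__":
    rows = [{"Spectra_Filepath": "a.mzML", "Fraction": "1"}]  # no 'Sample' column
    print(sample_for_file(rows, "a.mzML"))
    print(mzml_name_from_idxml(
        "20220223d_JR_METTL1_SILAC_SST_01_consensus_fdr_filter.idXML"))
```

Whether a missing sample should fall back to some other value or be reported as an error is a design decision for the maintainers. The point of the sketch is only that neither lookup should end in an unhandled exception.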
@hendrikweisser
Author

One more problem related to the two-tables experimental design file:
If I edit the .tsv file in LibreOffice Calc and save it, every line with fewer columns than the first table is padded with tabs up to that table's width. (Not sure if Excel would do the same.) That includes the empty line between the tables, which afterwards contains four tab characters. This breaks "quantms", which looks for a line that is exactly empty, i.e. consists only of '\n' (https://github.com/bigbio/pmultiqc/blob/main/pmultiqc/modules/quantms/quantms.py#L202):

  ╭──────────────── Oops! The 'quantms' MultiQC module broke... ─────────────────╮
  │ Please copy this log and report it at                                        │
  │ https://github.com/ewels/MultiQC/issues                                      │
  │ Please attach a file that triggers the error. The last file found was:       │
  │ ./proteomicslfq/out.mzTab                                                    │
  │                                                                              │
  │ Traceback (most recent call last):                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     output = mod()                                                           │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     self.draw_exp_design()                                                   │
  │   File "/opt/conda/envs/nf-core-proteomicslfq-1.1.0dev/lib/python3.9/site-pa │
  │     empty_row = data.index('\n')                                             │
  │ ValueError: '\n' is not in list                                              │
  │                                                                              │
  ╰──────────────────────────────────────────────────────────────────────────────╯

I know technically the input file isn't in the correct format, but presumably the code could be made more robust by stripping whitespace when reading the file.
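
A minimal sketch of what whitespace-tolerant splitting could look like, assuming the file has already been read into a list of lines. `split_design_tables` is a hypothetical helper, not a function in pmultiqc. It treats any whitespace-only line as the separator and also skips blank lines at the start of the file.

```python
def split_design_tables(lines):
    """Split experimental design lines into (file_table, sample_table).

    A line counts as blank if it contains only whitespace, so a separator
    made of tab characters is accepted. With no separator, everything is
    returned as the first table and the second is empty.
    """
    # Remove line endings only; tabs inside data rows are left untouched.
    rows = [line.rstrip("\r\n") for line in lines]

    # Ignore blank lines before the first table.
    while rows and not rows[0].strip():
        rows.pop(0)

    for i, row in enumerate(rows):
        if not row.strip():
            file_table = rows[:i]
            # Drop any further blank lines in or after the second table.
            sample_table = [r for r in rows[i + 1:] if r.strip()]
            return file_table, sample_table
    return rows, []


if __name__ == "__main__":
    demo = ["A\tB\tC\n", "1\t2\t3\n", "\t\t\t\t\n", "X\tY\n", "1\tz\n"]
    file_table, sample_table = split_design_tables(demo)
    print(file_table)
    print(sample_table)
```

Padding tabs at the end of data rows are deliberately left alone here, since stripping them could also remove legitimately empty trailing cells.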

@fabianegli

I see some other problems in that experimental design parsing, mostly related to missing sanity checks. What if someone mistakenly has an empty line at the beginning of the file? That will result in the whole experimental design not being read in. See https://github.com/bigbio/pmultiqc/blob/13208f07545a0bd67b329ad1f9ff3e3f728e7996/pmultiqc/modules/quantms/quantms.py#L201-L203
