Some questions about single peptidoform prediction #231

ypriverol · 2024-11-02T08:50:40Z

We have been trying to use MS2PIP in combination with USIs. The use case: for a given USI, get the predicted b and y ions, retention time, intensities, etc. It would also be good to have correlations.

I planned to use ms2pip.predict_single, but this function only predicts b and y ions for a given peptidoform. Would it be possible to extend that function to accept:

peptidoform
retention time
experimental peak information

In the current implementation, I guess I have to write the given peaks to a file in order to use correlate?


RalfG · 2024-11-03T21:20:00Z

Hi Yasset,

Good point! We hadn't considered this usage mode yet, as it is not possible to do this through the CLI. (It is pretty difficult to pass an observed spectrum through a CLI argument.)

I added a new correlate-single usage mode in #232 for the Python API only. This should be part of the next release. In the meantime, you can use the code in this Colab notebook. It takes a USI, an MS²PIP model, and a mass tolerance in Da, then gets the spectrum through the Pyteomics ProXI client, annotates it with MS²PIP, gets the MS²PIP predictions, calculates the Pearson correlation, and, as a bonus, plots both spectra with spectrum_utils.

I'm not sure how to add retention time to this comparison, as this would require knowledge of the complete MS run, with at least some other PSMs for RT calibration.

Let me know if this fits your needs, or if you have questions.

Best,
Ralf

ypriverol · 2024-11-04T07:25:09Z

Thanks, @RalfG, that will do the job for me. I'm making an API endpoint that does exactly that: given a USI that includes the peptidoform, it gives you back the predicted intensities and annotations.

@RalfG I'm quickly checking the notebook, and I'm just wondering why you need these two lines of code:

 observed_spectrum.tic_norm()
 observed_spectrum.log2_transform()

RalfG · 2024-11-04T20:19:39Z

Perfect!

The tic_norm method normalizes the intensities to the total ion current (TIC), similar to a base-peak normalization. The log2_transform method takes the base-2 logarithm of the intensities after adding 0.001 to avoid taking the logarithm of zero (this offset is sometimes called epsilon or fuzz factor).

The former is done simply to get a consistent input; the latter makes the distribution of intensities more linear and therefore easier to correlate accurately. Put differently, it de-emphasizes the higher peaks and emphasizes the lower peaks. TIC normalization and log2 transformation were found to give the best similarity measure with Pearson correlation in the following work: https://doi.org/10.1002/pmic.201000605. Nevertheless, I expect very similar results with base-peak normalization and square root transformation.
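
To make the two transformations concrete, here is a minimal NumPy sketch. It is not the MS²PIP or notebook implementation: the helper names (tic_normalize, log2_transform, pearson) and the intensity values are made up for illustration, and both arrays are assumed to hold linear-scale intensities for the same matched b/y ions in the same order.

```python
import numpy as np


def tic_normalize(intensities):
    """Divide each intensity by the summed intensity (total ion current)."""
    arr = np.asarray(intensities, dtype=float)
    total = arr.sum()
    # Leave an all-zero spectrum untouched instead of dividing by zero.
    return arr / total if total > 0 else arr


def log2_transform(intensities, epsilon=0.001):
    """Base-2 logarithm with a small offset so zero intensities stay finite."""
    return np.log2(np.asarray(intensities, dtype=float) + epsilon)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical intensities for six matched fragment ions (illustrative only).
observed = np.array([1200.0, 0.0, 350.0, 8000.0, 40.0, 610.0])
predicted = np.array([0.10, 0.01, 0.05, 0.70, 0.02, 0.12])

# TIC normalization followed by log2 transformation, as discussed above.
obs_log = log2_transform(tic_normalize(observed))
pred_log = log2_transform(tic_normalize(predicted))
r_log = pearson(obs_log, pred_log)

# Alternative mentioned above: base-peak normalization plus square root.
obs_sqrt = np.sqrt(observed / observed.max())
pred_sqrt = np.sqrt(predicted / predicted.max())
r_sqrt = pearson(obs_sqrt, pred_sqrt)

print(f"Pearson (TIC + log2):       {r_log:.3f}")
print(f"Pearson (base peak + sqrt): {r_sqrt:.3f}")
```

Note that the unmatched ion with an observed intensity of 0 ends up at log2(0.001) rather than at negative infinity, which is what the 0.001 offset is for.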

RalfG added question feature labels Nov 3, 2024
