Some questions about single peptidoform prediction #231

Open
ypriverol opened this issue Nov 2, 2024 · 3 comments

Comments

@ypriverol

@RalfG :

We have been trying to use MS2PIP in combination with USIs. The use case is: for a given USI, get the predicted b and y ions, retention time, intensities, etc. It would also be good to have correlations.

I planned to use ms2pip.predict_single, but this function only predicts b and y ions for a given peptidoform. Would it be possible to extend that function to cover:

  • peptidoform
  • retention time
  • experimental peak information

In the current implementation, I guess I have to take the given peaks and write them to a file in order to use correlate?

@RalfG
Member

RalfG commented Nov 3, 2024

Hi Yasset,

Good point! We hadn't considered this usage mode yet, as it is not possible to do this through the CLI (it is pretty difficult to pass an observed spectrum through a CLI argument).

I added a new correlate-single usage mode in #232 for the Python API only. This should be part of the next release. In the meantime, you can use the code in this Colab notebook. It takes a USI, an MS²PIP model, and a mass tolerance in Da, then gets the spectrum through the Pyteomics ProXI client, annotates it with MS²PIP, gets the MS²PIP predictions, calculates the Pearson correlation, and, as a bonus, plots both spectra with spectrum_utils.
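To make the annotate-and-correlate steps concrete, here is a minimal sketch in plain Python. It is not the notebook's code and not MS²PIP's implementation: the helper names `match_peaks` and `pearson`, the "most intense peak within the tolerance" matching rule, and all numbers are assumptions made up for illustration.

```python
import math


def match_peaks(theoretical_mz, observed_mz, observed_intensity, tolerance_da=0.02):
    """For each theoretical fragment m/z, return the intensity of the most
    intense observed peak within +/- tolerance_da, or 0.0 if none matches.
    (Hypothetical helper; the matching rule is an assumption.)"""
    matched = []
    for target in theoretical_mz:
        candidates = [
            intensity
            for mz, intensity in zip(observed_mz, observed_intensity)
            if abs(mz - target) <= tolerance_da
        ]
        matched.append(max(candidates, default=0.0))
    return matched


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Made-up example values, not taken from any real spectrum.
theoretical_mz = [175.119, 262.151, 375.235]
observed_mz = [175.118, 200.000, 262.160, 375.300]
observed_intensity = [1200.0, 50.0, 800.0, 300.0]
predicted_intensity = [0.5, 0.3, 0.0]

matched_intensity = match_peaks(theoretical_mz, observed_mz, observed_intensity)
correlation = pearson(matched_intensity, predicted_intensity)
```

In this toy example the third fragment has no observed peak within 0.02 Da, so it is matched to an intensity of 0.0 before the correlation is computed.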

I'm not sure how to add retention time to this comparison, as this would require knowledge of the complete MS run, with at least some other PSMs for RT calibration.
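A small sketch of why a single spectrum is not enough for this: calibration maps predicted retention times onto the time scale of one particular run, and fitting that mapping needs several reference PSMs from the same run. The code below assumes, purely for illustration, a simple linear relationship fitted by least squares; the function name `fit_linear_calibration` and the numbers are hypothetical, and real tools may calibrate differently.

```python
def fit_linear_calibration(predicted_rt, observed_rt):
    """Least-squares fit of observed_rt = slope * predicted_rt + intercept.
    (Hypothetical helper; a linear model is an assumption.)"""
    n = len(predicted_rt)
    if n != len(observed_rt) or n < 2:
        raise ValueError("need at least two reference PSMs")
    mean_p = sum(predicted_rt) / n
    mean_o = sum(observed_rt) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted_rt)
    if sxx == 0:
        raise ValueError("reference PSMs must span more than one predicted RT")
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted_rt, observed_rt))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return slope, intercept


# Made-up reference PSMs from the same (hypothetical) run.
reference_predicted = [10.0, 20.0, 30.0, 40.0]
reference_observed = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_linear_calibration(reference_predicted, reference_observed)

# Map the predicted RT of the peptidoform of interest onto this run's scale.
calibrated_rt = slope * 25.0 + intercept
```

With only the one PSM behind a USI, the fit above cannot be performed, which is the limitation described here.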

Let me know if this fits your needs, or if you have questions.

Best,
Ralf

@ypriverol
Author

Thanks, @RalfG, that will do the job for me. I'm making an API endpoint that does exactly that: given a USI that includes the peptidoform, it gives you back the predicted intensities and annotations.

@RalfG I'm quickly checking the notebook, and I'm just wondering why you need these two lines of code:

 observed_spectrum.tic_norm()
 observed_spectrum.log2_transform()

@RalfG
Member

RalfG commented Nov 4, 2024

Perfect!

The tic_norm method normalizes the intensities to the total ion current (TIC), similar to a base-peak normalization. The log2_transform method takes the base-2 logarithm of the intensities after adding 0.001 to avoid taking the logarithm of zero (this offset is sometimes called epsilon or a fuzz factor).

The former is done simply to get a consistent input, the latter to make the distribution of intensities more linear and therefore easier to correlate accurately. Put differently, it de-emphasizes the higher peaks and emphasizes the lower peaks. TIC normalization and log2 transformation were found to give the best similarity measure with Pearson correlation in the following work: https://doi.org/10.1002/pmic.201000605. Nevertheless, I expect very similar results with base-peak normalization and square root transformation.
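The two steps can be sketched in plain Python as follows. This mirrors the description in this comment rather than the library's source code: the standalone functions `tic_normalize` and `log2_transform` and the example intensities are assumptions for illustration, and only the 0.001 offset is taken from the explanation above.

```python
import math


def tic_normalize(intensities):
    """Divide each intensity by the total ion current (the sum of all intensities)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return [intensity / total for intensity in intensities]


def log2_transform(intensities, offset=0.001):
    """Base-2 logarithm after adding a small offset, so zero intensities stay finite."""
    return [math.log2(intensity + offset) for intensity in intensities]


# Made-up intensities spanning several orders of magnitude, including a zero.
raw = [1000.0, 100.0, 10.0, 0.0]

normalized = tic_normalize(raw)           # fractions of the TIC, summing to 1
transformed = log2_transform(normalized)  # compressed dynamic range

# Before the transform the top peak is 10x the second one; afterwards the
# two differ by only a few log2 units.
ratio_before = normalized[0] / normalized[1]
gap_after = transformed[0] - transformed[1]
```

Because the logarithm turns ratios into differences, the largest peaks no longer dominate the correlation, which is the de-emphasis described above.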
