Analysis of profile data MSPathFinder #31

WinkelsK · 2022-02-28T09:57:58Z

Hi all,
I have acquired profile data (MS1 and MS2) on a Thermo instrument.
I have now tested the following two MSPathFinder pipelines:

1. Use the raw file as input for PBF generation via PbfGen and ProMex deconvolution.
2. Convert the raw file to mzML with msconvert peak picking, then use this mzML file as input for PbfGen and ProMex.

Both workflows run successfully and give a similar number of identifications, but the overlap of the identified proteoforms is only 45% (comparing sequences).
I am unsure which results I can trust.
Looking forward to your feedback!
Cheers,
Konrad
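
An overlap figure such as the 45% above depends on how "overlap" is defined. The following is a minimal sketch, assuming the identifications from each pipeline are available as plain lists of sequence strings; the function name `overlap_report` and the toy sequences are made up for illustration and are not part of MSPathFinder.

```python
def overlap_report(seqs_a, seqs_b):
    """Summarise how two collections of identified sequences overlap."""
    a, b = set(seqs_a), set(seqs_b)
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Jaccard index: shared sequences divided by all distinct sequences.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of the smaller result set that the other set also contains.
        "shared_of_smaller": len(shared) / min(len(a), len(b)) if a and b else 0.0,
    }


# Toy data (hypothetical sequences, not from the thread).
raw_route = ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"]
mzml_route = ["PEPTIDEC", "PEPTIDED", "PEPTIDEE", "PEPTIDEF"]

report = overlap_report(raw_route, mzml_route)
print(report)
```

The two ratios can differ considerably for the same data, so it helps to state which one is being quoted when comparing pipelines.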

dtabb73 · 2022-02-28T10:36:55Z

Hi, Konrad. I saw your interesting result from MSPathFinderT on PBFs generated from RAW and from mzML routes. I was surprised by the degree of difference you saw, though, since presumably the only difference is whether the software had access to peak profiles (RAW) or peak centroids (mzML). I don't think you specified whether or not you performed peaklisting in msConvert, though.

I should think that the scan numbers will be the same whether you start from RAW or from mzML, so it should be possible to ask what each search concluded for individual scans. I have written some tools for reading search results from multiple search engines (TopPIC, ProSight PD, pTop, and MSPT) here: https://github.com/dtabb73/ProForma-Exporters. I'm accustomed to differences between search engines, of course.

Thanks,
Dave
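
The scan-by-scan comparison suggested above could be sketched roughly as follows. This assumes each search produced a tab-separated results table with one column holding the scan number and one holding the sequence; the column names `Scan` and `Sequence`, the helper names, and the toy tables are assumptions for illustration, so adjust them to the actual result files. Where a scan has several rows, only the first is kept.

```python
import csv
import io


def load_scan_to_sequence(handle, scan_col="Scan", seq_col="Sequence"):
    """Map scan number -> sequence, keeping the first row seen per scan."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        result.setdefault(int(row[scan_col]), row[seq_col])
    return result


def compare_by_scan(ids_a, ids_b):
    """Classify scans by whether the two searches agree on the sequence."""
    shared = ids_a.keys() & ids_b.keys()
    return {
        "agree": sorted(s for s in shared if ids_a[s] == ids_b[s]),
        "disagree": sorted(s for s in shared if ids_a[s] != ids_b[s]),
        "only_a": sorted(ids_a.keys() - ids_b.keys()),
        "only_b": sorted(ids_b.keys() - ids_a.keys()),
    }


# Toy tables standing in for the two result files (hypothetical content).
raw_tsv = "Scan\tSequence\n101\tMKVLAA\n102\tGSHMTT\n105\tAAPLKK\n"
mzml_tsv = "Scan\tSequence\n101\tMKVLAA\n102\tGSHMTA\n107\tQQLLEE\n"

result = compare_by_scan(
    load_scan_to_sequence(io.StringIO(raw_tsv)),
    load_scan_to_sequence(io.StringIO(mzml_tsv)),
)
print(result)
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` handles. The `disagree` list is usually the most informative, since those scans were identified in both runs but assigned different sequences.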

WinkelsK · 2022-03-02T14:21:08Z

Short feedback on what I have found when working with profile-mode Thermo raw files.
Results are the same when:

1. Using raw files as input for PbfGen and ProMex
2. Using mzML files (generated via MSConvert with peak picking, centroid, Vendor; see picture)

Results are different (only 50% overlap of identified sequences) when I use an mzML file in profile mode (generated by MSConvert without any filter) as input for PbfGen and ProMex.
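
For reference, the two conversions described above differ only in whether a peak-picking filter is passed to msconvert. The sketch below builds the command line in Python and is an illustration only: the helper name `msconvert_command` and the paths are hypothetical, and it assumes the ProteoWizard `msconvert` executable is on the PATH with its `peakPicking vendor msLevel=1-` filter syntax. Check the result against `msconvert --help` for the installed version.

```python
import shlex


def msconvert_command(raw_path, out_dir, vendor_centroid=True):
    """Build an msconvert command line for mzML output.

    With vendor_centroid=True the vendor peak-picking filter is added for all
    MS levels; with False the profile data are written unchanged.
    """
    cmd = ["msconvert", raw_path, "--mzML", "-o", out_dir]
    if vendor_centroid:
        # peakPicking should be the first filter for vendor centroiding to apply.
        cmd += ["--filter", "peakPicking vendor msLevel=1-"]
    return cmd


# Hypothetical file names, for illustration only.
centroid_cmd = msconvert_command("example.raw", "mzml_centroid")
profile_cmd = msconvert_command("example.raw", "mzml_profile", vendor_centroid=False)

print(shlex.join(centroid_cmd))
print(shlex.join(profile_cmd))
# To actually run a conversion: subprocess.run(centroid_cmd, check=True)
```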

FarmGeek4Life · 2022-03-02T16:45:34Z

Differences are expected. ProMex and MSPathFinder need centroided data, and the built-in raw file reader uses the centroiding provided by the Thermo library/raw file, the same as the MSConvert vendor centroiding. When reading an mzML created with profile data, it uses either the CWT centroiding (if it can find the ProteoWizard DLLs) or a very simplistic local maxima algorithm, neither of which will match the vendor centroiding results. As for the only 50% overlap, every single peak mass will differ between those centroiding algorithms; you could look at the differences by opening the files with LCMSSpectator.
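
To illustrate why different centroiding algorithms shift every peak mass, here is a toy sketch comparing two generic approaches on the same profile peak: taking the m/z of the highest sample, and taking an intensity-weighted mean around it. Neither function is the Informed-Proteomics, ProteoWizard CWT, or Thermo implementation; the function names and the five-point peak are made up for illustration.

```python
def local_maxima_centroids(mz, intensity):
    """Report the m/z of each sample that is higher than its neighbours."""
    return [
        mz[i]
        for i in range(1, len(mz) - 1)
        if intensity[i - 1] < intensity[i] >= intensity[i + 1]
    ]


def weighted_centroids(mz, intensity):
    """Intensity-weighted mean m/z over the three samples around each maximum."""
    out = []
    for i in range(1, len(mz) - 1):
        if intensity[i - 1] < intensity[i] >= intensity[i + 1]:
            window_mz = mz[i - 1:i + 2]
            window_int = intensity[i - 1:i + 2]
            weighted_sum = sum(m * w for m, w in zip(window_mz, window_int))
            out.append(weighted_sum / sum(window_int))
    return out


# A single, slightly asymmetric profile peak (hypothetical values).
mz = [500.000, 500.005, 500.010, 500.015, 500.020]
intensity = [10.0, 60.0, 100.0, 90.0, 20.0]

simple = local_maxima_centroids(mz, intensity)
weighted = weighted_centroids(mz, intensity)
ppm_shift = (weighted[0] - simple[0]) / simple[0] * 1e6
print(simple, weighted, round(ppm_shift, 2))
```

On this toy peak the two methods disagree by roughly 1.2 ppm. Shifts of that size on every peak can plausibly change deconvolution and scoring outcomes when searches use tolerances of a few ppm.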


WinkelsK · 2022-03-03T07:59:10Z

Thanks Bryson!
I didn't put that all together initially, but am now happily using MSPathFinder! Thanks :) Konrad
