Analysis of profile data MSPathFinder #31
Hi, Konrad.
I saw your interesting result from MSPathFinderT on PBFs generated via the RAW and mzML routes. I was surprised by the degree of difference you saw, though, since presumably the only difference is whether the software had access to peak profiles (RAW) or peak centroids (mzML). I don't think you specified, however, whether you performed peak picking in MSConvert.
I should think that the scan numbers will be the same whether you start from RAW or from mzML, so it should be possible to ask what each search concluded for individual scans.
I have written some tools for reading search results from multiple search engines (TopPIC, ProSight PD, pTop, and MSPT) here: https://github.com/dtabb73/ProForma-Exporters. I'm accustomed to differences between search engines, of course.
Thanks,
Dave
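The per-scan comparison suggested above can be sketched in a few lines of Python. This is a minimal sketch, assuming each search's results have been exported as a tab-separated file with `Scan` and `Sequence` columns; MSPathFinder's TSV output has columns along these lines, but the exact header names used here are an assumption, so adjust them to match your files. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable


def load_scan_hits(lines: Iterable[str], scan_col: str = "Scan",
                   seq_col: str = "Sequence") -> Dict[int, str]:
    """Map scan number -> identified sequence from tab-separated result lines.

    Column names are assumptions; the first row seen for a scan is kept.
    """
    hits: Dict[int, str] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        hits.setdefault(int(row[scan_col]), row[seq_col])
    return hits


def compare_by_scan(a: Dict[int, str], b: Dict[int, str]) -> Dict[str, int]:
    """Count scans identified by one search only, or by both (same vs. different sequence)."""
    shared = a.keys() & b.keys()
    same = sum(1 for scan in shared if a[scan] == b[scan])
    return {
        "only_a": len(a.keys() - b.keys()),
        "only_b": len(b.keys() - a.keys()),
        "both_same_sequence": same,
        "both_different_sequence": len(shared) - same,
    }
```

Usage: open the two result files (hypothetical names, e.g. `from_raw.tsv` and `from_mzml.tsv`) with `newline=""`, pass each handle to `load_scan_hits`, and call `compare_by_scan` on the two dictionaries. Scans counted under `both_different_sequence` are the most informative ones to inspect by hand.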
From: WinkelsK ***@***.***>
Sent: Monday, February 28, 2022 10:58 AM
To: PNNL-Comp-Mass-Spec/Informed-Proteomics ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [PNNL-Comp-Mass-Spec/Informed-Proteomics] Analysis of profile data MSPathFinder (Issue #31)
Hi all,
I have acquired profile data (MS1 and MS2) on a Thermo instrument.
I have now tested the following two MSPathFinder pipelines:
1. Use the RAW file as input for PBF generation via PbfGen and ProMex deconvolution.
2. Convert the RAW file to mzML with MSConvert peak picking, then use this mzML file as input for PbfGen and ProMex.
Both workflows run successfully and give a similar number of identifications. BUT the overlap of the identified proteoforms is only 45% (comparing sequences).
I am very unsure which results I can trust.
Looking forward to your feedback!
Cheers,
Konrad
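The thread does not say how the 45% overlap was computed, and the denominator matters: the same two result lists give different percentages depending on whether the shared sequences are divided by the union of both lists or by the size of one list. A small Python sketch (the function name is hypothetical) that reports several of these measures side by side:

```python
from typing import Dict, Iterable


def sequence_overlap(a: Iterable[str], b: Iterable[str]) -> Dict[str, float]:
    """Overlap of two sets of identified sequences under several common definitions."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        # Shared / union (Jaccard index): penalises sequences unique to either list.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Shared / size of each list: how much of that list is reproduced by the other.
        "fraction_of_a": len(shared) / len(set_a) if set_a else 0.0,
        "fraction_of_b": len(shared) / len(set_b) if set_b else 0.0,
    }
```

For two lists of four sequences sharing two, this gives 50% relative to either list but only about 33% as a Jaccard index, so it is worth stating which definition a reported overlap uses.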
Differences are expected. ProMex and MSPathFinder need centroided data, and the built-in RAW file reader uses the centroiding provided by the Thermo library/RAW file, which is the same as MSConvert's vendor centroiding. When reading an mzML file created from profile data, it uses either CWT centroiding (if it can find the ProteoWizard DLLs) or a very simplistic local-maxima algorithm, neither of which will match the vendor centroiding results.
As for the overlap of only 50%, those different centroiding algorithms will produce differences in every single peak mass; you can look at the differences by opening the files in LCMSSpectator.
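To illustrate why the peak masses differ, here is a toy Python comparison of two simple centroiding rules applied to the same profile peak: reporting the apex point of each local maximum, and taking an intensity-weighted mean m/z over the contiguous points at or above half the apex height. Neither rule is the Thermo vendor algorithm or ProteoWizard's CWT peak picker; they are stand-ins chosen only to show that two reasonable rules place the centroid at different m/z values. Function names and the example data are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (centroid m/z, apex intensity)


def _apex_indices(intensity: List[float]) -> List[int]:
    """Interior points higher than the left neighbour and not lower than the right one."""
    return [
        i for i in range(1, len(intensity) - 1)
        if intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
    ]


def apex_centroids(mz: List[float], intensity: List[float]) -> List[Peak]:
    """Simplistic local-maxima picking: report the m/z of each apex point."""
    return [(mz[i], intensity[i]) for i in _apex_indices(intensity)]


def weighted_centroids(mz: List[float], intensity: List[float]) -> List[Peak]:
    """Intensity-weighted mean m/z over the contiguous points >= half the apex intensity."""
    peaks: List[Peak] = []
    for i in _apex_indices(intensity):
        half = intensity[i] / 2
        lo = i
        while lo > 0 and intensity[lo - 1] >= half:
            lo -= 1
        hi = i
        while hi < len(intensity) - 1 and intensity[hi + 1] >= half:
            hi += 1
        window = range(lo, hi + 1)
        total = sum(intensity[j] for j in window)
        centroid = sum(mz[j] * intensity[j] for j in window) / total
        peaks.append((centroid, intensity[i]))
    return peaks
```

On an asymmetric peak sampled at m/z 500.00 to 500.04 with intensities 0, 50, 100, 80, 0, the apex rule reports 500.02 while the weighted rule reports about 500.0213, a shift of roughly 2.6 ppm. Shifts of this size, repeated across every peak, can change which precursor and fragment masses fall within a search tolerance.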
From: WinkelsK ***@***.***>
Sent: Wednesday, March 2, 2022 6:21:27 AM
To: PNNL-Comp-Mass-Spec/Informed-Proteomics ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [PNNL-Comp-Mass-Spec/Informed-Proteomics] Analysis of profile data MSPathFinder (Issue #31)
Short feedback: here is what I have found when working with profile-mode Thermo RAW files.
Results are the same when:
1. Using RAW files as input for PbfGen and ProMex
2. Using mzML files (generated via MSConvert with peak picking: centroid, vendor; see picture)
Results are different (only 50% overlap in identified sequences) when I use a profile-mode mzML file (generated by MSConvert without any filter) as input for PbfGen and ProMex.
[Screenshot: MSConvert peak-picking settings]
Thanks Bryson!