Optimisation of resources for the workflow #455
Comments
My plan was always to use the results of your thousands of runs to learn a simple regression model for each step, based on file size and/or number of spectra. But I am not sure whether you ever saved the execution logs.
I did for most of the runs. However, you don't really need a huge amount of data to learn simple things. Some easy conclusions:
Well, yes, but I wasn't talking about those easy things. Of course you can add smaller labels for those.
I think the other ones depend heavily on the mzML size, the number of MS and MS/MS spectra I guess, even the type of instrument, or the file size.
That's why I said learning from your results.
All this information is available when starting a run.
It would be a unique and potentially publishable feature of the pipeline.
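A minimal sketch of the per-step regression idea discussed above, assuming peak memory grows roughly linearly with input file size. The function names, the `headroom` factor, and the numbers are illustrative assumptions, not values from the execution logs.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_memory_gb(model, file_size_gb, headroom=1.2):
    """Predicted peak memory for one step, padded by a safety factor (assumed)."""
    slope, intercept = model
    return headroom * (slope * file_size_gb + intercept)


# Made-up training points for a single process: (file size in GB, peak memory in GB).
file_size_gb = [0.5, 1.0, 2.0, 4.0]
peak_mem_gb = [2.0, 3.0, 5.0, 9.0]

model = fit_line(file_size_gb, peak_mem_gb)
request_gb = predict_memory_gb(model, 3.0)
```

One model per process would be fitted from the saved traces. The number of spectra could replace or complement file size as the predictor.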
Yes, the idea is to optimize the resources of each process for 80% of the runs; if the other 20% fail, they can go to the next retry. Before doing the research, we have to decide whether the model needs information from inside the files (MS and MS/MS), because if it does, we will need to block all processes until mzml_statistics finishes.
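A rough sketch of the "cover 80% of runs, let the rest retry" policy, assuming the observed peak memory per run is available for a process. The nearest-rank percentile and the doubling per attempt are illustrative choices, and the peaks are made-up numbers.

```python
def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, avoiding floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def memory_request_gb(observed_peaks_gb, attempt, q=80, growth=2.0):
    """Request the q-th percentile on attempt 1, then scale up on each retry."""
    base = percentile(observed_peaks_gb, q)
    return base * growth ** (attempt - 1)


# Made-up peak memory values (GB) from ten earlier runs of one process.
peaks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_try = memory_request_gb(peaks, attempt=1)
second_try = memory_request_gb(peaks, attempt=2)
```

In a Nextflow pipeline the `attempt` argument would correspond to `task.attempt` when the process uses a retry error strategy.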
Yes, that is true.
I would argue that in the first iteration we look for simple variables.
I think we cannot predict CPU usage. We need to know from the implementation whether it benefits from multiple cores.
You can also subsample and average statistics from some files to get a much better idea. |
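A small sketch of the subsampling idea, assuming some callable can return a statistic (for example the number of spectra) for a single file. `count_spectra` is a hypothetical stand-in, not a function from the pipeline, and the counts are made up.

```python
import random
from statistics import mean


def estimate_mean_spectra(files, count_spectra, sample_size=3, seed=0):
    """Average a per-file statistic over a random subsample of the inputs."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    sample = rng.sample(files, min(sample_size, len(files)))
    return mean(count_spectra(f) for f in sample)


# Made-up spectra counts standing in for a real per-file statistics step.
counts = {"a.mzML": 100, "b.mzML": 200, "c.mzML": 300}

estimate = estimate_mean_spectra(list(counts), counts.get, sample_size=3)
```

Only the sampled files need to be inspected before scheduling, so the other processes would not have to wait for statistics on every input.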
Description of feature
Currently, quantms has seven major categories of resources for processes:
However, some of my recent analyses show that resource usage, for example in DIA analysis, could be optimized much further at the process level. See some results from my analyses.
### Dataset: PXD030304
CPU Usage:
Memory Usage:
IO Usage:
Most of the processes use less than 50% of their allocated memory and CPU, which looks like a waste of resources.