
Ensure Ease of Integration into V-Pipe, and Merge #4

Closed
gordonkoehn opened this issue Oct 4, 2024 · 6 comments

@gordonkoehn (Owner)

Check how easily this can be run in the current V-Pipe.

Prepare and submit this as a good PR.

@gordonkoehn gordonkoehn self-assigned this Oct 4, 2024
@gordonkoehn (Owner, Author) commented Oct 4, 2024

First thoughts:
The core loops are:

| Key | Iterations | Runtime per iter. | Memory | Potential reduction |
| --- | --- | --- | --- | --- |
| main | 1 | est. 4.6 h | 1414 MB (`pd.df_tally`) | ??? |
| location | 8 | 35 min for 100 b.s. | ~186 MB (`df_tally[location]`) | 1/8 |
| bootstraps (b.s.) | 100 (min.) – 1000 (optimum) | 21 s total at 100 iters | ~190 MB (`df_tally[location]` resampled) | dep. on available cores, say 1/5–1/10 |
| date_intervals | (#dates − 1) = 12 | 0.5–6 s | 7.2872 MB (`df_tally[location][resampled][date_interval]`) | startup overhead, 1/2? |
| #dates | ~13 (one sample per week) | XXX | XXX | |

That makes for an estimated total runtime of ~4.6 h for 8 cities, assuming they behave like Zürich (on an M1 Pro chip).
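A quick back-of-envelope check of that figure (the ~35 min per location at 100 bootstraps is the value from the table above):

```python
# Sanity check of the total-runtime estimate:
# 8 locations, each costing roughly what Zürich does on a single core.
locations = 8
minutes_per_location = 35  # ~35 min per location at 100 bootstraps

total_hours = locations * minutes_per_location / 60
print(f"{total_hours:.1f} h")  # -> 4.7 h, consistent with the ~4.6 h estimate
```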

These levels seem to be independent at first sight.

I could imagine that ll.KernelDeconv could be parallelized; perhaps easiest would be the innermost loop, over the date_intervals.

In general, for Python's multiprocessing to be usable here, the following must be true.

Per Copilot:

Using the KernelDeconv class and its deconv method with multiprocessing should generally work, but there are a few considerations to keep in mind:

Thread Safety: Ensure that the objects and methods used within KernelDeconv are thread-safe. This includes the kernel, regressor, and confidence interval objects.

Data Sharing: When using multiprocessing, data is typically copied to each process. If the data is large, this can be inefficient. Consider using shared memory or other techniques to manage large datasets.

Pickleability: Objects passed to multiprocessing must be pickleable. Ensure that all objects and methods used in KernelDeconv can be serialized with pickle.

To check:

  • Thread safety – just deep copy
  • Data sharing – none needed
  • Pickleability – yup
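The checklist above can be sketched as bootstrap-level multiprocessing. This is only an illustration under those assumptions: `deconvolve_one_bootstrap` and its body are hypothetical stand-ins for the resampling step and the actual `KernelDeconv.deconv` call.

```python
import multiprocessing as mp
from copy import deepcopy

def deconvolve_one_bootstrap(args):
    """One bootstrap iteration; must be a top-level function to be pickleable."""
    seed, df_tally_location = args
    # Hypothetical stand-in for: resample df_tally_location with `seed`,
    # then run KernelDeconv(...).deconv() on the resampled data.
    resampled = df_tally_location  # placeholder for the resampling step
    return seed, len(resampled)    # placeholder for the deconvolution result

def run_bootstraps(df_tally_location, n_bootstraps=100, n_cores=5):
    # Each worker receives its own (deep) copy of the data via pickling,
    # so "thread safety" reduces to having no shared mutable state.
    jobs = [(seed, deepcopy(df_tally_location)) for seed in range(n_bootstraps)]
    with mp.Pool(processes=n_cores) as pool:
        return pool.map(deconvolve_one_bootstrap, jobs)

if __name__ == "__main__":
    results = run_bootstraps(["sample"] * 10, n_bootstraps=4, n_cores=2)
    print(len(results))  # 4 results, one per bootstrap, in order
```

Because `pool.map` pickles each job, the worker function must live at module top level; lambdas or bound methods of non-pickleable objects would fail the pickleability check.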

@gordonkoehn (Owner, Author) commented Oct 4, 2024

V-Pipe currently runs LolliPop with the resources:
threads: 1
memory: 1024 MB
disk_mb: 1024

See config_shema.json and rule deconvolution

There was something I didn't get about this memory limit: for my current date range it would mean the job could not run as is. But no: V-Pipe seems to run LolliPop already stratified per location.

@gordonkoehn (Owner, Author) commented Oct 4, 2024

Conclusion:
Multiprocessing at both the level of locations and the level of bootstraps seems feasible and reasonable.
At the level of date_intervals there is probably too much overhead for the short iteration duration. Multiprocessing at the level of bootstraps would additionally speed up single-location applications.

Parallelizing over bootstraps would allow us to fix the number of cores.

The memory would stay within the bounds of a normal 1 GB job, and we would split the single job into about ten jobs at most in either case – a reasonable investment of resources.
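One way to read the memory claim, using the rough per-bootstrap footprint from the table above (~190 MB per resampled location tally) and, as an assumed example, five workers in flight at once:

```python
# Back-of-envelope memory budget for bootstrap-level multiprocessing.
per_worker_mb = 190  # ~ one resampled df_tally[location], from the table above
workers = 5          # assumed: five bootstrap iterations in flight at once

print(per_worker_mb * workers)  # -> 950, still within a ~1 GB job
```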

@gordonkoehn (Owner, Author)

Integration in Snakemake
This should be no problem; we would just need to add a `threads` directive to the rule, e.g. `threads: 10`.
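A hypothetical sketch of such a rule in Snakemake syntax (rule name, paths, and CLI flags are illustrative; in particular `--n-cores` stands for the new option this issue proposes, not an existing LolliPop CLI flag):

```
rule deconvolution:
    input:
        tally="results/tallymut.tsv",        # illustrative path
    output:
        deconvoluted="results/deconvoluted.tsv",
    threads: 10                              # exposed to LolliPop's worker pool
    resources:
        mem_mb=1024,
    shell:
        "lollipop deconvolute --n-cores {threads} {input.tally}"
```

Snakemake caps the `threads` value at the cores actually available to the job, so the rule degrades gracefully on smaller allocations.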

@gordonkoehn (Owner, Author)

  • read up on the file where lollipop is integrated into V-Pipe

@gordonkoehn (Owner, Author)

Now going over to the V-Pipe merge here:

cbg-ethz/V-pipe/pull/166
