
Ensure Ease of Integration into V-Pipe, and Merge #4

Closed
gordonkoehn opened this issue Oct 4, 2024 · 6 comments

@gordonkoehn (Owner)

Check how easily this can be run in the current V-Pipe.

Prepare and submit this as a good PR.

@gordonkoehn gordonkoehn self-assigned this Oct 4, 2024
@gordonkoehn (Owner, Author) commented Oct 4, 2024

First thoughts:
The core loops are:

| Key | Iterations | Runtime per iter. | Memory | Potential reduction |
| --- | --- | --- | --- | --- |
| main | 1 | est. 4.6 h | 1414 MB (`pd.df_tally`) | ??? |
| location | 8 | 35 min for 100 b.s. | ~186 MB (`df_tally[location]`) | 1/8 |
| bootstraps (b.s.) | 100 (min.) – 1000 (optimum) | 21 s total at 100 iters | ~190 MB (`df_tally[location]` resampled) | dep. on available cores, say 1/5–1/10 |
| date_intervals | (#dates − 1) = 12 | 0.5–6 s | 7.2872 MB (`df_tally[location][resampled][date_interval]`) | startup overhead, 1/2? |
| #dates | ~13 (one sample per week) | XXX | XXX | |

That makes for an estimated total runtime of ~4.6 h for 8 cities, assuming they behave like Zürich (on an M1 Pro chip).
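A quick back-of-envelope check of that figure (the ~35 min per location at 100 bootstraps is the value from the table above):

```python
# Sanity check of the total-runtime estimate:
# 8 locations, each costing roughly what Zürich does on a single core.
locations = 8
minutes_per_location = 35  # ~35 min per location at 100 bootstraps

total_hours = locations * minutes_per_location / 60
print(f"{total_hours:.1f} h")  # -> 4.7 h, consistent with the ~4.6 h estimate
```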

These levels seem to be independent at first sight.

I could imagine that ll.KernelDeconv could be parallelized; perhaps easiest would be the innermost loop, over the date_intervals.

In general, for Python's multiprocessing to be usable here, the following must be true.

Per Copilot:

Using the KernelDeconv class and its deconv method with multiprocessing should generally work, but there are a few considerations to keep in mind:

Thread Safety: Ensure that the objects and methods used within KernelDeconv are thread-safe. This includes the kernel, regressor, and confidence interval objects.

Data Sharing: When using multiprocessing, data is typically copied to each process. If the data is large, this can be inefficient. Consider using shared memory or other techniques to manage large datasets.

Pickleability: Objects passed to multiprocessing must be pickleable. Ensure that all objects and methods used in KernelDeconv can be serialized with pickle.

To check:

  • Thread safety – just deep copy
  • Data sharing – none needed
  • Pickleability – yup
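The checklist above can be sketched as bootstrap-level multiprocessing. This is only an illustration under those assumptions: `deconvolve_one_bootstrap` and its body are hypothetical stand-ins for the resampling step and the actual `KernelDeconv.deconv` call.

```python
import multiprocessing as mp
from copy import deepcopy

def deconvolve_one_bootstrap(args):
    """One bootstrap iteration; must be a top-level function to be pickleable."""
    seed, df_tally_location = args
    # Hypothetical stand-in for: resample df_tally_location with `seed`,
    # then run KernelDeconv(...).deconv() on the resampled data.
    resampled = df_tally_location  # placeholder for the resampling step
    return seed, len(resampled)    # placeholder for the deconvolution result

def run_bootstraps(df_tally_location, n_bootstraps=100, n_cores=5):
    # Each worker receives its own (deep) copy of the data via pickling,
    # so "thread safety" reduces to having no shared mutable state.
    jobs = [(seed, deepcopy(df_tally_location)) for seed in range(n_bootstraps)]
    with mp.Pool(processes=n_cores) as pool:
        return pool.map(deconvolve_one_bootstrap, jobs)

if __name__ == "__main__":
    results = run_bootstraps(["sample"] * 10, n_bootstraps=4, n_cores=2)
    print(len(results))  # 4 results, one per bootstrap, in order
```

Because `pool.map` pickles each job, the worker function must live at module top level; lambdas or bound methods of non-pickleable objects would fail the pickleability check.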

@gordonkoehn (Owner, Author) commented Oct 4, 2024

V-Pipe currently runs LolliPop with the resources:
threads: 1
memory: 1024 MB
disk_mb: 1024

See config_shema.json and rule deconvolution

There was something I didn't get about this memory limit: for my current date range it would mean the job could not run as is. But no: V-Pipe seems to run LolliPop already stratified per location.

@gordonkoehn (Owner, Author) commented Oct 4, 2024

Conclusion:
Multiprocessing at both the level of locations and the level of bootstraps seems feasible and reasonable.
At the level of date_intervals there is probably too much overhead for the short iteration duration. Multiprocessing at the level of bootstraps would additionally speed up single-location applications.

Parallelizing over bootstraps would allow us to fix the number of cores.

The memory would stay within the bounds of a normal 1 GB job, and we would split the single job into about ten jobs at most in either case – a reasonable investment of resources.
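One way to read the memory claim, using the rough per-bootstrap footprint from the table above (~190 MB per resampled location tally) and, as an assumed example, five workers in flight at once:

```python
# Back-of-envelope memory budget for bootstrap-level multiprocessing.
per_worker_mb = 190  # ~ one resampled df_tally[location], from the table above
workers = 5          # assumed: five bootstrap iterations in flight at once

print(per_worker_mb * workers)  # -> 950, still within a ~1 GB job
```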

@gordonkoehn (Owner, Author)

Integration in Snakemake
This should be no problem; we would just need to add a `threads` directive to the rule, e.g. `threads: 10`.
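A hypothetical sketch of such a rule in Snakemake syntax (rule name, paths, and CLI flags are illustrative; in particular `--n-cores` stands for the new option this issue proposes, not an existing LolliPop CLI flag):

```
rule deconvolution:
    input:
        tally="results/tallymut.tsv",        # illustrative path
    output:
        deconvoluted="results/deconvoluted.tsv",
    threads: 10                              # exposed to LolliPop's worker pool
    resources:
        mem_mb=1024,
    shell:
        "lollipop deconvolute --n-cores {threads} {input.tally}"
```

Snakemake caps the `threads` value at the cores actually available to the job, so the rule degrades gracefully on smaller allocations.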

@gordonkoehn (Owner, Author)

  • read up on the file where lollipop is integrated into V-Pipe

@gordonkoehn (Owner, Author)

Now going over to the V-Pipe merge here:

cbg-ethz/V-pipe/pull/166
