Parallelisation support for analyse #435

Merged: 12 commits into main, Oct 11, 2024
Conversation

@gowerc (Collaborator) commented Oct 3, 2024

Closes #370

Some general notes:

  • I gave up on future() in the end; I just ran into edge case after edge case that made it too difficult to use and test.
  • I opted for just using parallel, but allowing users to provide their own clusters, which can have their functions / libraries already loaded.
  • I then created a helper function make_rbmi_cluster() to make this setup process as smooth as possible (see the usage sketch below).
  • This also allows clusters to be re-used between runs, avoiding the need to repeatedly set up and tear down processes, which is very expensive.
  • Unfortunately the performance gain was pretty underwhelming. I'm sure it will be better with analysis functions that are more expensive than lm, but currently I am only seeing a ~30% gain with large samples (e.g. >2000); below that it often takes longer to run in parallel. This looks to be mostly due to the IO exchange: R's PSOCK clusters transfer data exceedingly slowly :(
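For reference, the intended usage pattern looks roughly like this (a sketch based on this PR's interface; `my_fun` and the `impute_obj_*` objects are hypothetical placeholders):

```r
library(rbmi)

# Set up a 2-worker PSOCK cluster with user functions / packages
# pre-loaded, so the setup cost is paid once rather than per call
cl <- make_rbmi_cluster(
    ncores = 2,
    objects = list("my_fun" = my_fun),
    packages = c("splines")
)

# Re-use the same cluster across multiple analyse() calls
ana1 <- analyse(imputations = impute_obj_1, fun = my_fun, ncores = cl)
ana2 <- analyse(imputations = impute_obj_2, fun = my_fun, ncores = cl)

# Tear the workers down once finished
parallel::stopCluster(cl)
```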

@gravesti (Collaborator) commented Oct 4, 2024

That's a pity that future couldn't work. It's limited to PSOCK now, right?
I made a few comments and will test a bit this afternoon.

@gowerc (Collaborator, Author) commented Oct 4, 2024

Added some additional changes; the main one is that I changed analyse to transfer data into the sub-processes via disk instead of via the network. I must admit I am a little unsure about this, as I'm not sure how reliable parallel file reads are, though it seems to work fine in my testing so far ...
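The idea, roughly, is to serialise the inputs once to a temporary file and have each worker read them back from disk rather than receiving them over the PSOCK socket. A simplified sketch, not the actual implementation (`big_inputs` is a placeholder):

```r
library(parallel)

cl <- makePSOCKcluster(2)

# Serialise the (large) analysis inputs once on the master
path <- tempfile(fileext = ".rds")
saveRDS(big_inputs, file = path)

# Ship only the file path over the socket; each worker then
# deserialises the inputs from disk, which is the fast path
clusterExport(cl, "path", envir = environment())
clusterEvalQ(cl, {
    inputs <- readRDS(path)
    NULL  # avoid echoing the data straight back to the master
})

stopCluster(cl)
```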

I also added a .validate argument to skip the input validation checks, which were taking a surprising amount of time.
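Taken together, a call might now look something like this (hypothetical objects; `ncores` and `.validate` as added in this PR):

```r
# Skip the input validation on repeated calls where the same
# imputation object has already been checked once
ana <- analyse(
    imputations = impute_obj,
    fun = ancova,
    ncores = cl,
    .validate = FALSE
)
```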

Co-authored-by: Isaac Gravestock <[email protected]>
Signed-off-by: Craig Gower-Page <[email protected]>
@gowerc (Collaborator, Author) commented Oct 7, 2024

@gravesti - It's just dawned on me that using RDS files to load data into the parallel processes means we are limiting parallelisation to local machines only. I think (at least from the documentation) that PSOCK supports running parallel processes on remote machines, which wouldn't work in this case. The RDS loading saves ~3-4 seconds, so I'm not sure whether to switch back to network loading, or to just put a big warning in the documentation pages, or to provide a toggleable option.
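For context, this is the kind of setup that would break (hostnames below are made up): `parallel` lets you place workers on remote hosts, and those workers cannot read RDS tempfiles written to the master's local disk.

```r
# Workers on remote hosts: each worker is a fresh R session on that
# machine and does not share the master's filesystem, so reading
# tempfile() RDS paths written on the master would fail there
cl <- parallel::makePSOCKcluster(c("node1.example.com", "node2.example.com"))
```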

@gravesti (Collaborator) commented Oct 7, 2024

@gowerc Have we been testing on a realistic analysis size? If so, then maybe it is ok to be limited to a single machine and just have a warning in the docs.

I guess the same argument applies to not using RDS. Accept the few seconds penalty for less complexity and no restrictions or extra arguments.

@gowerc (Collaborator, Author) commented Oct 7, 2024

Talking to Marcel and Alessandro, I think a realistic sample size would be on the order of 1000-2000, i.e. our test code is likely in the right region (20-30 seconds when run sequentially). Though I think a common use case for this will be tipping point analyses, where the same call to analyse() is run 20-40 times with slightly different offsets (see the sketch below). This is an awkward one where the answer is really "it depends...".
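For example, a tipping point run might look roughly like the following sketch (hypothetical objects and offsets), which is exactly where re-using one cluster across calls pays off:

```r
cl <- make_rbmi_cluster(ncores = 2)

# Hypothetical tipping-point loop: the same analysis repeated with
# increasing delta offsets, re-using one cluster for all calls
results <- lapply(seq(0, 10, by = 0.5), function(offset) {
    dat_delta <- transform(
        delta_template(impute_obj),
        delta = is_missing * offset
    )
    analyse(
        imputations = impute_obj,
        fun = ancova,
        delta = dat_delta,
        ncores = cl
    )
})

parallel::stopCluster(cl)
```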

My current assumption would be that people using rbmi's internal parallelisation are likely just speeding up individual runs on their local machines. If a user gets to the stage where they need to run it across different nodes (say, for multiple sensitivity analyses on different sub-endpoints), they are more likely not to be using rbmi's internal parallelisation at all and instead to be parallelising the different runs of rbmi externally.

Given how inefficient the internal parallelisation of rbmi is for analyse(), the idea of recruiting a whole cluster of nodes seems ridiculous, so I guess I'd rather support the optimisation on a local machine.

Then again, I fear people will just assume this functionality is fully compatible with parallel's remote PSOCK setup and run it anyway, i.e. we would be breaking that interface...

I think on balance I'm inclined to leave it as is and add a warning note in the documentation.

@gravesti (Collaborator) commented:

@gowerc Will you merge the lintr branch first? Do you still need to update the docs in this PR?

@gowerc (Collaborator, Author) commented Oct 11, 2024

@gravesti - Have merged in the lintr PR and have added an additional warning regarding the PSOCK network stuff.

@gravesti (Collaborator) left a review comment:

Thanks for all the effort on this one @gowerc!

@gowerc (Collaborator, Author) commented Oct 11, 2024

Urgh, just seen there's a bunch of NOTEs that I need to resolve:

```
* checking R code for possible problems ... NOTE
analyse : <anonymous> : inner_fun: no visible binding for global
  variable '..rbmi..analysis..imputations'
analyse : <anonymous> : inner_fun: no visible binding for global
  variable '..rbmi..analysis..delta'
analyse : <anonymous> : inner_fun: no visible global function
  definition for '..rbmi..analysis..fun'
Undefined global functions or variables:
  ..rbmi..analysis..delta ..rbmi..analysis..fun
  ..rbmi..analysis..imputations
```
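One standard way to address this kind of NOTE (not necessarily what the fixing commit below does) is to declare the names via utils::globalVariables():

```r
# e.g. in R/zzz.R: declare the worker-side names so R CMD check
# does not flag them as undefined globals
utils::globalVariables(c(
    "..rbmi..analysis..imputations",
    "..rbmi..analysis..delta",
    "..rbmi..analysis..fun"
))
```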

@gowerc (Collaborator, Author) commented Oct 11, 2024

OK, I think that last commit should address the NOTE.

@gowerc merged commit 43639fc into main on Oct 11, 2024 (5 checks passed)
@github-actions bot locked and limited the conversation to collaborators on Oct 11, 2024