Parallelism in Learning Networks #739
Comments
Thanks @olivierlabayle for looking into yet another interesting area for enhancement. It's exciting to hear you may be interested in helping out here!
Re the multithreading. Thanks for this POC! What you suggest is precisely what I had in mind when I carried out a big refactor of learning network training some time ago to make training "asynchronous". However, I think we should enlist the guidance of someone strong in this area, as it's easy to implement multi-threading in unsafe ways, which is why I stopped short. I have already mentioned this project to @OkonSamuel, who would be ideal, but he is quite busy just now. As you note, there is also the issue of a user interface point for
Re the distributed computing, we actually had the maintainer of Dagger.jl, @jpsamaroo, look into this. However, this was before the big refactor, which made it an ambitious undertaking, and it was ultimately unsuccessful. Perhaps he would be willing to revisit it given the refactoring, especially if you are available to slog out some of the details.
Note that for both multi-threading and distributed computing the existing testing is already asynchronous. That is, we have tests that various parts of the network do what they are expected to do, but we do not insist that nodes in parallel execute in a particular order. What will be important is to add tests that show outcomes are independent of the
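To make the point about order-independent outcomes concrete, here is a minimal sketch in plain Julia (standard library only). It is not MLJBase code: `ToyNode`, `train_network` and the toy graph are hypothetical names invented for illustration, and the "training" functions are simple deterministic arithmetic. Each node gets its own task, and a node's task waits on the tasks of its dependencies, so independent branches may run in any order.

```julia
using Base.Threads: @spawn

# Hypothetical toy node: the names of the nodes it depends on, plus a function
# mapping the dependencies' outputs to this node's output.
struct ToyNode
    deps::Vector{Symbol}
    train::Function
end

# Spawn one task per node. Each task first fetches the results of its
# dependencies, so branches that do not depend on each other can overlap.
# `order` is assumed to be a topological order of the graph.
function train_network(nodes::Dict{Symbol,ToyNode}, order::Vector{Symbol})
    tasks = Dict{Symbol,Task}()
    for name in order
        node = nodes[name]
        dep_tasks = [tasks[d] for d in node.deps]
        tasks[name] = @spawn node.train(map(fetch, dep_tasks)...)
    end
    return Dict(name => fetch(t) for (name, t) in tasks)
end

nodes = Dict(
    :source  => ToyNode(Symbol[], () -> collect(1.0:10.0)),
    :model₁  => ToyNode([:source], x -> 2 .* x),
    :model₂  => ToyNode([:source], x -> x .+ 1),
    :average => ToyNode([:model₁, :model₂], (a, b) -> (a .+ b) ./ 2),
)

# Two valid submission orders for the same graph.
outputs = train_network(nodes, [:source, :model₁, :model₂, :average])
swapped = train_network(nodes, [:source, :model₂, :model₁, :average])
order_independent = outputs[:average] ≈ swapped[:average]
```

A test of this shape only checks the final outputs, never the order in which the two middle nodes ran, which is the property one would want to hold for a real learning network as well.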
BTW, if you are interested in pushing the POC a little further, you could already do some testing to check that multithreaded stacks and CPU1 stacks give the same answers (assuming no RNGs!).
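The kind of check suggested here could look roughly like the sketch below. It deliberately avoids MLJ so that it runs with the Julia standard library alone: `fit_ols`, `fit_serial` and `fit_threaded` are hypothetical stand-ins for fitting the base learners of a stack, and ordinary least squares is used only because it involves no RNG.

```julia
using LinearAlgebra
using Base.Threads: @spawn

# Hypothetical stand-in for fitting one base learner: ordinary least squares,
# which is deterministic (no random number generator involved).
fit_ols(X, y) = X \ y

# Fit the learners one after the other (the CPU1-like path).
fit_serial(Xs, y) = [fit_ols(X, y) for X in Xs]

# Fit each learner in its own task, then collect results in input order.
function fit_threaded(Xs, y)
    tasks = [@spawn fit_ols(X, y) for X in Xs]
    return [fetch(t) for t in tasks]
end

# Fixed, RNG-free data so both runs see identical inputs.
n = 50
y = [sin(i) for i in 1:n]
Xs = [[cos(i * j + k) for i in 1:n, j in 1:3] for k in 1:4]

serial = fit_serial(Xs, y)
threaded = fit_threaded(Xs, y)

# Compare with ≈ rather than == to tolerate floating-point round-off.
same_answers = all(s ≈ t for (s, t) in zip(serial, threaded))
```

Start Julia with several threads (for example `julia -t 4`) for the threaded path to actually run in parallel; with a single thread the comparison still runs but exercises no concurrency.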
Thanks for the ping @ablaom! Dagger has grown in capability and scope since I first tried to implement parallel training, and I also have a much better handle on how to safely do distributed programming in Julia. However, it's also been a while since I've looked at MLJ, so I'd definitely need to spend time wrapping my head around how models are initialized and how data moves around. It's possible that we can do less implementation work in MLJ and instead allow a user to wrap MLJ calls with Dagger API calls (like
Anyway, I don't have much time to do this right now, but if anyone wants to give this a shot without waiting on me, I'd be happy to help provide guidance! Just ping me if you run into trouble or have questions, or file issues on the Dagger repo.
@jpsamaroo Thanks for the quick response and update. Things look promising. I suggest that if and when @olivierlabayle is ready to look at the distributed case, we all have a call to get your best advice.
For the record, here is the original, open (but quite stale) issue for adding distributed computing via Dagger: JuliaAI/MLJ.jl#72
@ablaom @jpsamaroo Thanks for your instructive replies!
You seem to make a distinction between the multithreaded and the distributed version. From the README, I had the impression that Dagger actually abstracts this away: "It can run computations represented as DAGs efficiently on many Julia worker processes and threads, as well as GPUs". Isn't that correct (I have never used Dagger before)? I initially thought we could represent learning networks as Dagger DAGs or something like that and then we would be "done".
Given my current personal workload and deadlines, in the near future I could only push the current POC further, if that is deemed useful. However, if Dagger does indeed abstract the resource representation away and nobody else has taken up the subject in the meantime, I would be very happy to give it a try in a few months, when things are a bit quieter for me.
I have been playing around a bit with Dagger.jl today, and I think the following represents some kind of proof of concept that, in theory, it could work! The main caveat is that, with this approach, I don't currently see how to do it without breaking everything. My idea is as follows: Create a
As you can see, I have also played with a composite of composites to check that it runs fine. At first glance, the repercussions I see are:
Of course, I haven't really done anything here, since all the complexity will lie in the macro, which is not provided. I just wanted to have your opinion before moving forward, since it represents a big piece of work and is potentially not in line with your plans for MLJBase. Happy to discuss in more detail over a call!
using Pkg
Pkg.activate(".")
using Dagger
using DataFrames
using MLJBase
using MLJLinearModels
struct MyModel <: MLJBase.Model
    model₁
    model₂
end
"""
We can define a macro as it is done in most probabilistic programming languages.
The user could define something like this:
@composite function userdefined(mach::Machine{MyModel, C}) where C
    X₁, X₂, y = mach.args()
    mach₁ = machine(mach.model.model₁, X₁, y, cache=C)
    y₁ = predict(mach₁)
    mach₂ = machine(mach.model.model₂, X₂, y, cache=C)
    y₂ = predict(mach₂)
    @register ypred = (y₁ + y₂) ./ 2, :predict
    @register mean_ = mean(ypred), :mean
    @register var_ = var(ypred), :var
end
"""
"""
From the previous chunk of code we can generate a fit method from the
computational graph that would result here in:
"""
function MLJBase.fit!(mach::Machine{MyModel, C}; verbosity=0) where C
    # Call the source nodes to obtain the concrete training data.
    X₁, y = (src() for src in mach.args)
    # Each Dagger.spawn returns a handle to a scheduled task; passing that
    # handle to a later spawn makes the later task depend on the earlier one.
    mach₁ = machine(mach.model.model₁, X₁, y, cache=C)
    mach₁ = Dagger.spawn(m -> fit!(m, verbosity=verbosity), mach₁)
    y₁ = Dagger.spawn(predict, mach₁)
    mach₂ = machine(mach.model.model₂, X₁, y, cache=C)
    mach₂ = Dagger.spawn(m -> fit!(m, verbosity=verbosity), mach₂)
    y₂ = Dagger.spawn(predict, mach₂, X₁)
    ypred = Dagger.spawn(+, y₁, y₂)
    mean_ = Dagger.spawn(mean, ypred)
    # Encapsulate in a return!
    # fetch blocks until the corresponding task has finished.
    mach.fitresult = (machines=[fetch(mach₁), fetch(mach₂)], mean=fetch(mean_))
    return mach
end
"""
For each registered operation in OPERATIONS, generate the corresponding method from the graph.
Here only X₁ and X₂ are required for the prediction, so only they are included in the signature.
"""
function MLJBase.predict(mach::Machine{MyModel,}, X₁)
    mach₁ = mach.fitresult.machines[1]
    y₁ = Dagger.spawn(predict, mach₁, X₁)
    mach₂ = mach.fitresult.machines[2]
    y₂ = Dagger.spawn(predict, mach₂, X₁)
    ypred = Dagger.spawn(+, y₁, y₂)
    return fetch(ypred)
end
###### Data
n = 1000
X₁ = MLJBase.table(rand(n, 3))
y = rand(n)
C = false
###### Machine
mymodel = MyModel(LinearRegressor(), RidgeRegressor(lambda=1))
mach = machine(mymodel, X₁, y, cache=C)
fit!(mach, verbosity=1)
predict(mach, X₁)
###### Composite of composite: hangs forever
newmodel = MyModel(mymodel, LinearRegressor())
mach = machine(newmodel, X₁, y, cache=C)
fit!(mach, verbosity=1)
Hi,
I would be quite keen on having parallel training for learning networks.
I have seen that there may be a plan to use Dagger.jl for this (in this issue, for instance) and wanted to know whether it is still in scope.
On a much less ambitious note, I have played around a bit and it seems that the following enables multithreaded fitting. One downside is that the acceleration cannot be provided in the fit! function; the user has to call default_resource(CPUThreads()). The good news is that it seems quick and easy to provide this additional feature. I am by no means knowledgeable about parallel computing, so if there is anything wrong with it, please let me know. For instance, I am not even sure how to test that implementation.