Make jacobian work with CuArrays #33

Closed · wants to merge 4 commits

Conversation

@tkf commented May 27, 2019

It looks like Tracker.jacobian does not support CuArrays at the moment:

julia> Tracker.jacobian(identity, gpu([0.0, 0.0]))
ERROR: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Bool,1},Tuple{Bool},Tuple{Int64}}}}) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Bool,1},Tuple{Bool},Tuple{Int64}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.  .args is of type Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{Array{Bool,1},Tuple{Bool},Tuple{Int64}}} which is not isbits.
    .2 is of type Base.Broadcast.Extruded{Array{Bool,1},Tuple{Bool},Tuple{Int64}} which is not isbits.
      .x is of type Array{Bool,1} which is not isbits.

This is because ȳ(i) always returns a plain Vector{Bool}, regardless of the type of y:

Tracker.jl/src/back.jl

Lines 174 to 178 in 593aba6

function jacobian(f, x::AbstractVector)
  y::AbstractVector, back = forward(f, x)
  ȳ(i) = [i == j for j = 1:length(y)]
  vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
end

A fix I found (implemented in this PR) is to create the seed array with similar(y, Bool), so that it is allocated on the same device as y:

julia> using Tracker: forward

julia> function jacobian(f, x::AbstractVector)
           y::AbstractVector, back = forward(f, x)
           function ȳ(i)
               δ = fill!(similar(y, Bool), false)
               δ[i] = true
               return δ
           end
           vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
       end
jacobian (generic function with 1 method)

julia> jacobian(identity, gpu([0.0, 0.0]))
2×2 CuArray{Float32,2}:
 1.0  0.0
 0.0  1.0

I didn't add any tests (sorry) because Tracker's test suite does not load CuArrays.

@MikeInnes (Member)

So the reason jacobian is written this way is to keep nested AD working, and this PR will likely break that (ideally we'd test it explicitly). I think you can get the best of both if you find a way to write this as a map.

You can add GPU tests in the cuda folder and they'll run on bors, or get loaded when you have CuArrays installed locally.

@tkf (Author) commented May 28, 2019

map didn't work, but broadcast did. However, I couldn't construct an example where nested AD fails with the fill! approach (see below). Also, I have a vague idea of how an in-place operation breaks AD, but I don't see why it matters in ȳ, because its output does not depend on the values of x (so it does not participate in the computation graph). Or maybe the functions I tried were too easy for the AD to handle?

julia> function jacobian_with_fill(f, x::AbstractVector)
         y::AbstractVector, back = forward(f, x)
         function ȳ(i)
           δ = fill!(similar(y, Bool), false)
           δ[i] = true
           return δ
         end
         vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
       end
jacobian_with_fill (generic function with 1 method)

julia> Tracker.gradient(x -> sum(jacobian_with_fill(y -> y .^ 2, x) .^ 2), [1.0, 2.0, 3.0])
([8.0, 16.0, 24.0] (tracked),)

julia> Tracker.gradient(x -> sum(jacobian_with_fill(y -> sum(y) .* y, x) .^ 2), [1.0, 2.0, 3.0])
([66.0, 72.0, 78.0] (tracked),)

julia> function jacobian_with_bc(f, x::AbstractVector)
         y::AbstractVector, back = forward(f, x)
         ȳ(i) = ((j, _) -> i == j).(1:length(x), x)
         vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
       end
jacobian_with_bc (generic function with 1 method)

julia> Tracker.gradient(x -> sum(jacobian_with_bc(y -> y .^ 2, x) .^ 2), [1.0, 2.0, 3.0])
([8.0, 16.0, 24.0] (tracked),)

julia> Tracker.gradient(x -> sum(jacobian_with_bc(y -> sum(y) .* y, x) .^ 2), [1.0, 2.0, 3.0])
([66.0, 72.0, 78.0] (tracked),)

You can add GPU tests in the cuda folder

I don't see a cuda directory here: https://github.com/FluxML/Tracker.jl/tree/v0.2.2/test. Maybe you are mixing it up with Flux's test suite?

@MikeInnes (Member)

Of course, I always forget we split Tracker out of Flux. In that case it'd still be nice to have a test on the Flux side once this is in.

You are probably right that this can be done with the similar approach; the main issue I was thinking of was similar returning a tracked array or array of tracked reals, which has been proposed at some point, but isn't actually the case yet (and could be avoided with a call to data(y)). That said, the broadcast version avoids a warning about scalar indexing and does the job with a single kernel, which is always good.
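(For illustration only, not code from this PR: a minimal sketch of the fill!-based seed with a data(y) guard, assuming Tracker.data is in scope; the name seed_fill is made up here.)

using Tracker: data

# Sketch: build the seed from the untracked storage of y, so even a
# hypothetical tracked `similar` could not leak tracking into the seed.
function seed_fill(y, i)
    δ = fill!(similar(data(y), Bool), false)  # allocated wherever y lives (CPU or GPU)
    δ[i] = true                               # a single scalar write per seed
    return δ
end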

@tkf (Author) commented May 30, 2019

Thanks for the clarification. The latest commit in this PR uses the broadcasting approach. So is it good to go?

In that case it'd still be nice to have a test on the Flux side once this is in.

See FluxML/Flux.jl#782

@tkf (Author) commented May 30, 2019

I just found a related problem: jacobian(softmax, ones(3)) doesn't work because NNlib.jl defines the derivative by ∇softmax(Δ, xs) = ∇softmax!(similar(Δ), Δ, xs). Would it be safe to convert ȳ's element type like this?

z = float(zero(eltype(data(y))))
ȳ(i) = ((j, _) -> i == j).(1:length(y), y) .+ z

Or should it be fixed in NNlib.jl?

Edit: implemented in 9c2c05e
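(For reference, a sketch of how the element-type conversion could be folded into the broadcast-based jacobian; an illustration only, not necessarily what 9c2c05e does, and the name jacobian_float_seed is made up.)

using Tracker: data, forward

# Sketch only: `.+ z` promotes the Bool seed to a floating-point element type,
# so backpropagators like NNlib's ∇softmax, which write into similar(Δ),
# get a float container instead of a Vector{Bool}.
function jacobian_float_seed(f, x::AbstractVector)
    y::AbstractVector, back = forward(f, x)
    z = float(zero(eltype(data(y))))
    ȳ(i) = ((j, _) -> i == j).(1:length(y), y) .+ z   # fuses into a single broadcast
    vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
end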

@tkf (Author) commented May 30, 2019

It looks like the tests in Flux failed because of allowscalar(false). I suppose scalar indexing is disabled in Flux's test suite because element-by-element indexing into a GPU array is expensive? Also, from the stack trace, it looks like the indexing into the GPU array happens because the broadcast hits the generic indexing-based implementation in Base.

If that is the case, wouldn't it be better to use the fill!-based approach, so that there are only N indexing operations rather than the N^2 that the broadcasting approach would incur? (N = length(y))
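(Rough accounting only, not a benchmark; the variable names below are made up.)

# For a call to jacobian with N = length(y), N seed vectors are built in total.
N = 1024
scalar_ops_broadcast_fallback = N * N  # N seeds × N elementwise scalar writes, if the
                                       # broadcast falls back to Base's indexing-based path
scalar_ops_fill_based         = N      # N seeds × 1 scalar write each (fill! is one kernel)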

@@ -173,7 +173,11 @@ Calculate the output jacobian `J = d/dx m(x)` such that each row `i` of `J` corr
 """
 function jacobian(f, x::AbstractVector)
   y::AbstractVector, back = forward(f, x)
-  ȳ(i) = [i == j for j = 1:length(y)]
+  function ȳ(i)
+    δ = fill!(float(similar(data(y))), false)
@ChrisRackauckas (Member) commented Aug 27, 2019
δ = false .* y .+ (1:length(y) .== i). To exploit sparsity you should eventually make this color_i and loop through the colors, but this is good for now.
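(A sketch of how that one-liner could slot into jacobian, for illustration only; the actual change landed later in #44, and the name jacobian_bcast_seed is made up.)

using Tracker: forward

# Illustration of the suggestion above: the whole seed is built by one fused
# broadcast, so it stays on the device and keeps y's element type.
function jacobian_bcast_seed(f, x::AbstractVector)
    y::AbstractVector, back = forward(f, x)
    ȳ(i) = false .* y .+ (1:length(y) .== i)   # zeros except a one at position i
    vcat([transpose(back(ȳ(i))[1]) for i = 1:length(y)]...)
end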

Member

I'll open a fresh PR for this.

@tkf (Author) commented Aug 27, 2019

Closing in favor of #44 (thank you @ChrisRackauckas!)

@tkf closed this Aug 27, 2019