
outdims function doesn't work properly for chained layers #1086

Closed
HamletWantToCode opened this issue Mar 16, 2020 · 35 comments · Fixed by #1305

@HamletWantToCode commented Mar 16, 2020

In PR #960 an outdims function was added to Flux.jl, making it easier to infer the output dimensions of a neural network model (especially when using convolutional layers). However, the result of outdims is incorrect for chained layers, and the existing test cases fail to catch this flaw. E.g. try the following code:

using Flux
using Flux: outdims

all_dense_model = Chain(Dense(2, 10), Dense(10, 4))
outdims(all_dense_model, 2)    # will return (10,) while it should be (4,)

I'm using Flux v0.10.3 in Julia v1.3.0 under macOS

Currently, outdims is implemented as:

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), c.layers))(isize)

however, foldl doesn't produce the correct function composition, e.g. try the following code:

f1(x) = 2x; f2(x) = exp(x); f3(x) = x^3
const x0 = 0.5
# we want to compute f3(f2(f1(x0))), i.e. (f3 ∘ f2 ∘ f1)(x0)
f_true = f3 ∘ f2 ∘ f1
f_true(x0)   # 20.085536923187664
# however
f = foldl(∘, (f1, f2, f3))   # the returned f = ((f1 ∘ f2) ∘ f3) following the left-association rule !!!
f(x0)   # 2.2662969061336526

I suggest resolving this problem with the following modifications:

# edit
using Base: tail # thanks to @darsnack for pointing this out

outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)

and adding the following test cases:

@testset "dense layer" begin
         X = randn(Float32, 5, 10)
         D0, D1, D2 = 5, 100, 25
         dense1 = Dense(D0, D1, relu)
         dense2 = Dense(D1, D2)
         dense_chain = Chain(dense1, dense2)
         @test outdims(dense1, D0) = (D1,)
         @test first(outdims(dense1, D0)) == size(dense1(X), 1)
         @test first(outdims(dense_chain, D0)) == size(dense_chain(X), 1)
end

@testset "conv layer" begin
            X = randn(Float32, 28, 28, 1, 1)
            D0, D1 = 3, 5
            S, P = 3, 1
            conv1_stride = Conv((D0, D0), 16=>32, stride=S, pad=P)
            conv2 = Conv((D1, D1), 3=>16)
            conv_chain = Chain(conv1_stride, conv2)
            @test typeof(outdims(conv1_stride, (28, 28))) <: Tuple
            @test length(outdims(conv1_stride, (28, 28))) == 2
            @test outdims(conv1_stride, (28, 28)) == size(conv1_stride(X))[1:2]
            @test outdims(conv_chain, (28, 28)) == size(conv_chain(X))[1:2]
end

Maybe we could also consider extending the outdims function to more complex chained models, e.g. models that contain plain Julia functions such as x -> reshape(x, :, 4), and also write outdims methods for recurrent layers.

I'll collect these into a PR if you'd like, @MikeInnes @baggepinnen @darsnack.

@DhairyaLGandhi (Member)

Sounds like a good idea to collect all the failing test cases; having something like outdims(::typeof(reshape), ...) might be fine too.

@mcabbott (Member)

Can it be taught to understand x->reshape(x, :, 4)? Perhaps people would have to write Base.Fix2(reshape, (:,4)). Right now Flux.outdims(x -> reshape(x,:,4), 5) is an error, which is good, but perhaps a more helpful one would be better.
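For what it's worth, a method along these lines (just a sketch, not something Flux defines) would make the Base.Fix2 spelling work by running the reshape on a dummy array of the given input size:

# Sketch only: an outdims overload for Base.Fix2(reshape, dims), computed by
# actually reshaping a dummy array of the given input size.
function Flux.outdims(f::Base.Fix2{typeof(reshape)}, isize::Tuple)
    return size(f(ones(Float32, isize)))
end

Flux.outdims(Base.Fix2(reshape, (:, 4)), (8, 8, 16, 1))  # (256, 4)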

BTW why is Flux.outdims(Dense(2,5), (3,10)) == (5,)? I would have expected a 2-tuple, and probably an error!

@HamletWantToCode (Author) commented Mar 17, 2020

Hi @mcabbott

why is Flux.outdims(Dense(2,5), (3,10)) == (5,)

The current implementation of outdims for the Dense layer (see below) doesn't really depend on the 2nd argument you pass in; it just ignores (3, 10) and returns (size(l.W, 1),).

outdims(l::Dense, isize) = (size(l.W)[1],)

I think this can be solved by

function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end

I prefer to throw an error instead of return a 2-tuple.
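For illustration, that method would make the example above behave like this (a quick sketch):

outdims(Dense(2, 5), (2,))     # (5,)
outdims(Dense(2, 5), (3, 10))  # throws DimensionMismatch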

@mcabbott (Member)

OK, so outdims does not propagate the size of tensors, it only cares about the first one or two dimensions, depending on the layer. Except that sometimes it checks the others?

julia> Conv((3, 3), 3 => 16)(ones(10, 10, 3, 1)) |> size
(8, 8, 16, 1)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 3, 1))
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10)) # so it silently ignores 3rd & 4th?
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 10, 1)) # no it doesn't!
ERROR: DimensionMismatch("Input channels must match! (10 vs. 3)")

julia> Flux.outdims(Dense(8, 16), (8,8,16,1)) # but here it ignores everything...
(16,)

Perhaps this is orthogonal to your question, sorry, but at the very least I think the docstring should explain what this function is for; "output dimensions" is pretty vague. Why isn't it just outsize(f, s::Tuple) = size(f(ones(Float32, s...))), or faster versions of that? Then x->reshape(x, :, 4) would just work, whereas at present the output seems to depend on things that the layer before will have dropped.

@HamletWantToCode (Author) commented Mar 17, 2020

Hi @mcabbott

it only cares about the first one or two dimensions, depending on the layer. Except that sometimes it checks the others?

I also think so.

Why isn't it just outsize(f, s::Tuple) = size(f(ones(Float32, s...)))

For the whole network, I think this may result in a large overhead, since every time one would need to run the complete computation just to get the output dimensions. However, I think this may be a good idea for functions like x->reshape(x, :, 4); we can just define something like:

outdims(f::typeof(reshape), s::Tuple) = size(f(ones(Float32, s...)))

since this won't cost much.

@DhairyaLGandhi (Member) commented Mar 17, 2020

Or just return the shape the reshape wants to reshape to? Accounting for any Colons, of course.
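Resolving a Colon from the input size is cheap; something like this rough helper (not in Flux, purely to illustrate the idea) would do it:

# Rough sketch: compute the concrete output shape of a reshape from the input
# size and the requested dims, resolving at most one Colon.
function reshape_outdims(isize::Tuple, dims::Tuple)
    known = prod(d for d in dims if d isa Integer)
    return map(d -> d isa Colon ? prod(isize) ÷ known : d, dims)
end

reshape_outdims((8, 8, 16, 1), (:, 4))  # (256, 4)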

@mcabbott (Member)

I don't think x->reshape(x, :, 4) has a type you can recognise, hence my Base.Fix2 comment above.

What I meant about outsize was that this would be a simple rule for what the function should return, and a fallback implementation, even if it gets skipped on known types such as Conv. But this rule is quite different to what outdims presently returns. Its action on Conv means that it cannot distinguish these two cases:

julia> m = Chain(Conv((3, 3), 3 => 16), Base.Fix2(reshape, (:,4)));

julia> m(ones(10,10,3,1)) |> size
(256, 4)

julia> m(ones(10,10,3,2)) |> size
(512, 4)

@DhairyaLGandhi (Member)

Yes, you are right. The more I think about it, the more I feel for this to work generically and correctly, it would need to do a true forward pass, which sounds like too much ado for something one expects to be trivial and quick. Writing all these specific rules feels incorrect too and won't scale.

One could always have something that does a true forward pass only as a fallback and returns only the size of the output to the next layer, but that feels inelegant too.

@MikeInnes linked a pull request Mar 17, 2020 that will close this issue

@MikeInnes (Member)

Yeah, this is an inherent problem with this kind of API. We definitely don't want to go down the route of supporting things like x -> reshape(x, ...) and so on.

Probably best to recommend that people just make an array and throw it at the model to see what happens, which is the only really reliable source of truth.

@mcabbott (Member) commented Mar 17, 2020

Having a slowish fallback plus fast shortcuts doesn't seem that inelegant to me. But perhaps I'm not so sure what this is for anyway. Is it for interactive use to check you've defined the model how you think you did? If so, is the speed of simply running it once a big deal?

The present design seems pretty confusing though; it does not seem clearly specified what you should provide, or what you are going to get back.

@DhairyaLGandhi (Member) commented Mar 17, 2020

I think either the use case for this needs to be something more concrete where a forward pass is untenable, or its implementation needs to be correct, simple, and reliable. Perhaps it would be good to clear up that bit first?

@darsnack (Member) commented Mar 17, 2020

The original PR was supposed to address programmatic layer building for large NNs. If you are putting together a series of convolutions, then you need to keep track of the (width, height) of each layer's output so that you can size the final dense layers properly. Based on Slack chatter, it was clear people kept defining this in their own code so it made sense to provide a utility function in Flux.
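For example, the intended workflow looks roughly like this (a sketch with a made-up architecture, assuming outdims methods for Conv/MaxPool and the Chain fix discussed here):

using Flux

# Hypothetical model-building use: size the first Dense layer from the
# convolutional stack's spatial output, without running a forward pass.
convs = Chain(Conv((3, 3), 1 => 16, relu), MaxPool((2, 2)), Conv((3, 3), 16 => 32, relu))
w, h = Flux.outdims(convs, (28, 28))   # (11, 11) for a 28×28 input
model = Chain(convs, x -> reshape(x, :, size(x, 4)), Dense(w * h * 32, 10))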

But perhaps I'm not so sure what this is for anyway.

The only practical use case originally envisioned is what I highlighted above. Thus, it only needs to be there for conv-type layers. When doing the original PR, we thought why don't we provide it for other layers too to be consistent. That's why dense, etc. were added.

To address the input-output relationship: the intent is that you put in (width, height) and get out (width, height). It is probably a better design to return the full tensor size.

As for the errors, we should be throwing errors in all dimension-mismatch cases. The reason it does for convolutions and not for dense is that the convolution layers use the underlying NNlib functions, which throw a dimension mismatch.

Probably best to recommend that people just make an array and throw it at the model to see what happens, which is the only really reliable source of truth.

Since the utility of these functions is during model building, we want fast responses.

@darsnack (Member)

Also, based on the utility described above, I would argue that this function should be restricted to NN layers. I would suggest keeping it defined for dense/conv layers only, and defining a fallback method that determines the output size by evaluating the function. And we should be explicit in the docs that this is what happens for all types not listed there. I am not sure evaluating reshape is considered slow compared to evaluating a series of convolution layers.

Of course, we would include the consistency fixes to the API such as returning the full output tensor size and throwing errors.

@mcabbott (Member)

OK, sounds sensible. If it's going to be defined for both Conv and Dense, then it has to have some way of digesting reshape as you must be mixing 4-tensors and 2-tensors.

Perhaps for cases where you'd like to think only about height/width, not all dimensions, it should do something like outdims(Conv((3, 3), 3 => 16), (10, 10)) == (8, 8, missing, missing).

@darsnack (Member)

Okay I see now that the docstring is broken for the original PR. I'm sorry, that was my first PR involving docs and I must have messed something up.

The docstring actually addresses a lot of what we've been talking about. For example, the convolution docstring says:

Calculate the output dimensions given the input dimensions, isize.
Batch size and channel size are ignored as per NNlib.jl.

And in the case of the dense, it has an example explicitly showing that the input tuple is effectively ignored even in the case of a dimension mismatch.

That being said, I think the API changes here are better anyways.

@MikeInnes (Member)

I think it would be fine to ignore the batch dimension if it's not present (no missing) and check it if it is. The use case for outdims sounds reasonable so let's keep it to simple layers but also clearly document a pattern for more complex models (i.e. constructing and passing an array of nonsense values).

It'd also be fine to add a Reshape layer for reshaping, which outdims could be aware of (in a separate PR).
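Such a Reshape layer might look roughly like this (purely a sketch, not something Flux currently provides):

# Hypothetical Reshape layer that outdims could dispatch on.
struct Reshape{T<:Tuple}
    dims::T
end
(r::Reshape)(x) = reshape(x, r.dims)

# outdims via a dummy array of the given input size (cheap for a reshape).
Flux.outdims(r::Reshape, isize::Tuple) = size(reshape(ones(Float32, isize), r.dims))

Flux.outdims(Reshape((:, 4)), (8, 8, 16, 1))  # (256, 4)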

@mcabbott (Member)

While propagating only the first two dimensions seems fine for Chain(Conv, MaxPool, Conv) (although it should be clearly documented that isize does not necessarily mean size, etc.), it's likely to be ambiguous for Chain(Conv, Reshape, Dense) if the reshape could really accept a matrix too. With missing it could warn you. Otherwise it could be restricted only to layers that take exclusively 4-tensors, which perhaps was the original idea.

@MikeInnes (Member)

For cases like that it would probably make sense for Conv to always return the number of channels at least (which is easy if it is not provided, since it's fixed); and that extra output can easily be ignored if it's not needed.

For the batch dimension we can just follow Julia's regular semantics, i.e. assume it's equal to 1 if it's needed. If your Reshape is not generic over batch dimension you'll have to deal with that the same way you would with a regular array.

@HamletWantToCode (Author)

OK, just to summarize what has been discussed:

  1. The outdims function should be able to return the output dimensions (except for the batch dimension) of a neural network model.

  2. We don't expect to actually run the whole model when we use outdims.

  3. The outdims function should be specifically defined for Flux's layers, such as Dense & Conv, while allowing a fallback for others (such as reshape).

Here is what I think it could be:

# fallback
outdims(f::Function, isize::Tuple) = size(f(ones(Float32, isize..., 1)))[1:end-1]    # since we don't care about the batch dimension, we are free to just set it to 1

# Dense layer
function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end

# Conv layer
outdims(l::Conv, isize) = ...

# Chain
using Base: last

outdims(t::Tuple, isize) = outdims(last(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)

@mcabbott (Member)

About reshape, yes I guess the example we ought to have had in this thread is x -> reshape(x, :, size(x,4)) not (x, :, 4).

I still think it would be easier for this thing to always treat the actual sizes, including batch dimensions of 1 if necessary, rather than have to explain why it deals with 3-tuples for layers that demand 4-tensors. Perhaps outdims(Conv((3, 3), 3 => 16), (10, 10)) == (8, 8, 16, 1), allowing you to run what you had before.

@darsnack (Member)

I think isize should include all the dimensions that would be there as input to that layer with the exception that batch size can be left off. In this case, we can default to 1. If we assume 1, then that propagates down through the chain nicely. And remembering the model building use-case, we don't want users to have to specify batch since the model architecture is independent of batch size.

I guess the best way to describe it is that the batch dimension is always "pass-through." Defaults to 1 and preserved as-is if user-specified.

@MikeInnes (Member) commented Mar 17, 2020

rather than have to explain why it deals with 3-tuples for layers that demand 4-tensors

The right way to fix this is to make the conv layers better at accepting input that doesn't have a trailing batch dimension. That they don't currently is really just the implementation leaking out, but it's easy to do.

I do agree that the batch dimension shouldn't be a special case, though. outdims should behave pretty much exactly like size(model(rand(in...))) or it's going to be difficult for layer authors to implement it correctly. That definition should support the "pass-through" semantics well because the layers already do that anyway.
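In other words, the property layer authors would implement against is roughly this (a sketch for a hypothetical model):

m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
insize = (28, 28, 3, 1)
Flux.outdims(m, insize) == size(m(rand(Float32, insize...)))  # what outdims should guarantee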

@mcabbott (Member)

I see there's a commented out line to allow Conv without a batch dim:

# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))

My GPU is self-isolating but is this still a problem?

@MikeInnes (Member)

Someone just needs to figure out how to write it so that Julia's closure conversion is happy, which should be pretty easy to do.

@hhaensel (Contributor)

Stumbled across the outdims issue for chains today.
The reason why the current implementation fails is that the order of function composition is the opposite of the order of functions in c.layers.

Original:

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), c.layers))(isize)

A correct implementation would be.

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), reverse(c.layers)))(isize)

@hhaensel (Contributor) commented Jun 26, 2020

If we used

outdims(t::Tuple, isize) = outdims(last(t), outdims(first(t), isize))

we would simply ignore all layers in between. Moreover, the dimension check

isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))

would possibly fail.

So my proposal would be:

# Dense layer
function outdims(l::Dense, isize::Tuple)
    first(isize) == size(l.W, 2) || throw(DimensionMismatch("input size should equal to ($(size(l.W, 2)), ...), got $isize"))
    return (size(l.W, 1), isize[2:end]...)
end

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), reverse(c.layers)))(isize)

This would then also cover cases where more than one dimension is returned, as proposed by @darsnack, and it would check the full model for dimension compatibility.

@hhaensel (Contributor)

Just re-read the initial post by @HamletWantToCode and therefore think we should use foldr instead of foldl. This doesn't really matter, as function composition is associative, but it is perhaps more intuitive ...

@darsnack (Member)

we would simply ignore all layers in between

I think that implementation was supposed to be

outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))

@hhaensel (Contributor)

Ah, that makes sense! I already wondered why last should be imported.
I probably still favour the iterative version over the recursive one.

@hhaensel (Contributor) commented Jun 26, 2020

Played a bit and found that there is only a small difference between the two versions, but everything needed slight tuning:

import Base: tail

# Dense layer
function outdims(l::Dense, isize::Tuple)
    first(isize) == size(l.W, 2) || throw(DimensionMismatch("input size should equal to ($(size(l.W, 2)), ...), got $isize"))
    return (size(l.W, 1), Base.tail(isize)...)
end

# iterative version 
outdims(c::Chain, isize) = foldr(outdims, reverse(c.layers), init = isize)

# recursive version
outdims(t::Tuple, isize) = length(t) == 1 ? outdims(first(t), isize) : outdims(Base.tail(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)
  • using Base.tail(isize)... instead of isize[2:end]... boosted the performance from 4 µs to 100 ns.
  • using foldr with init instead of first creating functions and then folding went down to 60 ns.
  • the recursive version needed an abort criterion and is - with 65 ns - only 10% slower.
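With either version, the failing example from the top of this thread would then work (a quick check, assuming the Dense method above):

m = Chain(Dense(2, 10), Dense(10, 4))
outdims(m, (2,))   # (4,)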

@DhairyaLGandhi (Member)

Hmm, that's interesting. I would take performance improvements here for sure 😄

@hhaensel (Contributor)

Shall I open a PR, or is anyone else already preparing one?

@darsnack (Member)

Yeah, maybe we can address the side issue of how outdims should behave in another PR. We should merge a PR that fixes the Chain issue so outdims is actually usable.

@hhaensel (Contributor)

So I'll file two PRs tonight ...

@hhaensel (Contributor)

as promised ...

bors bot added a commit that referenced this issue Jun 30, 2020
1252: outdims: revise implementation for Chain, dimension check for Dense r=CarloLucibello a=hhaensel

This PR reflects the discussion in #1086.
`outdims(c::Chain, isize)` calculated the layers in the wrong order.
The function has been replaced by a performance optimised version following the same idea.

`outdims(c::Dense, isize)` now throws an error if dimensions do not match.

One test, which now throws an error, has been adapted, more tests have been added.

I will setup another PR for further improvements of outdims, as discussed in the corresponding issue.

Co-authored-by: hhaensel <[email protected]>
@darsnack mentioned this issue Aug 5, 2020
@CarloLucibello linked a pull request Dec 28, 2020 that will close this issue
bors bot closed this as completed in #1305 Dec 30, 2020