
outdims function doesn't work properly for chained layers #1086

Closed
HamletWantToCode opened this issue Mar 16, 2020 · 35 comments · Fixed by #1305

@HamletWantToCode commented Mar 16, 2020

In PR #960 an outdims function was added to Flux.jl, making it easier to infer the output dimensions of a neural network model (especially when using convolutional layers). However, the result of outdims is incorrect for chained layers, and the existing test cases fail to catch this flaw. E.g. try the following code:

using Flux
using Flux: outdims

all_dense_model = Chain(Dense(2, 10), Dense(10, 4))
outdims(all_dense_model, 2)    # will return (10,) while it should be (4,)

I'm using Flux v0.10.3 in Julia v1.3.0 under macOS

Currently, outdims is implemented as:

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), c.layers))(isize)

however, foldl doesn't produce the correct function composition, e.g. try the following code:

f1(x) = 2x; f2(x) = exp(x); f3(x) = x^3
const x0 = 0.5
# we want to compute f3(f2(f1(x0))), i.e. (f3 ∘ f2 ∘ f1)(x0)
f_true = f3 ∘ f2 ∘ f1
f_true(x0)   # 20.085536923187664
# however
f = foldl(∘, (f1, f2, f3))   # the returned f = ((f1 ∘ f2) ∘ f3) following the left-association rule !!!
f(x0)   # 2.2662969061336526

I suggest resolving this problem with the following modifications:

# edit
using Base: tail # thanks to @darsnack for pointing this out

outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)

and adding the following test cases:

@testset "dense layer" begin
         X = randn(Float32, 5, 10)
         D0, D1, D2 = 5, 100, 25
         dense1 = Dense(D0, D1, relu)
         dense2 = Dense(D1, D2)
         dense_chain = Chain(dense1, dense2)
         @test outdims(dense1, D0) = (D1,)
         @test first(outdims(dense1, D0)) == size(dense1(X), 1)
         @test first(outdims(dense_chain, D0)) == size(dense_chain(X), 1)
end

@testset "conv layer" begin
            X = randn(Float32, 28, 28, 1, 1)
            D0, D1 = 3, 5
            S, P = 3, 1
            conv1_stride = Conv((D0, D0), 16=>32, stride=S, pad=P)
            conv2 = Conv((D1, D1), 3=>16)
            conv_chain = Chain(conv1_stride, conv2)
            @test typeof(outdims(conv1_stride, (28, 28))) <: Tuple
            @test length(outdims(conv1_stride, (28, 28))) == 2
            @test outdims(conv1_stride, (28, 28)) == size(conv1_stride(X))[1:2]
            @test outdims(conv_chain, (28, 28)) == size(conv_chain(X))[1:2]
end

Maybe we could also consider extending the outdims function to more complex chained models, e.g. models that contain plain Julia functions such as x -> reshape(x, :, 4), and also write outdims methods for recurrent layers.

I'll collect these into a PR if you'd like, @MikeInnes @baggepinnen @darsnack.

@DhairyaLGandhi (Member)

Sounds like a good idea to collect all the failing test cases; having something like outdims(::typeof(reshape), ...) might be fine too.

@mcabbott (Member)

Can it be taught to understand x->reshape(x, :, 4)? Perhaps people would have to write Base.Fix2(reshape, (:,4)). Right now Flux.outdims(x -> reshape(x,:,4), 5) is an error, which is good, but perhaps a more helpful one would be better.
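For what it's worth, a method along these lines (just a sketch, not something Flux defines) would make the Base.Fix2 spelling work by running the reshape on a dummy array of the given input size:

# Sketch only: an outdims overload for Base.Fix2(reshape, dims), computed by
# actually reshaping a dummy array of the given input size.
function Flux.outdims(f::Base.Fix2{typeof(reshape)}, isize::Tuple)
    return size(f(ones(Float32, isize)))
end

Flux.outdims(Base.Fix2(reshape, (:, 4)), (8, 8, 16, 1))  # (256, 4)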

BTW why is Flux.outdims(Dense(2,5), (3,10)) == (5,)? I would have expected a 2-tuple, and probably an error!

@HamletWantToCode (Author) commented Mar 17, 2020

Hi @mcabbott

why is Flux.outdims(Dense(2,5), (3,10)) == (5,)

The current implementation of outdims for the Dense layer (see below) doesn't really depend on the 2nd argument you pass in; it just ignores (3, 10) and returns (size(l.W, 1),).

outdims(l::Dense, isize) = (size(l.W)[1],)

I think this can be solved by

function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end

I prefer to throw an error instead of return a 2-tuple.
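For illustration, that method would make the example above behave like this (a quick sketch):

outdims(Dense(2, 5), (2,))     # (5,)
outdims(Dense(2, 5), (3, 10))  # throws DimensionMismatch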

@mcabbott (Member)

OK, so outdims does not propagate the size of tensors, it only cares about the first one or two dimensions, depending on the layer. Except that sometimes it checks the others?

julia> Conv((3, 3), 3 => 16)(ones(10, 10, 3, 1)) |> size
(8, 8, 16, 1)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 3, 1))
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10)) # so it silently ignores 3rd & 4th?
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 10, 1)) # no it doesn't!
ERROR: DimensionMismatch("Input channels must match! (10 vs. 3)")

julia> Flux.outdims(Dense(8, 16), (8,8,16,1)) # but here it ignores everything...
(16,)

Perhaps this is orthogonal to your question, sorry, but at the very least I think the docstring should explain what this function is for; "output dimensions" is pretty vague. Why isn't it just outsize(f, s::Tuple) = size(f(ones(Float32, s...))), or faster versions of that? Then x->reshape(x, :, 4) would just work, whereas at present the output seems to depend on things that the layer before will have dropped.

@HamletWantToCode (Author) commented Mar 17, 2020

Hi @mcabbott

it only cares about the first one or two dimensions, depending on the layer. Except that sometimes it checks the others?

I also think so.

Why isn't it just outsize(f, s::Tuple) = size(f(ones(Float32, s...)))

For the whole network, I think this may result in a large overhead, since every time one would need to run the complete computation just to get the output dimensions. However, I think this may be a good idea for functions like x->reshape(x, :, 4); we can just define something like:

outdims(f::typeof(reshape), s::Tuple) = size(f(ones(Float32, s...)))

since this won't cost much.

@DhairyaLGandhi (Member) commented Mar 17, 2020

Or just return the shape the reshape wants to reshape to? Accounting for any Colons, of course.
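Resolving a Colon from the input size is cheap; something like this rough helper (not in Flux, purely to illustrate the idea) would do it:

# Rough sketch: compute the concrete output shape of a reshape from the input
# size and the requested dims, resolving at most one Colon.
function reshape_outdims(isize::Tuple, dims::Tuple)
    known = prod(d for d in dims if d isa Integer)
    return map(d -> d isa Colon ? prod(isize) ÷ known : d, dims)
end

reshape_outdims((8, 8, 16, 1), (:, 4))  # (256, 4)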

@mcabbott (Member)

I don't think x->reshape(x, :, 4) has a type you can recognise, hence my Base.Fix2 comment above.

What I meant about outsize was that this would be a simple rule for what the function should return, and a fallback implementation, even if it gets skipped on known types such as Conv. But this rule is quite different to what outdims presently returns. Its action on Conv means that it cannot distinguish these two cases:

julia> m = Chain(Conv((3, 3), 3 => 16), Base.Fix2(reshape, (:,4)));

julia> m(ones(10,10,3,1)) |> size
(256, 4)

julia> m(ones(10,10,3,2)) |> size
(512, 4)

@DhairyaLGandhi (Member)

Yes, you are right. The more I think about it, the more I feel for this to work generically and correctly, it would need to do a true forward pass, which sounds like too much ado for something one expects to be trivial and quick. Writing all these specific rules feels incorrect too and won't scale.

One could always have something that does a true forward pass only as a fallback and returns only the size of the output to the next layer, but that feels inelegant too.

@MikeInnes linked a pull request Mar 17, 2020 that will close this issue

@MikeInnes (Member)

Yeah, this is an inherent problem with this kind of API. We definitely don't want to go down the route of supporting things like x -> reshape(x, ...) and so on.

Probably best to recommend that people just make an array and throw it at the model to see what happens, which is the only really reliable source of truth.

@mcabbott (Member) commented Mar 17, 2020

Having a slowish fallback plus fast shortcuts doesn't seem that inelegant to me. But perhaps I'm not so sure what this is for anyway. Is it for interactive use to check you've defined the model how you think you did? If so, is the speed of simply running it once a big deal?

The present design seems pretty confusing though; it does not seem clearly specified what you should provide, or what you are going to get back.

@DhairyaLGandhi (Member) commented Mar 17, 2020

I think either the use case for this needs to be something more concrete where a forward pass is untenable, or its implementation needs to be correct, simple, and reliable. Perhaps it would be good to clear up that bit first?

@darsnack (Member) commented Mar 17, 2020

The original PR was supposed to address programmatic layer building for large NNs. If you are putting together a series of convolutions, then you need to keep track of the (width, height) of each layer's output so that you can size the final dense layers properly. Based on Slack chatter, it was clear people kept defining this in their own code so it made sense to provide a utility function in Flux.
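For example, the intended workflow looks roughly like this (a sketch with a made-up architecture, assuming outdims methods for Conv/MaxPool and the Chain fix discussed here):

using Flux

# Hypothetical model-building use: size the first Dense layer from the
# convolutional stack's spatial output, without running a forward pass.
convs = Chain(Conv((3, 3), 1 => 16, relu), MaxPool((2, 2)), Conv((3, 3), 16 => 32, relu))
w, h = Flux.outdims(convs, (28, 28))   # (11, 11) for a 28×28 input
model = Chain(convs, x -> reshape(x, :, size(x, 4)), Dense(w * h * 32, 10))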

But perhaps I'm not so sure what this is for anyway.

The only practical use case originally envisioned is what I highlighted above. Thus, it only needs to be there for conv-type layers. When doing the original PR, we thought why don't we provide it for other layers too to be consistent. That's why dense, etc. were added.

To address the input-output relationship: the intent is that you put in (width, height) and get out (width, height). It is probably a better design to return the full tensor size.

As for the errors, we should be throwing errors in all dimension-mismatch cases. The reason it does for convolutions and not for dense is that the convolution layers use the underlying NNlib functions, which throw a dimension mismatch.

Probably best to recommend that people just make an array and throw it at the model to see what happens, which is the only really reliable source of truth.

Since the utility of these functions is during model building, we want fast responses.

@darsnack (Member)

Also, based on the utility described above, I would argue that this function should be restricted to NN layers. I would suggest keeping it defined for dense/conv layers only, and defining a fallback method that determines the output size by evaluating the function. And we should be explicit in the docs that this is what happens for all types not listed there. I am not sure evaluating reshape is considered slow compared to evaluating a series of convolution layers.

Of course, we would include the consistency fixes to the API such as returning the full output tensor size and throwing errors.

@mcabbott (Member)

OK, sounds sensible. If it's going to be defined for both Conv and Dense, then it has to have some way of digesting reshape as you must be mixing 4-tensors and 2-tensors.

Perhaps for cases where you'd like to think only about height/width, not all dimensions, it should do something like outdims(Conv((3, 3), 3 => 16), (10, 10)) == (8, 8, missing, missing).

@darsnack (Member)

Okay I see now that the docstring is broken for the original PR. I'm sorry, that was my first PR involving docs and I must have messed something up.

The docstring actually addresses a lot of what we've been talking about. For example, the convolution docstring says:

Calculate the output dimensions given the input dimensions, isize.
Batch size and channel size are ignored as per NNlib.jl.

And in the case of the dense, it has an example explicitly showing that the input tuple is effectively ignored even in the case of a dimension mismatch.

That being said, I think the API changes here are better anyways.

@MikeInnes (Member)

I think it would be fine to ignore the batch dimension if it's not present (no missing) and check it if it is. The use case for outdims sounds reasonable so let's keep it to simple layers but also clearly document a pattern for more complex models (i.e. constructing and passing an array of nonsense values).

It'd also be fine to add a Reshape layer for reshaping, which outdims could be aware of (in a separate PR).
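Such a Reshape layer might look roughly like this (purely a sketch, not something Flux currently provides):

# Hypothetical Reshape layer that outdims could dispatch on.
struct Reshape{T<:Tuple}
    dims::T
end
(r::Reshape)(x) = reshape(x, r.dims)

# outdims via a dummy array of the given input size (cheap for a reshape).
Flux.outdims(r::Reshape, isize::Tuple) = size(reshape(ones(Float32, isize), r.dims))

Flux.outdims(Reshape((:, 4)), (8, 8, 16, 1))  # (256, 4)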

@mcabbott (Member)

While propagating only the first two dimensions seems fine for Chain(Conv, MaxPool, Conv) (although it should be clearly documented that isize does not necessarily mean size, etc.), it's likely to be ambiguous for Chain(Conv, Reshape, Dense) if the reshape could really accept a matrix too. With missing it could warn you. Otherwise it could be restricted only to layers that take exclusively 4-tensors, which perhaps was the original idea.

@MikeInnes (Member)

For cases like that it would probably make sense for Conv to always return the number of channels at least (which is easy if it is not provided, since it's fixed); and that extra output can easily be ignored if it's not needed.

For the batch dimension we can just follow Julia's regular semantics, i.e. assume it's equal to 1 if it's needed. If your Reshape is not generic over batch dimension you'll have to deal with that the same way you would with a regular array.

@HamletWantToCode (Author)

OK, just to summarize what has been discussed:

  1. The outdims function should be able to return the output dimensions (except for the batch dimension) of a neural network model.

  2. We don't expect to actually run the whole model when we use outdims.

  3. The outdims function should be specifically defined for Flux's layers, such as Dense & Conv, while allowing a fallback for others (such as reshape).

Here is what I think it could be:

# fallback
outdims(f::Function, isize::Tuple) = size(f(ones(Float32, isize..., 1)))[1:end-1]    # since we don't care about the batch dimension, we are free to just set it to 1

# Dense layer
function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end

# Conv layer
outdims(l::Conv, isize) = ...

# Chain
using Base: last

outdims(t::Tuple, isize) = outdims(last(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)

@mcabbott (Member)

About reshape, yes I guess the example we ought to have had in this thread is x -> reshape(x, :, size(x,4)) not (x, :, 4).

I still think it would be easier for this thing to always treat the actual sizes, including batch dimensions of 1 if necessary, rather than have to explain why it deals with 3-tuples for layers that demand 4-tensors. Perhaps outdims(Conv((3, 3), 3 => 16), (10, 10)) == (8, 8, 16, 1), allowing you to run what you had before.

@darsnack (Member)

I think isize should include all the dimensions that would be there as input to that layer with the exception that batch size can be left off. In this case, we can default to 1. If we assume 1, then that propagates down through the chain nicely. And remembering the model building use-case, we don't want users to have to specify batch since the model architecture is independent of batch size.

I guess the best way to describe it is that the batch dimension is always "pass-through." Defaults to 1 and preserved as-is if user-specified.

@MikeInnes (Member) commented Mar 17, 2020

rather than have to explain why it deals with 3-tuples for layers that demand 4-tensors

The right way to fix this is to make the conv layers better at accepting input that doesn't have a trailing batch dimension. That they don't currently is really just the implementation leaking out, but it's easy to do.

I do agree that the batch dimension shouldn't be a special case, though. outdims should behave pretty much exactly like size(model(rand(in...))) or it's going to be difficult for layer authors to implement it correctly. That definition should support the "pass-through" semantics well because the layers already do that anyway.
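In other words, the property layer authors would implement against is roughly this (a sketch for a hypothetical model):

m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
insize = (28, 28, 3, 1)
Flux.outdims(m, insize) == size(m(rand(Float32, insize...)))  # what outdims should guarantee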

@mcabbott (Member)

I see there's a commented out line to allow Conv without a batch dim:

# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))

My GPU is self-isolating but is this still a problem?

@MikeInnes (Member)

Someone just needs to figure out how to write it so that Julia's closure conversion is happy, which should be pretty easy to do.

@hhaensel (Contributor)

Stumbled across the outdims issue for chains today.
The reason why the current implementation fails is that the order of function composition is the opposite of the order of functions in c.layers.

Original:

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), c.layers))(isize)

A correct implementation would be.

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), reverse(c.layers)))(isize)

@hhaensel (Contributor) commented Jun 26, 2020

If we used

outdims(t::Tuple, isize) = outdims(last(t), outdims(first(t), isize))

we would simply ignore all layers in between. Moreover, the dimension check

isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))

would possibly fail.

So my proposal would be:

# Dense layer
function outdims(l::Dense, isize::Tuple)
    first(isize) == size(l.W, 2) || throw(DimensionMismatch("input size should equal to ($(size(l.W, 2)), ...), got $isize"))
    return (size(l.W, 1), isize[2:end]...)
end

outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), reverse(c.layers)))(isize)

This would then also cover cases where more than one dimension is returned, as proposed by @darsnack, and it would check the full model for dimension compatibility.

@hhaensel (Contributor)

Just re-read the initial post by @HamletWantToCode and therefore think we should use foldr instead of foldl. This doesn't really matter, as function composition is associative, but it is perhaps more intuitive ...

@darsnack (Member)

we would simply ignore all layers in between

I think that implementation was supposed to be

outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))

@hhaensel (Contributor)

Ah, that makes sense! I already wondered why last should be imported.
I probably still favour the iterative version over the recursive one.

@hhaensel (Contributor) commented Jun 26, 2020

Played a bit and found that there is only a small difference between the two versions, but everything needed slight tuning:

import Base: tail

# Dense layer
function outdims(l::Dense, isize::Tuple)
    first(isize) == size(l.W, 2) || throw(DimensionMismatch("input size should equal to ($(size(l.W, 2)), ...), got $isize"))
    return (size(l.W, 1), Base.tail(isize)...)
end

# iterative version 
outdims(c::Chain, isize) = foldr(outdims, reverse(c.layers), init = isize)

# recursive version
outdims(t::Tuple, isize) = length(t) == 1 ? outdims(first(t), isize) : outdims(Base.tail(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)
  • using Base.tail(isize)... instead of isize[2:end]... boosted the performance from 4 µs to 100 ns.
  • using foldr with init instead of first creating functions and then folding went down to 60 ns.
  • the recursive version needed an abort criterion and is - with 65 ns - only 10% slower.
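With either version, the failing example from the top of this thread would then work (a quick check, assuming the Dense method above):

m = Chain(Dense(2, 10), Dense(10, 4))
outdims(m, (2,))   # (4,)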

@DhairyaLGandhi (Member)

Hmm, that's interesting. I would take performance improvements here for sure 😄

@hhaensel (Contributor)

Shall I open a PR, or is anyone else already preparing one?

@darsnack (Member)

Yeah, maybe we can address the side issue of how outdims should behave in another PR. We should merge a PR that fixes the Chain issue so outdims is actually usable.

@hhaensel (Contributor)

So I'll file two PRs tonight ...

@hhaensel (Contributor)

as promised ...

bors bot added a commit that referenced this issue Jun 30, 2020
1252: outdims: revise implementation for Chain, dimension check for Dense r=CarloLucibello a=hhaensel

This PR reflects the discussion in #1086.
`outdims(c::Chain, isize)` calculated the layers in the wrong order.
The function has been replaced by a performance optimised version following the same idea.

`outdims(c::Dense, isize)` now throws an error if dimensions do not match.

One test, which now throws an error, has been adapted, more tests have been added.

I will setup another PR for further improvements of outdims, as discussed in the corresponding issue.

Co-authored-by: hhaensel <[email protected]>
@darsnack mentioned this issue Aug 5, 2020
@CarloLucibello linked a pull request Dec 28, 2020 that will close this issue
bors bot closed this as completed in #1305 Dec 30, 2020