outdims function doesn't work properly for chained layers #1086
Sounds like a good idea to collect all the failing test cases, having something like …
Can it be taught to understand …? BTW, why is …?
Hi @mcabbott
The current implementation is:

```julia
outdims(l::Dense, isize) = (size(l.W)[1],)
```

I think this can be solved by:

```julia
function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end
```

I prefer to throw an error instead of returning a 2-tuple.
OK, so:

```julia
julia> Conv((3, 3), 3 => 16)(ones(10, 10, 3, 1)) |> size
(8, 8, 16, 1)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 3, 1))
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10)) # so it silently ignores 3rd & 4th?
(8, 8)

julia> Flux.outdims(Conv((3, 3), 3 => 16), (10, 10, 10, 1)) # no it doesn't!
ERROR: DimensionMismatch("Input channels must match! (10 vs. 3)")

julia> Flux.outdims(Dense(8, 16), (8,8,16,1)) # but here it ignores everything...
(16,)
```

Perhaps this is orthogonal to your question, sorry, but at the very least I think the docstring should explain what this function is for; "output dimensions" is pretty vague. Why isn't it just …?
Hi @mcabbott
I also think so.
For the whole network, I think this may result in large overhead, since one would need to run the complete computation every time just to get its output dimension. However, I think this may be a good idea for functions like `reshape`:

```julia
outdims(f::typeof(reshape), s::Tuple) = size(f(ones(Float32, s...)))
```

since this won't cost much.
Or just return the shape the reshape wants to reshape it to? Accounting for any Colons, of course.
I don't think … What I meant about … was this:

```julia
julia> m = Chain(Conv((3, 3), 3 => 16), Base.Fix2(reshape, (:,4)));

julia> m(ones(10,10,3,1)) |> size
(256, 4)

julia> m(ones(10,10,3,2)) |> size
(512, 4)
```
Yes, you are right. The more I think about it, the more I feel that for this to work generically and correctly, it would need to do a true forward pass, which sounds like too much ado for something one expects to be trivial and quick. Writing all these specific rules feels incorrect too and won't scale. One could always have something that does a true forward pass only as a fallback and only returns the size of the output to the next layer, but that feels inelegant too.
Yeah, this is an inherent problem with this kind of API. We definitely don't want to go down the route of supporting things like … Probably best to recommend that people just make an array and throw it at the model to see what happens, which is the only really reliable source of truth.
Having a slowish fallback plus fast shortcuts doesn't seem that inelegant to me. But perhaps that's because I'm not so sure what this is for anyway. Is it for interactive use, to check you've defined the model how you think you did? If so, is the speed of simply running it once a big deal? The present design seems pretty confusing though; it does not seem clearly specified what you should provide, or what you are going to get back.
I think either the use case for this needs to be something more concrete where a forward pass is untenable, or its implementation needs to be correct, simple and reliable. Perhaps good to clear that bit up first?
The original PR was supposed to address programmatic layer building for large NNs. If you are putting together a series of convolutions, then you need to keep track of the (width, height) of each layer's output so that you can size the final dense layers properly. Based on Slack chatter, it was clear people kept defining this in their own code so it made sense to provide a utility function in Flux.
The only practical use case originally envisioned is what I highlighted above, so it only needs to exist for conv-type layers. When doing the original PR, we thought we might as well provide it for the other layers too, to be consistent; that's why Dense etc. were added. To address the input-output relationship: the intent is that you put in (width, height) and get out (width, height). It is probably a better design to return the full tensor size. As for the errors, we should be throwing errors in all dimension-mismatch cases. The reason it happens for convolutions and not for Dense is that the convolution layers use the underlying NNlib functions, which throw a DimensionMismatch.
Since the utility of these functions is during model building, we want fast responses.
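For concreteness, the pattern being described is roughly this (a sketch; the layer sizes here are made up):

```julia
using Flux

convs = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))

# track the spatial (width, height) through the conv stack so the final
# Dense layer can be sized without computing it by hand
w, h = Flux.outdims(convs, (28, 28))

model = Chain(convs, x -> reshape(x, :, size(x, 4)), Dense(w * h * 32, 10))
```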
Also, based on the utility described above, I would argue that this function should be restricted to NN layers. I would suggest keeping it defined for dense/conv layers only, and defining a fallback method that determines the output size by evaluating the function. And we should be explicit in the docs that this is what happens for all types not listed there. I am not sure evaluating … Of course, we would include the consistency fixes to the API, such as returning the full output tensor size and throwing errors.
OK, sounds sensible. If it's going to be defined for both … Perhaps for cases where you'd like to think only about height/width, not all dimensions, it should do something like …
Okay, I see now that the docstring is broken for the original PR. I'm sorry, that was my first PR involving docs and I must have messed something up. The docstring actually addresses a lot of what we've been talking about. For example, the convolution docstring says: …
And in the case of Dense, it has an example explicitly showing that the input tuple is effectively ignored, even in the case of a dimension mismatch. That being said, I think the API changes here are better anyway.
I think it would be fine to ignore the batch dimension if it's not present (no …). It'd also be fine to add a …
While propagating only the first two dimensions seems fine for …
For cases like that it would probably make sense for … For the batch dimension we can just follow Julia's regular semantics, i.e. assume it's equal to …
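For reference, Julia's regular semantics for trailing dimensions look like this:

```julia
A = rand(3, 4)

size(A, 3)   # 1: dimensions beyond ndims(A) are reported as having size 1
A[2, 3, 1]   # valid: trailing indices equal to 1 are always allowed
```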
OK, just to summarize what has been discussed:

Here is what I think it could be:

```julia
# fallback
# since we don't care about the batch dimension, we are free to just set it to 1
outdims(f::Function, isize::Tuple) = size(f(ones(Float32, isize..., 1)))[1:end-1]

# Dense layer
function outdims(l::Dense, isize::Tuple)
    isize == (size(l.W, 2),) || throw(DimensionMismatch("input size should equal to $((size(l.W)[2],)), got $isize"))
    return (size(l.W, 1),)
end

# Conv layer
outdims(::typeof(Conv), isize) = ...

# Chain
using Base: last
outdims(t::Tuple, isize) = outdims(last(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)
```
About reshape, yes, I guess the example we ought to have had in this thread is … I still think it would be easier for this thing to always treat the actual sizes, including batch dimensions of 1 if necessary, rather than have to explain why it deals with 3-tuples for layers that demand 4-tensors. Perhaps …
I think … I guess the best way to describe it is that the batch dimension is always "pass-through": it defaults to 1 and is preserved as-is if user-specified.
The right way to fix this is to make the conv layers better at accepting input that doesn't have a trailing batch dimension. That they don't currently is really just the implementation leaking out, but it's easy to do. I do agree that the batch dimension shouldn't be a special case, though.
I see there's a commented-out line to allow Conv without a batch dim: Lines 56 to 57 in d4cf143.
My GPU is self-isolating, but is this still a problem?
Someone just needs to figure out how to write it so that Julia's closure conversion is happy, which should be pretty easy to do.
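One hypothetical way to write it, as a plain helper rather than a method on the layer type (which sidesteps the closure-conversion question), could be:

```julia
# hypothetical helper, not the commented-out method from conv.jl:
# add a singleton batch dimension, apply the layer, then drop it again
function apply_unbatched(layer, x::AbstractArray{<:Any,3})
    y = layer(reshape(x, size(x)..., 1))
    return dropdims(y; dims = ndims(y))
end

# e.g. apply_unbatched(Conv((3, 3), 3 => 16), ones(Float32, 10, 10, 3)) |> size  # (8, 8, 16)
```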
Stumbled across the outdims issue for chains today. Original: …

A correct implementation would be: …
If we used …, we would simply ignore all layers in between. Moreover, the dimension check … would possibly fail. So my proposal would be: …

This would then also cover cases where more than one dimension is returned, as proposed by @darsnack, and it would check the full model for dimension compatibility.
Just re-read the initial post by @HamletWantToCode and therefore think we should use …
I think that implementation was supposed to be:

```julia
outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))
```
Ah, that makes sense! I had already wondered why `last` should be imported.
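For completeness, the recursive version also needs a base case for the empty tuple, otherwise the recursion errors once all layers have been consumed; a minimal sketch:

```julia
using Base: tail

outdims(t::Tuple{}, isize) = isize   # no layers left: the size passes through unchanged
outdims(t::Tuple, isize) = outdims(tail(t), outdims(first(t), isize))
outdims(c::Chain, isize) = outdims(c.layers, isize)
```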
Played a bit and found that there is only a small difference between the two versions, but everything needed slight tuning: …
Hmm, that's interesting. I would take performance improvements here for sure 😄
Shall I open a PR, or is anyone else already preparing one?
Yeah, maybe we can address the side issue of how …
So I'll file two PRs tonight ...
as promised ...
1252: outdims: revise implementation for Chain, dimension check for Dense r=CarloLucibello a=hhaensel

This PR reflects the discussion in #1086.

`outdims(c::Chain, isize)` calculated the layers in the wrong order. The function has been replaced by a performance-optimised version following the same idea.

`outdims(c::Dense, isize)` now throws an error if dimensions do not match.

One test, which now throws an error, has been adapted; more tests have been added. I will set up another PR for further improvements of outdims, as discussed in the corresponding issue.

Co-authored-by: hhaensel <[email protected]>
In PR #960, an `outdims` function was added to Flux.jl, making it easier for us to infer the output dimensions of a neural network model (especially when using convolutional layers). However, the result of `outdims` is incorrect for chained layers, and the existing test cases fail to reflect this flaw, e.g. in code like the sketch below.
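A sketch of the kind of failing case (the layer sizes here are made up; the exact value returned by the broken method depends on the Flux version):

```julia
using Flux

m = Chain(Conv((3, 3), 3 => 16; stride = 2), Conv((3, 3), 16 => 32))

# ground truth from an actual forward pass: (10, 10) -> (4, 4) -> (2, 2)
size(m(ones(Float32, 10, 10, 3, 1)))   # (2, 2, 32, 1)

# on affected versions this disagrees with the forward pass, because the
# Chain method composes the per-layer outdims in the wrong order
Flux.outdims(m, (10, 10))
```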
Currently, `outdims` for a `Chain` is implemented using `foldl`; however, the fold doesn't give the correct function composition. I suggest resolving this problem by fixing the `Chain` method and adding test cases that cover chained layers.

Maybe we could also consider extending `outdims` to more complex chained models, e.g. models that contain plain Julia functions such as `x -> reshape(x, :, 4)`, and also writing `outdims` for recurrent layers. I'll collect these into a PR if you'd like, @MikeInnes @baggepinnen @darsnack.