
[Q] Why is there "resize_bilinear" for "image_level_features"? (I think it does nothing..) #37

Open
ywpkwon opened this issue Jan 15, 2020 · 2 comments


ywpkwon commented Jan 15, 2020

In the function "atrous_spatial_pyramid_pooling" (line 21 of deeplab_model.py), there is an "image_level_features" block (lines 54--61):

        # (b) the image-level features
        with tf.variable_scope("image_level_features"):
          # global average pooling
          image_level_features = tf.reduce_mean(inputs, [1, 2], name='global_average_pooling', keepdims=True)
          # 1x1 convolution with 256 filters (and batch normalization)
          image_level_features = layers_lib.conv2d(image_level_features, depth, [1, 1], stride=1, scope='conv_1x1')
          # bilinearly upsample features
          image_level_features = tf.image.resize_bilinear(image_level_features, inputs_size, name='upsample')

I think "image_level_features" is the same size as "inputs", since it is just a reduce_mean with keepdims.
Also, inputs_size = tf.shape(inputs)[1:3].

=> Then they are the same size, so why should one do tf.image.resize_bilinear(image_level_features, inputs_size)?

@haydengunraj

As the comments explain, the reduce_mean call performs global average pooling across dimensions 1 (height) and 2 (width). This results in a feature map of size Nx1x1xC, which is then passed through a 1x1 conv (no shape change). As such, the tf.image.resize_bilinear call is used to upsample the spatial dimensions of that feature map back to the input dimensions so the branches can be concatenated.
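
For reference, here is a minimal shape trace (a sketch assuming TF 1.x; the tensor sizes are illustrative, and the 1x1 conv is omitted since it does not change the spatial shape):

    import tensorflow as tf

    inputs = tf.placeholder(tf.float32, [None, 33, 33, 2048])   # N x H x W x C (illustrative)
    inputs_size = tf.shape(inputs)[1:3]                          # dynamic [H, W]

    # Global average pooling collapses H and W to 1; keepdims keeps the rank at 4.
    pooled = tf.reduce_mean(inputs, [1, 2], keepdims=True)       # N x 1 x 1 x 2048

    # Bilinear upsampling restores the spatial size so this branch can be
    # concatenated with the atrous-conv branches, which are still N x H x W x depth.
    upsampled = tf.image.resize_bilinear(pooled, inputs_size)    # N x 33 x 33 x 2048

    print(pooled.shape)     # (?, 1, 1, 2048)
    print(upsampled.shape)  # (?, ?, ?, 2048) -- spatial size is only known at run time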


ywpkwon commented Jan 24, 2020

@haydengunraj, thanks for the explanation. I was mistakenly confused by the keepdims. So, if tf.image.resize_bilinear converts A = Nx1x1xC to B = NxHxWxC, aren't all the HxW values the same as the single 1x1 value (channel-wise)? For example, A[n, 0, 0, c] == B[n, :, :, c] for any n and c?

Then, isn't it the same as tf.tile in this case?
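
If so, a quick numerical check (a sketch, assuming TF 1.x and arbitrary sizes) seems to confirm that bilinear upsampling of a 1x1 map just broadcasts the single value, i.e. it matches tf.tile:

    import numpy as np
    import tensorflow as tf

    a = tf.constant(np.random.rand(2, 1, 1, 4), dtype=tf.float32)   # N x 1 x 1 x C

    resized = tf.image.resize_bilinear(a, [7, 5])                    # N x 7 x 5 x C
    tiled = tf.tile(a, [1, 7, 5, 1])                                 # N x 7 x 5 x C

    with tf.Session() as sess:
        r, t = sess.run([resized, tiled])
        print(np.allclose(r, t))  # True -- every spatial position equals A[n, 0, 0, c]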
