
Error for running experiments #14

Open
yunhunJang opened this issue Dec 5, 2016 · 18 comments

@yunhunJang

I'm trying to run the MNIST experiments.

I used the command PYTHONPATH='.' python launchers/run_mnist_exp.py

However, it gives me the following error:

ValueError: Variable d_net/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/moments/normalize/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

This might be caused by my using a recent version of the TensorFlow master branch. What modifications would make this code run with current versions?

Thanks,

@mariolew

mariolew commented Dec 6, 2016

Same issue, hope for help.

@coventry

coventry commented Dec 7, 2016

Try setting

tensorflow==0.9.0
prettytensor==0.6.2

in requirements.txt. (Versions inferred from the chronology of the git histories.) Make sure you're also using the tensorflow/tensorflow:0.9.0-gpu image with nvidia-docker.
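
If you'd rather script the pinning than edit requirements.txt by hand, a helper like the one below could do it. This is a minimal sketch: `pin_requirements` is a hypothetical helper (not part of InfoGAN or pip), and it only understands simple `name` or `name<specifier>` lines.

```python
import re

# Bare package name at the start of a requirements line: letters, digits, "_", ".", "-".
_NAME_RE = re.compile(r"^([A-Za-z0-9_.\-]+)")


def pin_requirements(lines, pins):
    """Return requirements.txt lines with the packages in `pins` pinned exactly.

    `pins` maps a package name to a version, e.g. {"prettytensor": "0.6.2"}.
    A line naming a pinned package (with or without an existing specifier such
    as ">=0.8") becomes "name==version"; all other lines pass through unchanged.
    """
    wanted = {name.lower(): version for name, version in pins.items()}
    out = []
    for line in lines:
        match = _NAME_RE.match(line.strip())
        # pip treats package names case-insensitively, so compare lowercased.
        if match and match.group(1).lower() in wanted:
            name = match.group(1)
            out.append("%s==%s" % (name, wanted[name.lower()]))
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    # Hypothetical input lines, not the actual InfoGAN requirements.txt.
    print(pin_requirements(["prettytensor", "tensorflow>=0.8", "numpy"],
                           {"tensorflow": "0.9.0", "prettytensor": "0.6.2"}))
```

Comment lines and blank lines never match a pinned name, so they are kept as-is.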

@pmiller10

I hit the same issue. I switched to prettytensor==0.6.2 but still used tensorflow==0.12.0 and that seemed to solve it.

@yunhunJang
Author

@pmiller10 I removed prettytensor==0.7.1 and reinstalled prettytensor==0.6.2, but it still does not work. Did you run it with Docker? Could you describe your setup in detail?

@pmiller10

pmiller10 commented Dec 12, 2016

@yunhunJang My steps are:

  1. git clone git@github.com:openai/InfoGAN.git
  2. sudo docker run -v $(pwd)/InfoGAN:/InfoGAN -w /InfoGAN -it -p 8888:8888 gcr.io/tensorflow/tensorflow:r0.9rc0-devel
  3. Confirm which versions of prettytensor and tensorflow you have:
       pip freeze | grep 'tensor'
     At this point, all I have is tensorflow==0.9.0rc0.
  4. Edit requirements.txt: change prettytensor to prettytensor==0.6.2.
  5. Run pip install -r requirements.txt, then check your versions again. This is what I got:
       prettytensor==0.6.2
       tensorflow==0.9.0rc0
  6. PYTHONPATH='.' python launchers/run_mnist_exp.py
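
Steps 3 and 5 check the installed versions by reading pip freeze output by eye. A small script could do the same comparison automatically. This is a minimal sketch: `parse_freeze` and `version_mismatches` are hypothetical names, and only plain `name==version` lines are understood.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {lowercased package name: version}."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        # Only "name==version" lines are pins; skip blanks, "-e ..." and "name @ url".
        if "==" not in line:
            continue
        name, _, version = line.partition("==")
        installed[name.strip().lower()] = version.strip()
    return installed


def version_mismatches(expected, installed):
    """Return {name: (expected_version, installed_version_or_None)} for unmet pins."""
    problems = {}
    for name, want in expected.items():
        have = installed.get(name.lower())
        if have != want:
            problems[name] = (want, have)
    return problems
```

For example, `version_mismatches({"prettytensor": "0.6.2", "tensorflow": "0.9.0rc0"}, parse_freeze(freeze_output))` returns an empty dict when the output matches step 5, and reports `(expected, None)` for a package that is missing entirely.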

@yunhunJang
Author

So you used TensorFlow 0.9, right?
If I switch to v0.9, it works well.
But I'd still like to know how to make it work with TensorFlow v0.12 (and prettytensor 0.7.2, the latest version).

The error occurs in the ExponentialMovingAverage operation in conv_batch_norm in custom_ops.py.
The error message is as follows:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Extracting MNIST/train-images-idx3-ubyte.gz
Extracting MNIST/train-labels-idx1-ubyte.gz
Extracting MNIST/t10k-images-idx3-ubyte.gz
Extracting MNIST/t10k-labels-idx1-ubyte.gz
batch_norm
g_net/fc_batch_norm
g_net/fc_batch_norm/batch_norm
batch_norm
g_net/fc_batch_norm_1
g_net/fc_batch_norm_1/batch_norm
conv_batch_norm
g_net/conv_batch_norm
g_net/conv_batch_norm/conv_batch_norm
custom_conv2d
d_net/custom_conv2d
d_net/custom_conv2d/custom_conv2d
custom_conv2d_1
d_net/custom_conv2d_1
d_net/custom_conv2d_1/custom_conv2d_1
conv_batch_norm
d_net/conv_batch_norm
d_net/conv_batch_norm/conv_batch_norm
batch_norm
d_net/fc_batch_norm
d_net/fc_batch_norm/batch_norm
custom_conv2d
d_net/custom_conv2d
d_net/custom_conv2d/custom_conv2d
custom_conv2d_1
d_net/custom_conv2d_1
d_net/custom_conv2d_1/custom_conv2d_1
conv_batch_norm
d_net/conv_batch_norm
d_net/conv_batch_norm/conv_batch_norm
Traceback (most recent call last):
  File "launchers/run_mnist_exp.py", line 65, in <module>
    algo.train()
  File "/home/yhoon/InfoGAN/infogan/algos/infogan_trainer.py", line 210, in train
    self.init_opt()
  File "/home/yhoon/InfoGAN/infogan/algos/infogan_trainer.py", line 53, in init_opt
    real_d, _, _, _ = self.model.discriminate(input_tensor)
  File "/home/yhoon/InfoGAN/infogan/models/regularized_gan.py", line 72, in discriminate
    reg_dist_flat = self.encoder_template.construct(input=x_var)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1246, in construct
    return self._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1171, in _construct
    method_args = self._replace_deferred(self._method_args, context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1143, in _replace_deferred
    return [self._replace_deferred(x, context) for x in arg]
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1138, in _replace_deferred
    return arg._construct(context)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1173, in _construct
    result = self._method(*method_args, **method_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/scopes.py", line 158, in __call__
    return self._call_func(args, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/scopes.py", line 131, in _call_func
    return self._func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/prettytensor/pretty_tensor_class.py", line 1922, in _with_method_complete
    return input_layer._method_complete(func(*args, **kwargs))
  File "/home/yhoon/InfoGAN/infogan/misc/custom_ops.py", line 27, in __call__
    self.ema_apply_op = self.ema.apply([self.mean, self.variance])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 391, in apply
    self._averages[var], var, decay, zero_debias=zero_debias))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 650, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable d_net/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/conv_batch_norm_2/conv_batch_norm/conv_batch_norm/moments/normalize/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

originally defined at:
  File "launchers/run_mnist_exp.py", line 48, in <module>
    network_type="mnist",
  File "/home/yhoon/InfoGAN/infogan/models/regularized_gan.py", line 37, in __init__
    custom_conv2d(128, k_h=4, k_w=4).

I added some print statements to custom_conv2d and conv_batch_norm, like this:

...
    # custom_conv2d.__call__ (debug prints added)
    def __call__(self, input_layer, output_dim,
                 k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, in_dim=None, padding='SAME',
                 name="conv2d"):
        print(name)                              # requested scope name
        print(tf.get_variable_scope().name)      # enclosing variable scope
        with tf.variable_scope(name):
            print(tf.get_variable_scope().name)  # scope the variables are created in
...
...
        # conv_batch_norm.__call__ (debug prints added)
        shp = in_dim or shape[-1]
        print(name)
        print(tf.get_variable_scope().name)
        with tf.variable_scope(name) as scope:
            print(tf.get_variable_scope().name)
            self.gamma = self.variable("gamma", [shp], init=tf.random_normal_initializer(1., 0.02))
...

Any hints would be appreciated. I'm new to TensorFlow, so it's hard to know where to look.
(I tested a simple network of my own with a similar structure under TensorFlow 0.12 and prettytensor 0.7.2, and it works fine. I think the custom batch_norm/conv ops conflict with the latest versions of TensorFlow/prettytensor.)

Thanks!
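
One plausible reading of the `conv_batch_norm_2` components in the failing variable name: TensorFlow gives op names a unique numeric suffix (`_1`, `_2`, ...) when the same name scope is entered again, while re-entering a variable scope reuses the same prefix, which matches the repeated `d_net/conv_batch_norm` lines in the printout. If a helper variable's name is derived from an op name, the second construction could then look up a name the first construction never created. The toy model below is plain Python, not TensorFlow's implementation, and `ToyScopes` is a hypothetical name; it only illustrates the two naming rules.

```python
class ToyScopes(object):
    """Plain-Python toy of two TensorFlow naming rules (not TF's actual code).

    * Variable scopes: entering "d_net/conv_batch_norm" again yields the same
      prefix, so variables can be shared by name.
    * Name scopes (op names): a name already used gets a numeric suffix,
      "_1", "_2", ..., so every op name stays unique.
    """

    def __init__(self):
        self._op_name_counts = {}  # full op-scope name -> times it has been used

    @staticmethod
    def variable_scope(parent, name):
        return "%s/%s" % (parent, name) if parent else name

    def name_scope(self, parent, name):
        full = "%s/%s" % (parent, name) if parent else name
        count = self._op_name_counts.get(full, 0)
        self._op_name_counts[full] = count + 1
        return full if count == 0 else "%s_%d" % (full, count)


if __name__ == "__main__":
    scopes = ToyScopes()
    for _ in range(3):
        # Same variable-scope prefix each time, but a fresh op-name prefix.
        print("%s  %s" % (ToyScopes.variable_scope("d_net", "conv_batch_norm"),
                          scopes.name_scope("d_net", "conv_batch_norm")))
```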

@tachim

tachim commented Dec 15, 2016

This is due to TensorFlow fixing a problem with EMA (ExponentialMovingAverage); see VittalP/UnsupGAN#1 for a fix.
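
For context on the new code path: in the traceback earlier in this thread, `ExponentialMovingAverage.apply` goes through `assign_moving_average` and `_zero_debias`, which calls `tf.get_variable` to create an extra non-trainable variable (the `.../ExponentialMovingAverage/biased` in the error). Zero-debiasing corrects for the fact that an average initialised at zero starts out biased toward zero. The sketch below shows that idea in plain Python only; it is not TensorFlow's code, and `DebiasedEMA` is a hypothetical name.

```python
class DebiasedEMA(object):
    """Exponential moving average with zero-debiasing (illustrative sketch)."""

    def __init__(self, decay):
        self.decay = decay
        self.biased = 0.0  # raw running average; starts at zero, so it lags early on
        self.step = 0      # number of updates applied so far

    def update(self, value):
        """Fold `value` into the average and return the debiased estimate."""
        self.step += 1
        self.biased = self.decay * self.biased + (1.0 - self.decay) * value
        return self.value()

    def value(self):
        """Debiased estimate: divide out the (1 - decay**step) shrinkage toward 0."""
        if self.step == 0:
            return 0.0
        return self.biased / (1.0 - self.decay ** self.step)


if __name__ == "__main__":
    ema = DebiasedEMA(decay=0.99)
    ema.update(3.0)
    # The raw average is only 1% of the way to 3.0; the debiased value is ~3.0.
    print("raw=%.4f debiased=%.4f" % (ema.biased, ema.value()))
```

The extra state (`biased` and a step count) is what requires additional variables, which is why the newer EMA code needs to create a variable that older versions did not.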

@yunhunJang
Author

@tachim Thank you! It works well now! I really appreciate it.

@lyhangustc

@yunhunJang I have read VittalP/UnsupGAN#1, but I don't know how to edit the code. What did you change to get it to work?

@yunhunJang
Author

@lyhangustc I edited line 16 of infogan/misc/custom_ops.py, changing

    with tf.variable_scope(name) as scope:

to

    with tf.variable_scope(tf.get_variable_scope(), reuse=False) as scope:
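
The ValueError itself comes from the reuse rule of `tf.get_variable`: inside a scope with reuse=True, only variables that already exist can be fetched. The plain-Python toy below (not TensorFlow's implementation; `ToyVariableStore` is a hypothetical name) shows why a variable first requested during a reused construction, like the EMA's `biased` variable, fails. The change above appears to work by entering the current scope with reuse=False so conv_batch_norm can create that variable; this reading is an assumption, not something verified against the TensorFlow source.

```python
class ToyVariableStore(object):
    """Toy model of tf.get_variable's reuse rule (plain Python, not TensorFlow)."""

    def __init__(self):
        self._vars = {}  # full variable name -> value

    def get_variable(self, name, reuse, initial=0.0):
        exists = name in self._vars
        if reuse and not exists:
            # The case hit in this issue: a new variable requested while reusing.
            raise ValueError("Variable %s does not exist" % name)
        if not reuse and exists:
            # Creating the same name twice without reuse is also rejected.
            raise ValueError("Variable %s already exists" % name)
        if not exists:
            self._vars[name] = initial
        return self._vars[name]


if __name__ == "__main__":
    store = ToyVariableStore()
    store.get_variable("d_net/bn/gamma", reuse=False, initial=1.0)  # first build creates
    store.get_variable("d_net/bn/gamma", reuse=True)                # second build shares
    try:
        # A variable that only the second (reusing) build asks for cannot be created.
        store.get_variable("d_net/bn/ExponentialMovingAverage/biased", reuse=True)
    except ValueError as err:
        print(err)
```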

@lyhangustc

@yunhunJang It works. Thank you!

@kaihuchen

I also have a problem running the MNIST experiment, but the symptom looks different from the one above:

$ PYTHONPATH='.' python launchers/run_mnist_exp.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
Extracting MNIST/train-images-idx3-ubyte.gz
Extracting MNIST/train-labels-idx1-ubyte.gz
Extracting MNIST/t10k-images-idx3-ubyte.gz
Extracting MNIST/t10k-labels-idx1-ubyte.gz
--Return--
None
> /mnt/ml/tests/InfoGAN/infogan/misc/custom_ops.py(121)__call__()
    117                                        init=tf.random_normal_initializer(stddev=stddev))
    118                 bias = self.variable("bias", [output_size], init=tf.constant_initializer(bias_start))
    119                 return input_layer.with_tensor(tf.matmul(input_, matrix) + bias, parameters=self.vars)
    120         except Exception:
--> 121             import ipdb; ipdb.set_trace()

ipdb>  init=tf.random_normal_initializer(stddev=stddev)
ipdb> init
<function _initializer at 0x7f7228067c08>
ipdb> stddev
0.02
ipdb>

Can anybody help? Thanks!

@frizfealer

@kaihuchen I ran into the same problem. I solved it by 1) updating TensorFlow to version 0.12.1 and 2) applying the code change suggested by @yunhunJang.

@kaihuchen

@frizfealer Got it. Thanks!

@tornadomeet

Solved this problem with the suggestion from @pmiller10:
pip uninstall prettytensor
pip install prettytensor==0.6.2

@dugarsumit

The fix discussed above worked for TensorFlow 1.0.1, but after I upgraded to TensorFlow 1.2 I got the same error again. I tried a few versions between 1.0.1 and 1.2 and still got the same error.

@tornadomeet

Just use the TensorFlow r0.9rc0-devel image from the README: https://github.com/openai/InfoGAN#running-in-docker

@zjost

zjost commented Sep 25, 2017

I forked this repo and made the changes needed to use TensorFlow 1.3.0. You can find it here, and this diff shows the changes.

I also needed to change the part of prettytensor that unpacks the recorded traceback. Specifically, I modified prettytensor/pretty_tensor_class.py to add a try/except block, and instead of unpacking each traceback entry as exactly four elements, I assign f, line_no, and method by indexing the tuple, so an entry that does not have exactly 4 elements no longer breaks it.

  # Patch inside prettytensor/pretty_tensor_class.py; `result`, `trace` and
  # `found` come from the surrounding prettytensor code.
  try:
    for frame in result._traceback:
      # Index instead of unpacking, so entries with more than 4 fields still work.
      f, line_no, method = frame[0], frame[1], frame[2]
      # Skip prettytensor's own deferred-construction frames.
      if (method in ('_replace_deferred', '_construct') and
          f.endswith('pretty_tensor_class.py')):
        found = True
        continue
      trace.append((f, line_no, method, {}))
    result._traceback = trace
  except Exception:
    print("Traceback: ", result._traceback)
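
The patched loop can also be written as a standalone function, which makes the "index, don't unpack" idea easy to test outside prettytensor. This is a sketch: `filter_traceback` is a hypothetical helper, and each frame is assumed to be a tuple whose first three fields are (file, line number, method name), as in the patch above.

```python
def filter_traceback(frames, internal_file="pretty_tensor_class.py",
                     internal_methods=("_replace_deferred", "_construct")):
    """Drop prettytensor-internal frames from a recorded traceback.

    Returns (trace, found): `trace` holds the remaining frames normalised to
    (file, line_no, method, {}) 4-tuples, and `found` says whether any
    internal frame was skipped.
    """
    trace = []
    found = False
    for frame in frames:
        # Index rather than unpack, so frames with extra fields are accepted.
        f, line_no, method = frame[0], frame[1], frame[2]
        if method in internal_methods and f.endswith(internal_file):
            found = True
            continue
        trace.append((f, line_no, method, {}))
    return trace, found


if __name__ == "__main__":
    # Hypothetical frames of different lengths (5 and 4 fields).
    frames = [
        ("/pkg/prettytensor/pretty_tensor_class.py", 1171, "_construct", None, "extra"),
        ("/home/user/InfoGAN/infogan/models/regularized_gan.py", 37, "__init__", "src"),
    ]
    print(filter_traceback(frames))
```

Returning `found` rather than mutating outer state keeps the function testable; inside prettytensor the patch instead updates the existing `trace` and `found` variables in place.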
