
[es_distributed/tf_util.py] ValueError: Dimension 0 in both shapes must be equal, but are 4608 and 18. Shapes are [4608] and [18]. From merging shape 12 with other shapes. for 'concat/concat_dim' (op: 'Pack') with input shapes: [4096], [16], [16], [16], [8192], [32], [32], [32], [991232], [256], [256], [256], [4608], [18]. #31

Open
dragon28 opened this issue Jun 19, 2019 · 1 comment


dragon28 commented Jun 19, 2019

Hello,

I ran into the following error while testing the ES algorithm:

python3 -m es_distributed.main master --master_socket_path /tmp/es_redis_master.sock --algo es --exp_file configurations/frostbite_es.json

[2019-06-04 23:15:37,056 pid=22170] run_master: {'exp': {'config': {'calc_obstat_prob': 0.0, 'episodes_per_batch': 5000, 'eval_prob': 0.01, 'l2coeff': 0.005, 'noise_stdev': 0.005, 'snapshot_freq': 20, 'timesteps_per_batch': 10000, 'return_proc_mode': 'centered_rank', 'episode_cutoff_mode': 5000}, 'env_id': 'FrostbiteNoFrameskip-v4', 'optimizer': {'args': {'stepsize': 0.01}, 'type': 'adam'}, 'policy': {'args': {}, 'type': 'ESAtariPolicy'}}, 'log_dir': '/tmp/es_master_22170', 'master_redis_cfg': {'unix_socket_path': '/tmp/es_redis_master.sock'}}
[2019-06-04 23:15:38,083 pid=22170] Tabular logging to /tmp/es_master_22170
2019-06-04 23:15:38.894940: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3410300000 Hz
2019-06-04 23:15:38.895374: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x40bfa50 executing computations on platform Host. Devices:
2019-06-04 23:15:38.895393: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
[2019-06-04 23:15:38,904 pid=22170] From /home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[2019-06-04 23:15:38,991 pid=22170] From /home/dragon/.local/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
[2019-06-04 23:15:39,054 pid=22170] From /home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/util/decorator_utils.py:145: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.GraphKeys.GLOBAL_VARIABLES` instead.
Traceback (most recent call last):
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1659, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 4608 and 18. Shapes are [4608] and [18].
        From merging shape 12 with other shapes. for 'concat/concat_dim' (op: 'Pack') with input shapes: [4096], [16], [16], [16], [8192], [32], [32], [32], [991232], [256], [256], [256], [4608], [18].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/main.py", line 90, in <module>
    cli()
  File "/home/dragon/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/dragon/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/dragon/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dragon/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dragon/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/main.py", line 61, in master
    algo.run_master({'unix_socket_path': master_socket_path}, log_dir, exp)
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/es.py", line 147, in run_master
    config, env, sess, policy = setup(exp, single_threaded=False)
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/es.py", line 136, in setup
    policy = getattr(policies, exp['policy']['type'])(env.observation_space, env.action_space, **exp['policy']['args'])
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/policies.py", line 24, in __init__
    self._getflat = U.GetFlat(self.trainable_variables)
  File "/home/dragon/machine_learning/deep-neuroevolution/es_distributed/tf_util.py", line 244, in __init__ 
    self.op = tf.concat(0, [tf.reshape(v, [numel(v)]) for v in var_list])                                     
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1253, in concat
    dtype=dtypes.int32).get_shape().assert_is_compatible_with(
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1102, in _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1054, in _autopacking_helper
    return gen_array_ops.pack(elems_as_tensors, name=scope)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5448, in pack
    "Pack", values=values, axis=axis, name=name)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1823, in __init__
    control_input_ops)
  File "/home/dragon/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1662, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 4608 and 18. Shapes are [4608] and [18].
From merging shape 12 with other shapes. for 'concat/concat_dim' (op: 'Pack') with input shapes: [4096], [16], [16], [16], [8192], [32], [32], [32], [991232], [256], [256], [256], [4608], [18].

The error comes from the es_distributed/tf_util.py file and originates in its calls to tf.concat: TensorFlow 1.0 swapped the argument order from tf.concat(concat_dim, values) to tf.concat(values, axis), so the old-style calls make TF try to pack the variable tensors together as the axis argument, which fails because their shapes differ.
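To see what the corrected calls compute, here is a minimal NumPy sketch (NumPy as a stand-in for the TF ops; the variable shapes below are hypothetical stand-ins for the policy's weights) of the flatten-and-concatenate step that GetFlat performs:

```python
import numpy as np

# Hypothetical variable shapes standing in for a policy's weights.
var_list = [np.zeros((4, 4, 16)), np.zeros(16), np.zeros((256, 18))]

def numel(v):
    # Total number of elements in a tensor (mirrors tf_util's numel).
    return int(np.prod(v.shape))

# New-style argument order, concat(values, axis): flatten each
# variable to 1-D, then join them into a single parameter vector.
flat = np.concatenate([v.reshape(numel(v)) for v in var_list], axis=0)
print(flat.shape)  # (4880,)
```

Passing the arguments the old way, concat(0, values), is what triggers the Pack shape error above.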

Below are the changes I made to fix it:

  1. def concatenate(arrs, axis=0) at lines 30-31

from:

def concatenate(arrs, axis=0):
    return tf.concat(axis, arrs)

to:

def concatenate(arrs, axis=0):
    return tf.concat(arrs, axis)

  2. def flatgrad(loss, var_list) at lines 219-222

from:

def flatgrad(loss, var_list):
    grads = tf.gradients(loss, var_list)
    return tf.concat(0, [tf.reshape(grad, [numel(v)])
        for (v, grad) in zip(var_list, grads)])

to:

def flatgrad(loss, var_list):
    grads = tf.gradients(loss, var_list)
    return tf.concat([tf.reshape(grad, [numel(v)])
        for (v, grad) in zip(var_list, grads)], 0)

  3. def __init__(self, var_list) at lines 243-244

from:

def __init__(self, var_list):
    self.op = tf.concat(0, [tf.reshape(v, [numel(v)]) for v in var_list])

to:

def __init__(self, var_list):
    self.op = tf.concat([tf.reshape(v, [numel(v)]) for v in var_list], 0)
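All three fixes are the same swap of tf.concat's arguments. Whether the swap is needed depends on the installed TensorFlow version, which could be checked with a small helper like this (a hypothetical sketch of my own; pure version-string parsing, no TF import needed):

```python
def uses_new_concat_order(tf_version: str) -> bool:
    # TF >= 1.0 expects tf.concat(values, axis);
    # pre-1.0 releases expected tf.concat(concat_dim, values).
    major = int(tf_version.split('.')[0])
    return major >= 1

print(uses_new_concat_order("1.13.1"))  # True
print(uses_new_concat_order("0.12.1"))  # False
```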

My environment information:
Ubuntu 18.04 x64
Python 3.6.8
tensorflow 1.13.1
Click 7.0
atari-py 0.1.15
numpy 1.16.3
gym 0.12.1
baselines 0.1.5

Thanks

@EmanueleLM

This error depends on the version of TensorFlow you use. With the required one (i.e. 0.12.1) it works without changing the codebase. I can also confirm that with the latest versions of TensorFlow, your solution solves the issue.
