Hi, thanks very much for your work. I use Docker to build an environment for studying it. When I create a container with FROM tensorflow/tensorflow:2.3.3-gpu-jupyter and run the examples, all the tests pass. But with newer images, for instance FROM tensorflow/tensorflow:2.4.2-gpu-jupyter, I get the ValueError: Data cardinality is ambiguous error shown below.
$ python train.py InvertedPendulumBulletEnv-v0
['/home/wezardlza/workspace/trpo', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/home/wezardlza/.local/lib/python3.6/site-packages', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages', '/home/wezardlza/workspace']
pybullet build time: Jun 22 2021 23:31:53
2021-06-22 23:42:07.575098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/home/wezardlza/.local/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Value Params -- h1: 60, h2: 17, h3: 5, lr: 0.00243
2021-06-22 23:42:08.572333: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-22 23:42:08.572860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-22 23:42:08.603929: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.604218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-06-22 23:42:08.604237: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-22 23:42:08.605660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-22 23:42:08.605710: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-22 23:42:08.606300: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-22 23:42:08.606448: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-22 23:42:08.608003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-22 23:42:08.608401: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-22 23:42:08.608521: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-22 23:42:08.608599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.608890: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.609111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-06-22 23:42:08.609301: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-22 23:42:08.609478: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-22 23:42:08.609563: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.609800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-06-22 23:42:08.609823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-22 23:42:08.609840: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-22 23:42:08.609850: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-22 23:42:08.609860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-22 23:42:08.609870: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-22 23:42:08.609880: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-22 23:42:08.609891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-22 23:42:08.609901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-22 23:42:08.609946: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.610192: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.610404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-06-22 23:42:08.610424: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-22 23:42:08.934643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-22 23:42:08.934667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-06-22 23:42:08.934672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-06-22 23:42:08.934802: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.935063: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.935289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-22 23:42:08.935494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6638 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Policy Params -- h1: 60, h2: 24, h3: 10, lr: 0.000184, logvar_speed: 2
argv[0]=
argv[0]=
2021-06-22 23:42:09.103904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-22 23:42:09.375022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-22 23:42:10.302274: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-22 23:42:10.322790: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3600000000 Hz
Traceback (most recent call last):
File "train.py", line 351, in<module>
main(**vars(args))
File "train.py", line 317, in main
policy.update(observes, actions, advantages, logger) # update policy
File "/home/wezardlza/workspace/trpo/policy.py", line 61, in update
old_means, old_logvars, old_logp])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1725, in train_on_batch
class_weight)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py", line 1513, in single_batch_iterator
_check_data_cardinality(data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py", line 1529, in _check_data_cardinality
raise ValueError(msg)
ValueError: Data cardinality is ambiguous:
x sizes: 369, 369, 369, 369, 1, 369
Make sure all arrays contain the same number of samples.
After some checking, I found that the code below in ./trpo/policy.py causes the mismatched batch size:
class PolicyNN(Layer):
    """ Neural net for policy approximation function.

    Policy parameterized by Gaussian means and variances. NN outputs mean
    action based on observation. Trainable variables hold log-variances for
    each action dimension (i.e. variances not determined by NN). """

    def build(self, input_shape):
        self.batch_sz = input_shape[0]

    def call(self, inputs, **kwargs):
        y = self.dense1(inputs)
        y = self.dense2(y)
        y = self.dense3(y)
        means = self.dense4(y)
        logvars = K.sum(self.logvars, axis=0, keepdims=True) + self.init_logvar
        logvars = K.tile(logvars, (self.batch_sz, 1))
        return [means, logvars]
This pins the first dimension of logvars to one at runtime, while the first dimension of inputs varies from batch to batch. As a result, the first dimension of means also differs from that of logvars, which causes the error at
File "/home/wezardlza/workspace/trpo/policy.py", line 61, in update
old_means, old_logvars, old_logp])
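For context, here is a minimal standalone sketch (not taken from the repo; the toy model, names, and shapes are all made up) of the check the traceback above is hitting: in these Keras versions, train_on_batch rejects input arrays whose first dimensions disagree before the training step even runs.

import numpy as np
import tensorflow as tf

# Toy two-input model; only its input signature matters for this demo.
a = tf.keras.Input(shape=(3,))
b = tf.keras.Input(shape=(3,))
out = tf.keras.layers.Add()([a, b])
model = tf.keras.Model([a, b], out)
model.compile(optimizer="sgd", loss="mse")

x_big = np.zeros((369, 3), dtype=np.float32)  # 369 samples, like observes/means
x_one = np.zeros((1, 3), dtype=np.float32)    # 1 sample, like the tiled logvars
y = np.zeros((369, 3), dtype=np.float32)

try:
    # The Keras data adapter checks that all arrays share the same first
    # dimension and raises before any computation happens.
    model.train_on_batch([x_big, x_one], y)
except ValueError as err:
    print(err)  # "Data cardinality is ambiguous: ..."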
So I did the following: in ./trpo/policy.py, I added
from tensorflow import shape
and changed logvars = K.tile(logvars, (self.batch_sz, 1)) to logvars = K.tile(logvars, (shape(inputs)[0], 1)). This let me pass the example
python train.py InvertedPendulumBulletEnv-v0
but then self.batch_sz is no longer used anywhere. Perhaps we can simply change logvars = K.tile(logvars, (self.batch_sz, 1)) to logvars = K.tile(logvars, (shape(inputs)[0], 1)) and remove the build() method above? I am new to TensorFlow and would like to know whether these changes could cause any problems, or even affect the TRPO results. Thanks for the help!
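For clarity, here is a sketch of how the layer would look with that change applied (same PolicyNN layer as above; build() and self.batch_sz are dropped because the batch size is read from inputs at call time):

# at the top of ./trpo/policy.py
from tensorflow import shape

# inside class PolicyNN(Layer), replacing build() and the old call():
def call(self, inputs, **kwargs):
    y = self.dense1(inputs)
    y = self.dense2(y)
    y = self.dense3(y)
    means = self.dense4(y)
    logvars = K.sum(self.logvars, axis=0, keepdims=True) + self.init_logvar
    # tile to the dynamic batch size instead of the value captured in build()
    logvars = K.tile(logvars, (shape(inputs)[0], 1))
    return [means, logvars]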
wezardlza changed the title to "Can the Data cardinality is ambiguous error in Tensorflow 2.4 or 2.5 be solved as follows?" on Jun 23, 2021