how to resolve this issue #8

Open
vtmjapandev opened this issue Jun 30, 2022 · 1 comment
Comments

@vtmjapandev

Hi guys,


(strv-ml-mask2face) pc1@pc1:~/Documents/aaa/bbb/strv-ml-mask2face$ python train.py
Using TensorFlow backend.
2022-06-30 17:14:12.794632: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-06-30 17:14:12.823483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.824170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.67GiB deviceMemoryBandwidth: 871.81GiB/s
2022-06-30 17:14:12.824291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-06-30 17:14:12.825107: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-06-30 17:14:12.825943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-06-30 17:14:12.826089: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-06-30 17:14:12.826863: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-06-30 17:14:12.827299: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-06-30 17:14:12.828922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-30 17:14:12.828982: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.829723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.830395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
Num GPUs Available:  1
Num CPUs Available:  1
Dataset already downloaded
2022-06-30 17:14:12.841843: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-06-30 17:14:12.861820: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3504000000 Hz
2022-06-30 17:14:12.862273: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565245fd3d20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-30 17:14:12.862290: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-06-30 17:14:12.914261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.915012: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565245fcae00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-06-30 17:14:12.915027: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2022-06-30 17:14:12.915166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.915819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.67GiB deviceMemoryBandwidth: 871.81GiB/s
2022-06-30 17:14:12.915861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-06-30 17:14:12.915868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-06-30 17:14:12.915875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-06-30 17:14:12.915882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-06-30 17:14:12.915889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-06-30 17:14:12.915896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-06-30 17:14:12.915902: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-30 17:14:12.915936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.916580: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.917228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-06-30 17:14:12.917264: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-06-30 17:14:12.918152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-30 17:14:12.918163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2022-06-30 17:14:12.918169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2022-06-30 17:14:12.918237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.918934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-30 17:14:12.919608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22177 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
Testing and training data already generated
Here are few examples of generated data.
Using UNet Resnet model
Loading training data from data/train with limit of 10000 images
Loading testing data from data/test with limit of 1000 images
2022-06-30 17:17:52.185638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-30 17:27:18.122355: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2022-06-30 17:27:18.152390: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-06-30 17:27:18.647181: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
9/9 [==============================] - 18s 2s/step - loss: 0.6036 - acc: 0.5557 - recall: 0.7592 - precision: 0.9963
- TEST -> LOSS:     0.6036, ACC:     0.5557, RECALL:     0.7592, PRECISION:     0.9963
Epoch 1/20
2022-06-30 17:28:53.177571: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 2582 of 5000
2022-06-30 17:29:00.955985: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
Traceback (most recent call last):
  File "train.py", line 71, in <module>
    model.train(epochs=training_epochs, batch_size=batch_size, loss_function='ssim_l1_loss')
  File "/home/pctest/Documents/aaa/bbb/strv-ml-mask2face/utils/model.py", line 132, in train
    history = self.model.fit(train_dataset, validation_data=valid_dataset, epochs=epochs, callbacks=callbacks)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/home/pctest/anaconda3/envs/strv-ml-mask2face/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (model/conv2d_38/Sigmoid:0) = ] [[[[0.499759793 0.503403068 0.505646]]]...] [y (Cast_6/x:0) = ] [0]
         [[{{node assert_greater_equal/Assert/AssertGuard/else/_109/Assert}}]]
         [[assert_greater_equal_1/Assert/AssertGuard/pivot_f/_139/_87]]
  (1) Invalid argument:  assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (model/conv2d_38/Sigmoid:0) = ] [[[[0.499759793 0.503403068 0.505646]]]...] [y (Cast_6/x:0) = ] [0]
         [[{{node assert_greater_equal/Assert/AssertGuard/else/_109/Assert}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_9982]

Function call stack:
train_function -> train_function
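
A note on reading the assertion above, offered as an assumption rather than a confirmed diagnosis: the `predictions must be >= 0` check is element-wise, and the message prints only the first few values of `model/conv2d_38/Sigmoid:0`. The values shown (about 0.5) pass the check, so the failing entries are presumably further into the tensor. A NaN would fail it, because any comparison with NaN is false. The log also shows CUDA 10.1 libraries loaded on a compute capability 8.6 GPU, with ptxas failing and compilation falling back to the driver; that combination may be worth checking, since CUDA 10.1 predates that GPU generation. The sketch below uses only the Python standard library; the helper name `summarize_invalid` and the appended NaN are hypothetical and not part of the repository or the log.

```python
import math


def summarize_invalid(values):
    """Count entries that would fail an element-wise 0 <= x <= 1 check."""
    counts = {"nan": 0, "below_zero": 0, "above_one": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif v < 0.0:
            counts["below_zero"] += 1
        elif v > 1.0:
            counts["above_one"] += 1
    return counts


# First three values are copied from the error message;
# the trailing NaN is a made-up entry for illustration only.
sample = [0.499759793, 0.503403068, 0.505646, math.nan]

# A single NaN makes the element-wise check fail, even though
# every printed value lies inside [0, 1].
print(all(v >= 0 for v in sample))
print(summarize_invalid(sample))
```

To apply the same idea to real data, one option is to flatten the output of the Keras model's `predict` call on a single batch into a plain list and pass it to the helper. If NaNs show up, the inputs and the GPU software stack are reasonable places to look next.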

@Arunimchakraborty

Did you get the solution to this problem?
