Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

您好,我用自定义数据集训练出现了这个问题:ValueError: Empty training data.训练之前的步骤都可以,之前跑通了原始voc数据集的训练以及部署。自己做一个小项目分4类,数据集按照voc格式制作的,前边的步骤也都可以,一到训练这里就 #26

Open
MintonLee opened this issue Aug 13, 2020 · 5 comments

Comments

@MintonLee
Copy link

(tf115) F:\1_work\K210\k210-for-yolo\yolo-for-trash_detection-k210>make train MODEL=yolo_mobilev1 DEPTHMUL=0.75 MAXEP=10 ILR=0.001 DATASET=voc CLSNUM=4 IAA=False BATCH=8
python ./keras_train.py
--train_set voc
--class_num 4
--pre_ckpt ""
--model_def yolo_mobilev1
--depth_multiplier 0.75
--augmenter False
--image_size 224 320
--output_size 7 10 14 20
--batch_size 8
--rand_seed 3
--max_nrof_epochs 10
--init_learning_rate 0.001
--learning_rate_decay_factor 0
--obj_weight 1
--noobj_weight 1
--wh_weight 1
--obj_thresh 0.7
--iou_thresh 0.5
--vaildation_split 0.05
--log_dir log
--is_prune False
--prune_initial_sparsity 0.5
--prune_final_sparsity 0.9
--prune_end_epoch 5
--prune_frequency 100
2020-08-13 16:31:51.021329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

2020-08-13 16:31:54.680652: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-08-13 16:31:54.693250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-08-13 16:31:54.730848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:54.736948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:54.746169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:54.753784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:54.760818: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:54.767979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:54.774382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:54.787883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:54.792935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.371745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-13 16:31:55.376201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-13 16:31:55.379023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-13 16:31:55.382793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
�[34m[ INFO ]�[0m data augment is False
WARNING:tensorflow:From F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\data\util\random_seed.py:58: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-08-13 16:31:55.534760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:55.541031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:55.544708: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:55.547742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:55.551895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:55.555008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:55.559177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:55.562276: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:55.565958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.569220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:55.574059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:55.578232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:55.581370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:55.585411: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:55.588457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:55.592139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:55.595837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:55.599397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.602502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-13 16:31:55.606356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-13 16:31:55.608364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-13 16:31:55.610806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
�[34m[ INFO ]�[0m data augment is False
WARNING:tensorflow:From F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 21 steps
Epoch 1/10
2020-08-13 16:32:22.746351: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:32:24.054899: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-08-13 16:32:24.128920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
1/21 [>.............................] - ETA: 2:31 - loss: 792.4858 - l1_loss: 131.6574 - l2_loss: 660.4179 - l1_p: 0.0000e+00 - l1_r: 0.0000e+00 - l2_p: 5.8514e-04 - l2_r: 0.20002020-08-13 16:32:26.777512: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started.
2020-08-13 16:32:26.787342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.280863). Check your callbacks.
2/21 [=>............................] - ETA: 1:17 - loss: 817.4302 - l1_loss: 172.4517 - l2_loss: 644.5675 - l1_p: 0.0012 - l1_r: 0.2500 - l2_p: 0.0025 - l2_r: 0.5000 2020-08-13 16:32:27.322285: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 3504 kernel records, 316 memcpy records.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.444144). Check your callbacks.
3/21 [===>..........................] - ETA: 52s - loss: 745.5250 - l1_loss: 159.0245 - l2_loss: 586.0892 - l1_p: 8.5985e-04 - l1_r: 0.1429 - l2_p: 0.0023 - l2_r: 0.3636WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.280863). Check your callbacks.
4/21 [====>.........................] - ETA: 37s - loss: 695.2993 - l1_loss: 148.6290 - l2_loss: 546.2584 - l1_p: 0.0015 - l1_r: 0.1538 - l2_p: 0.0022 - l2_r: 0.3200 WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.117583). Check your callbacks.
20/21 [===========================>..] - ETA: 0s - loss: 364.4713 - l1_loss: 68.8265 - l2_loss: 295.2221 - l1_p: 0.0012 - l1_r: 0.0385 - l2_p: 0.0020 - l2_r: 0.0563Traceback (most recent call last):
File "./keras_train.py", line 155, in
args.prune_frequency)
File "./keras_train.py", line 99, in main
validation_data=h.test_dataset, validation_steps=int(h.test_epoch_step * h.validation_split))
File "F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 675, in fit
steps_name='steps_per_epoch')
File "F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 440, in model_iteration
steps_name='validation_steps')
File "F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 411, in model_iteration
aggregator.finalize()
File "F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 138, in finalize
raise ValueError('Empty training data.')
ValueError: Empty training data.
Makefile:35: recipe for target 'train' failed
make: *** [train] Error 1

@MintonLee
Copy link
Author

这个是什么问题,是数据集大小需要修改吗?还是路径问题?还是batch问题,我已经修改了batch到1也是这个问题

@xji-apex
Copy link

同样的问题,有人可以指点一下嘛?

@zhen8838
Copy link
Owner

自定义数据集的时候很容易出现数据太少,并且我之前的代码是对于测试集是直接取训练集中的一部分,可能导致他无法满足tf.dataset里面的buffer出现这个问题,你可以在定义输入数据管道的地方将shufflemap等操作的buffer改小一些。

@uguisu
Copy link

uguisu commented Nov 30, 2020

我也遇到过同样问题。程序自动对数据集进行分割,分成训练集和测试集。这里的参数batch_sizevaildation_split直接影响训练集和测试集的大小。如果按照默认设置batch_size = 32, vaildation_split = 0.05则会出现测试集过小,训练中途失败的问题。

我是通过调整参数来解决问题的。例如将设置修改为batch_size = 16, vaildation_split = 0.1

@noobgrow
Copy link

noobgrow commented Apr 19, 2021

在tools/utils.py 文件中修改_create_dataset函数中的shuffle参数如下,并在train_fit函数中把validation_step 参数改为定值,亲测可以解决!!(把buffer大小改为数据集大小就可以了 !!)

def _create_dataset()

dataset = (tf.data.Dataset.from_generator(gen, (tf.framework_ops.dtypes.string, tf.float32), ([], [None, 5])).
shuffle(self.train_total_data if is_training == True else self.test_total_data, rand_seed).repeat().
map(_parser_wrapper, tf.data.experimental.AUTOTUNE).
batch(batch_size, True).prefetch(tf.data.experimental.AUTOTUNE))

    return dataset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants