What is wrong with my model? + summary & solutions to F.A.Q #75
Comments
I managed to get my "Acc using X vx 3D Transformation" metric up to 20-30% by using a more appropriate diameter in the .data file. I calculated this diameter with a piece of code that pairwise compares all vertices and takes the pair with the greatest distance (a sketch of this is below). Unfortunately, I am not able to get any accuracy on the other 2 metrics (Acc using 5px 2D projection and Acc using 5 cm 5 degree metric). It seems that the model is unable to learn from my data? Might the data be too monotonous? If so, I would expect at least some accuracy metrics to go up (because the model would just over-fit).
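A minimal sketch of that diameter computation, assuming an ASCII .ply file and using scipy; the file name and function name are placeholders, not code from the repository:

```python
import numpy as np
from scipy.spatial.distance import pdist

def compute_diameter(ply_path):
    # Minimal ASCII .ply vertex reader; swap in plyfile/trimesh for binary meshes.
    with open(ply_path) as f:
        lines = f.read().splitlines()
    n_vertices = next(int(l.split()[-1]) for l in lines if l.startswith("element vertex"))
    start = lines.index("end_header") + 1
    verts = np.array([list(map(float, l.split()[:3])) for l in lines[start:start + n_vertices]])
    # Diameter = greatest pairwise distance between any two vertices.
    return pdist(verts).max()

print(compute_diameter("my_object.ply"))  # value to put in the .data file's diam field
```

Note that pdist builds all pairwise distances in memory, so for a very dense mesh you may want to chunk the computation.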
Too bad that nobody seems to be able (or willing) to help. I have a little update:
My accuracy after 1000 epochs is: What I find weird is that LINEMOD has only ±200 images per class for training (and ±1000 for testing). Hence, their training set is very small, yet @btekin was able to reach high accuracy (>90%). @btekin, I would love to hear your input on this. Is this due to pre-training? Can you tell us a bit more about how long your pre-training took and what parameters you used for your training/test set? Sidenote: Any thoughts?
Hello, thanks for your interest in our code and sorry for the late reply; I didn't have the time to reply as I had to deal with my other work-related projects. We follow the same training/test splits as earlier work, e.g. that of the BB8 paper by Rad & Lepetit, ICCV'17. Indeed, the training examples are sampled such that they cover a wide variety of viewpoints around the object. Having a more representative training set should, in principle, increase the accuracy on test examples. Instead of using initialization weights from another object, you could also pretrain the network on the same object by setting the regularization parameter for the confidence loss to zero, as explained in the readme file (a conceptual sketch of this idea follows below). See also the discussion in the paper and in #79 on why such pretraining could be useful. About your findings: we already mention what order should be used for the keypoints in this link, along with a step-by-step guide, and this was also discussed in the duplicate issue #68. Custom datasets have a different camera intrinsics matrix and might have different object models/scales. The scale of the object model should certainly be consistent and be set appropriately for a new dataset.
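A conceptual sketch of the pretraining idea mentioned above; the names and the loss structure here are illustrative, not the repository's actual code. The confidence term of the loss is weighted by a coefficient that is set to zero during pretraining, so the network first learns the coordinate regression only.

```python
def pose_loss(coord_loss, conf_loss, conf_weight):
    # Total loss = coordinate term + weighted confidence term.
    return coord_loss + conf_weight * conf_loss

pretrain_loss = pose_loss(coord_loss=0.8, conf_loss=2.3, conf_weight=0.0)  # pretraining: ignore confidence
train_loss = pose_loss(coord_loss=0.8, conf_loss=2.3, conf_weight=1.0)     # normal training
```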
Hello @btekin. Thank you for your answer, yet I would like to ask you to be more specific. The LINEMOD dataset uses ±200 images for training (while having 1200 images per class: 1000 remain for testing). How can ANY model learn from only 200 images? Aren't NNs like YOLO supposed to have thousands of images per class?
Sidenotes:
You speak about pre-training being necessary. Can you be more elaborate about this? How long did you train for? How many images per class? 200 again? How many epochs? At this point I am distraught and about to give up...
Same problem here. I raised issue #85 mentioning that my model also is not learning. I used a very good source for generating my synthetic data: I have masked images corresponding to my RGB images, correct intrinsics, precise labelling files, an exact diam value, and a rightly scaled .ply file. I am using 1170 images with 65 different orientations of the object and different backgrounds, and just one class (object). I also previously tried the 15% training / 85% testing split, but it also didn't learn. Note: the maximum number of epochs I reached was 177, but none of those weights were saved; only the one at epoch 11 was saved and never updated. After comparing my inputs with those of ape.data and finding everything matching, yet still not learning on the custom data, I am also about to give up...
Your green ground-truth box does seem to be an incorrect bounding box though. Maybe there is still something wrong with the labeling in your case? If you look at my green ground-truth bounding boxes, they exactly match the object. It is indeed the case that the code is written in such a way that it will not save weights when there is no increase in accuracy. You can add a line of code to make it save weights after every 10 epochs or so (see the sketch below). I tried this as well, but it is not helpful; if the accuracy does not increase, you can save weights all you want, but they are useless. I have just now tried to train a model on 16,000+ images. I stopped the training at around 310 epochs, because I don't have the time to continue training this model, since I am only using an Nvidia 1080 Ti. Interestingly enough, the loss is pretty low, yet the accuracy does not rise. At this point I think that, in general, 1000 images for training should do the trick. Because I can only use a batch size of at most 8, I think that I have to train for far more epochs than proposed in the code (700 epochs). It would be nice to hear @btekin's input 😄
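A minimal sketch of that periodic-save idea; the function and argument names are assumptions, not the actual variables used in the repository's train.py:

```python
import torch

def maybe_save_checkpoint(model, epoch, backupdir, every=10):
    # Save the weights every `every` epochs regardless of accuracy, so progress
    # is kept even while the evaluation metrics stay at 0%.
    if epoch % every == 0:
        torch.save(model.state_dict(), f"{backupdir}/epoch_{epoch:03d}.pth")
```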
Hi @jgcbrouns, good to hear from you, and I hope to hear from @btekin soon too 😄 That's true, even my ground-truth labels aren't matching, which is quite strange. I forked @juanmed's singleshotpose, as he used the same source for generating the data I used, and he also created a script called ndds_dataset_creator.py that takes your 3D bounding box configurations and outputs a label text file compatible with singleshotpose. You can also visualize the points (labels) on each of your images. It would help if you could write down the way you created your label files. For saving the model, I know I can modify the script to save whatever, but as you said, it is useless if the model isn't getting any better. My question was rather: why isn't my model getting any better in the first place? Why is accuracy 0? By CUDA out of memory, I meant that I reduced the batch size and subdivisions to 4, as yolo-pose-pre.cfg has a batch size of 32. I think that the original singleshotpose on the LINEMOD dataset was trained for more than 700 epochs, in the sense that they kept updating the initialization weights with the trained model and trained all over again to improve accuracy, or changed the 700 epochs to some other value in the thousands.
I also think that the number of epochs was way in the thousands.
For your case it is pretty straightforward, I think: fix the bounding box corners (check your ground-truth box for correctness) and your model will learn at least something (like mine). Your other settings seem to be correct: .ply file, diameter over the ply vertices, camera intrinsics. Another tip: check whether your .ply file has more than a handful of vertex points. I see that your object is a lego block. A cube in general can be modeled as a parametric 3D model with just a few vertices. The LINEMOD objects all have many vertices and edges in their models. @btekin uses the individual vertices to calculate accuracies against. My hypothesis is that more vertices give more chance of higher accuracy. What you could try is to add more vertices to your model via Blender (a quick alternative sketch follows below):
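As a quick alternative to doing this in Blender, a hedged sketch using trimesh to count the vertices and densify the mesh (file names are placeholders):

```python
import trimesh

mesh = trimesh.load("my_object.ply")
print("vertices:", len(mesh.vertices))  # a parametric cube may only have 8

# Each subdivision pass splits every triangle into four, adding vertices.
denser = mesh.subdivide().subdivide()
denser.export("my_object_dense.ply")
print("vertices after subdivision:", len(denser.vertices))
```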
But again, before that, fix the ground-truth bounding box label coordinates :P I looked at the Nvidia data generator, but decided to create my own tool to label data in a Unity environment. It's more straightforward than the Nvidia tool. I can rule out any mistakes in the way I generate the data and the labels, since the ground truths are correct. If you want, you can try my tool as well.
@jgcbrouns @MohamadJaber1 Thank you for your kind feedback 😄
Hi @btekin. Thanks for your response! I think that the images I posted above visualize the individual corners before PnP, straight from the predictions (red) and ground truths (green). It would be AWESOME if you could take a look at my dataset!!
[edit]
Thank you both for your replies, @jgcbrouns and @btekin. I will follow your comments and try them on Tuesday. Wish you both a Happy Easter 😄 @jgcbrouns The reason I am using NDDS is that I want to later validate my model with an image taken from a robot software environment, and NDDS provided me with all the necessities. @btekin It would be really great if you could take a look at our dataset (I will provide a sample).
@jgcbrouns Thank you for providing your dataset. After inspecting examples from your dataset, I would suggest you do the following two things and see if they help:
Hope these pointers help with your problem. Please let me know how it goes. @MohamadJaber1 As @jgcbrouns pointed out, I think you would need to fix the bounding box label coordinates in order for the network to start learning. If you provide a sample, I could also take a look at your data.
Hi @btekin and @MohamadJaber1. Thank you @btekin for looking at my dataset.
@jgcbrouns good to hear that, I will also convert my images to .jpg later. But the problem is that after epoch 11, model.weights is saved once and never updated. I trained the model for more than 140 epochs, but it was still never updated. @btekin Thank you so much for offering this help.
Please let me know what you observe and how I can solve it.
0% accuracies after 100 epochs implies that there is something wrong for sure. The model is supposed to converge rather quickly (on average, at around epoch 30 the model starts showing accuracy increases; before that, it stays at 0%). Here is my final dataset, including masks and .jpg files (converted from .png): I'll take a look at your dataset now.
About your quick update: you can validate whether your keypoint order is correct (see the sketch below):
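A hedged sketch for eyeballing the keypoint order: it draws the label's 2D points on the image with their indices, assuming the singleshotpose label layout (class id followed by 9 normalized x/y pairs: centroid first, then the 8 corners). File names are placeholders.

```python
import cv2

img = cv2.imread("000000.jpg")
h, w = img.shape[:2]
vals = list(map(float, open("000000.txt").read().split()))
points = [(vals[1 + 2 * i] * w, vals[2 + 2 * i] * h) for i in range(9)]  # centroid + 8 corners

for i, (x, y) in enumerate(points):
    cv2.circle(img, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.putText(img, str(i), (int(x) + 4, int(y)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)

cv2.imwrite("label_check.png", img)  # inspect that the corner numbering matches the expected order
```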
Could you upload your dataset as one zip file that includes everything (.PLY model, generated (normalized) labels, images, masks, etc.)? That makes it easier for me to test it.
Hi @jgcbrouns Yeah sure, it would be nice to contact you. Expect a mail soon 😎 I am currently training on your dataset to see if it will converge for me. Also, I will go a few steps backwards to check all my own custom data (mostly the labels). For the zipped version of my data, please find it here (my .PLY apparently needed to be scaled).
Hi @jgcbrouns
@jgcbrouns Thank you for all the info you've shared in this thread. I am currently trying to figure out what's wrong with my dataset (or dataset-generating tool). I am trying to train on the dataset you've provided in this thread. After that, I will try to generate the same dataset with the tool I am using, to see whether it's working as expected. Could you please provide the texture that you've used for your object and also, if possible, your data-generating tool?
Hi everyone!
I was already having a discussion about my issues in issue #68, but decided to open a separate ticket anyway for completeness towards other people. As of now, I am clueless about what is wrong with my model. My workflow and solved issues are as follows:
I came across multiple issues regarding the following:
Annotated labels are created automatically in Unity3D, for which I expect there to be no camera distortion (regarding the intrinsic camera calibration). The original authors of the LINEMOD dataset use a Kinect camera that does have such distortion. This internal camera calibration is necessary for, among others, the PnP algorithm (a sketch of the intrinsics for a distortion-free camera is below).
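For a distortion-free synthetic camera, the intrinsics can be derived from the vertical field of view and the image size. A hedged sketch; the FOV and resolution are placeholder values (e.g. Unity's default 60° vertical FOV), not values from this dataset:

```python
import math
import numpy as np

def intrinsics_from_fov(width, height, vertical_fov_deg):
    # Pinhole model without distortion: focal length from the vertical FOV,
    # square pixels, principal point at the image center.
    fy = height / (2.0 * math.tan(math.radians(vertical_fov_deg) / 2.0))
    fx = fy
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fy, height / 2.0],
                     [0.0, 0.0, 1.0]])

K = intrinsics_from_fov(640, 480, 60.0)
print(K)
```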
Find here an example of an image and a label file from my training set.
HOWEVER.
I am still not obtaining correct results, and I am unsure how long to train my models for. The implementation states 700 epochs, yet if I train for 4000 epochs, my results are still not good:
How many epochs should a new object be trained for?
NOTE: I am using benchvise/init.weights as the initial weights for my new model on the custom dataset.
This is while my loss function goes down properly, yet my accuracy measurements stay at 0%:
Could there still be a problem with how I created the annotation files, the camera intrinsic parameters, or the .PLY model? Or could there be another problem that I am not considering?
@btekin Would it be an idea to add an F.A.Q. section to the README using my findings? I think the section about training on a custom dataset could use a lot more elaboration.
Moreover, I am curious as to what people are doing with singleshotpose. Anyone experimenting with some interesting use-cases?
Many thanks for anyone that can help!