Experimenting with a real scene #22
Since we change the background of the objects in the LINEMOD dataset with random images from the PASCAL VOC dataset, the network should in principle generalize to images from unseen environments, given that you have the same (or a similar) object model.
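(For illustration, a minimal sketch of this kind of background substitution, assuming you have the object image, its binary mask, and a random PASCAL VOC image; all file names below are hypothetical:)

```python
import numpy as np
from PIL import Image

def replace_background(obj_img_path, mask_path, bg_img_path):
    """Composite the masked object onto a different background image."""
    obj = np.array(Image.open(obj_img_path).convert('RGB'))
    mask = np.array(Image.open(mask_path).convert('L')) > 0          # boolean object mask
    bg = Image.open(bg_img_path).convert('RGB').resize((obj.shape[1], obj.shape[0]))
    out = np.where(mask[..., None], obj, np.array(bg))               # keep object pixels, swap the rest
    return Image.fromarray(out.astype(np.uint8))

# Hypothetical usage with made-up file names
replace_background('holepuncher.jpg', 'masks/0000.png', 'voc/2008_000008.jpg').save('augmented.jpg')
```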
Okay, I'll then take a picture with a calibrated camera and try it out.
So I experimented with a calibrated camera, feeding it the image below and running the validation script, and I got the following output:
How should this output be interpreted? Since some values come out as "nan", I guess it crashes due to a float division by zero. Is there anything I am missing?
While evaluating the accuracy in
Well, I added the ground-truth mask now (under masks/), but now I am getting this error:
The command I am running is:
P.S.: I made a folder for the output files; the last argument is for that.
I am quite curious whether anyone has actually experimented with the same objects but in a different environment (no markers, just a plain desk surface like in the picture). I would really like to see if the results are actually reproducible.
Hi @eyildiz-ugoe, as mentioned in the README, the code has been tested on Python 2.7. You seem to be testing it with Python 3, which has some incompatibilities, as was also discussed previously in #30. Also make sure that you have PyTorch 0.3 installed, as discussed in the earlier issues that you opened: #10, #11 and #33.

Another thing about testing the approach on random unconstrained images: ideally, the images that you provide should not deviate too much from the training images. It seems that the object in your image covers almost the entire frame; we do not have such cases in our training data. If you would like to test on images like this, you can train a model with more extensive data augmentation with random scaling.

We already demonstrate the "reproducibility" of the results of the paper on both the LINEMOD and Occlusion datasets by providing our full evaluation results together with this code; you just need to run the validation code to reproduce the results. Thanks for your interest in our code.
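(As a rough illustration of the kind of random-scaling augmentation mentioned above, and not the repository's actual augmentation code, one could rescale the image about its centre and transform the normalized 2D keypoints accordingly:)

```python
import random
import numpy as np
from PIL import Image

def random_scale(img, points, lo=0.5, hi=1.5):
    """Randomly rescale an image about its centre and adjust normalized 2D keypoints.

    img    : PIL.Image
    points : (N, 2) array of keypoints in normalized [0, 1] image coordinates
    """
    w, h = img.size
    s = random.uniform(lo, hi)
    scaled = img.resize((int(w * s), int(h * s)))

    # Paste the rescaled image back onto a canvas of the original size, centred;
    # PIL clips the paste if the scaled image is larger than the canvas.
    canvas = Image.new('RGB', (w, h))
    canvas.paste(scaled, ((w - scaled.size[0]) // 2, (h - scaled.size[1]) // 2))

    # Keypoints scale about the image centre in normalized coordinates
    # (points may leave [0, 1] when s > 1 and would then need to be clipped or discarded).
    pts = (np.asarray(points, dtype=np.float32) - 0.5) * s + 0.5
    return canvas, pts
```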
I did try with Python 2 as well, which threw another type of error.
P.S.: This time I do have a mask. What I meant by reproducibility is to see if it actually works with LINEMOD objects (such as the holepuncher I have) in a different environment. If this works, one can invest time and effort in extending it to other objects, training the network, etc. However, one first wants to see whether the whole thing works in a real environment with the same objects, hence the issue. Now, if the only problem is the height of the image, that is easy to solve; I can increase the height and take another picture. I doubt it will solve the problem, though.
As also discussed in #22 (comment), this error comes from not having label files for the object pose (having ground-truth masks, as mentioned in your previous comment, is not sufficient). If you don't have ground-truth annotations during validation (i.e. num_gts is 0), this part of the code is never reached and errs_2d, errs_3d and errs_corner2D remain empty arrays. When you divide by their length, you divide by zero and get nans.

To solve this, you can either annotate the image that you have and provide a label file including these annotations, or write a separate test script yourself that dispenses with the need to iterate through each ground-truth object (see this part). You can also re-train a model with more aggressive data augmentation using a larger scaling factor for the objects, but before that you have to sort out the problem mentioned in the first paragraph.

We provide code to demonstrate that this approach works with de-facto object pose estimation datasets captured in "real" environments (we do not test our approach on synthetic datasets); this can serve as a way to help people adapt the code to their own needs. You just need to modify the code for your own use case rather than directly running the script and expecting it to solve all the problems. I hope the suggestions in the second paragraph help.
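(To make the source of the nans concrete, here is a small sketch of a guard around the accuracy computation; the variable names follow the discussion above rather than the exact validation code:)

```python
import numpy as np

# errs_2d collects per-image reprojection errors; it stays empty when there are no
# ground-truth labels (num_gts == 0), and dividing by its length then produces nan / zero division.
errs_2d = []
px_threshold = 5

if len(errs_2d) == 0:
    print('No ground-truth annotations found; skipping accuracy computation.')
else:
    acc = len(np.where(np.array(errs_2d) <= px_threshold)[0]) * 100.0 / len(errs_2d)
    print('Acc using {} px 2D projection = {:.2f}%'.format(px_threshold, acc))
```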
I guess I am getting lost a bit. Why do I need masks, annotations and label files to run a test of "singleshotpose", which supposedly works with only a single RGB image? I am not going to re-train anything; I simply want to see if it works out of the box with an RGB image, as described in the paper. The label file which I apparently need to provide contains information which goes beyond "single shot":
So let me get this straight: in order to make this work (not to train, just to test), one needs to provide MORE than a single RGB image. One needs to provide its mask and the so-called label file, which contains 21 values that have to be obtained somehow.
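(For reference, the 21 values in a label file appear to correspond to the layout sketched below; this is an interpretation based on the discussion, not a quote from the repository documentation:)

```python
# Assumed layout of one line in a singleshotpose label file (21 whitespace-separated floats):
#   values[0]     : object class id
#   values[1:19]  : 9 control points (3D bounding-box centroid + 8 corners) projected into the
#                   image, as (x, y) pairs normalized by image width / height
#   values[19:21] : x-range and y-range of those projected points (also normalized)

def parse_label_line(line):
    values = [float(v) for v in line.split()]
    class_id = int(values[0])
    points = [(values[1 + 2 * i], values[2 + 2 * i]) for i in range(9)]
    x_range, y_range = values[19], values[20]
    return class_id, points, x_range, y_range
```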
I think you are confusing the concept of "single shot" a bit: by single shot we mean a single-stage network (i.e. not a multi-stage pipeline consisting of 2D detection, pose estimation, etc.), not a single RGB image.

In our code, we provide a validation script, but not a test script for unconstrained images. For the validation code, you need the ground-truth labels to quantify the error. We provided a validation script so that people can reproduce the results of the paper on standard benchmarks. You can also easily write a small test script to estimate the pose of objects in your own images, with some minor modifications to the provided validation code (also taking into account the suggestions in my previous comment, of course, if you want to).

I can try to write a test script that does not need the labels and outputs a 6D pose on images other than the LINEMOD images if I get some time, but as I said, this is optional and something that one can easily do herself/himself.
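(For anyone looking for a starting point before such a test script exists, a rough sketch is given below. The repository-specific names Darknet and get_region_boxes, the get_region_boxes signature, the assumed box layout with the confidence at index 18, and all numeric values are assumptions or placeholders that may need adjusting; cv2.solvePnP is used here in place of the repository's own PnP helper.)

```python
import numpy as np
import cv2
import torch
from torch.autograd import Variable
from PIL import Image

from darknet import Darknet              # network definition from this repository (assumed path)
from utils import get_region_boxes       # decoder for the 9 predicted 2D points (assumed name)

# 1. Load a trained model (placeholder cfg/weight paths)
model = Darknet('cfg/yolo-pose.cfg')
model.load_weights('backup/holepuncher/model.weights')
model.cuda().eval()

# 2. Prepare your own image, resized to the network input resolution (e.g. 416x416)
im_w, im_h = 640, 480                    # your original image resolution (placeholder)
img = Image.open('my_holepuncher_photo.jpg').convert('RGB').resize((416, 416))
data = torch.from_numpy(np.array(img)).float().div(255.0).permute(2, 0, 1).unsqueeze(0)
data = Variable(data.cuda(), volatile=True)           # PyTorch 0.3-style inference

# 3. Forward pass and pick the most confident set of 9 projected corner predictions
output = model(data).data
boxes = get_region_boxes(output, 0.10, 1)[0]          # (conf_thresh, num_classes); first image
best = max(boxes, key=lambda b: b[18])                # assume index 18 holds the confidence
corners_2d = np.array(best[:18], dtype=np.float32).reshape(9, 2)
corners_2d[:, 0] *= im_w                              # predictions are in normalized coordinates
corners_2d[:, 1] *= im_h

# 4. PnP with YOUR calibrated intrinsics and the object's 3D control points
fx, fy, cx, cy = 572.4, 573.6, 325.3, 242.0           # placeholder; use your own calibration
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)

# Placeholder 3D control points: centroid + 8 corners of an axis-aligned box (sx, sy, sz in metres).
# Replace with the values derived from your actual object model.
sx, sy, sz = 0.10, 0.10, 0.07
corners_3d = np.float32([[0, 0, 0]] + [[x, y, z] for x in (-sx / 2, sx / 2)
                                                 for y in (-sy / 2, sy / 2)
                                                 for z in (-sz / 2, sz / 2)])

_, rvec, tvec = cv2.solvePnP(corners_3d, corners_2d, K, None)
print('rotation (Rodrigues vector):', rvec.ravel())
print('translation:', tvec.ravel())
```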
How did you get the mask? I project the 3D model with the GT pose, but the result is not correct. @eyildiz-ugoe
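(For reference, one common way to obtain such a mask is to project all model vertices with the GT pose and the camera intrinsics; the sketch below is only an illustration with hypothetical variable names. A frequent cause of a "wrong" projection is a unit mismatch between the model, e.g. millimetres, and the translation, e.g. metres, or using intrinsics that do not match the dataset.)

```python
import numpy as np
import cv2

def render_mask(vertices, R, t, K, im_w, im_h):
    """Project 3D model vertices with a ground-truth pose to obtain a rough object mask.

    vertices : (N, 3) model points (same units as t)
    R, t     : ground-truth rotation (3x3) and translation (3x1)
    K        : 3x3 camera intrinsic matrix
    """
    proj = K.dot(R.dot(vertices.T) + t)          # 3 x N homogeneous image points
    proj = (proj[:2] / proj[2]).T                # N x 2 pixel coordinates
    mask = np.zeros((im_h, im_w), dtype=np.uint8)
    pts = np.round(proj).astype(int)
    valid = (pts[:, 0] >= 0) & (pts[:, 0] < im_w) & (pts[:, 1] >= 0) & (pts[:, 1] < im_h)
    mask[pts[valid, 1], pts[valid, 0]] = 255
    # Close the gaps between the projected vertices to get a filled silhouette
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
```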
Hi @btekin. Did you have time to modify valid.py in order to run inference on RGB images without the GT labels? It would be really helpful if you could provide this.
I would like to try a real scene in which I can detect and fit my holepuncher (which is exactly the same holepuncher as in LINEMOD). Since the network is already trained on LINEMOD, this should work out of the box.
Do I need to recreate the very same experimental environment, though? Like the ArUco markers and similar objects such as the ape, cat, benchvise and others? Or can I just take a picture of the same holepuncher with my (calibrated) camera and test that image?