This repository has been archived by the owner on Jun 13, 2024. It is now read-only.

Experimenting with a real scene #22

Closed
eyildiz-ugoe opened this issue Jul 17, 2018 · 13 comments

@eyildiz-ugoe

eyildiz-ugoe commented Jul 17, 2018

I would like to try a real scene in which I can detect my holepuncher and fit its pose (it is exactly the same holepuncher as in LINEMOD). Since the network is already trained on LINEMOD, this should work out of the box.


Do I need to recreate the very same experimental environment, though? For example, ArUco markers and similar objects such as the ape, cat, benchvise and others? Or can I just take a picture of the same holepuncher with my (calibrated) camera and test on that image?

@btekin
Collaborator

btekin commented Jul 18, 2018

Since we replace the background of the objects in the LINEMOD dataset with random images from the PASCAL VOC dataset, the network should in principle generalize to images from unseen environments, given that you have the same (or a similar) object model.

@eyildiz-ugoe
Author

Okay, then I'll take a picture with a calibrated camera and try it out.

@eyildiz-ugoe
Author

So I experimented with a calibrated camera, feeding the image below and running the validation script:

[image: sample]

and I got the following output:

2018-08-27 10:02:21    Testing holepuncher...
2018-08-27 10:02:21    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001424
         predict : 0.912986
get_region_boxes : 0.024027
            eval : 0.000041
           total : 0.938478
-----------------------------------
2018-08-27 10:02:22 Results of holepuncher
2018-08-27 10:02:22    Acc using 5 px 2D Projection = 0.00%
2018-08-27 10:02:22    Acc using 10% threshold - 0.0162 vx 3D Transformation = 0.00%
2018-08-27 10:02:22    Acc using 5 cm 5 degree metric = 0.00%
2018-08-27 10:02:22    Mean 2D pixel error is nan, Mean vertex error is nan, mean corner error is nan
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 280, in valid
    logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
ZeroDivisionError: float division by zero

How should this output be interpreted? Since some values are found to be "nan", I guess it crashes due to float division by zero. Is there anything I am missing?

@btekin
Collaborator

btekin commented Sep 2, 2018

While evaluating the accuracy in valid.py, we iterate through each ground-truth object in L149. If you don't have ground-truth annotations during validation (if num_gts is 0), you will not be able to process this part of the code, and errs_2d, errs_3d and errs_corner2D will be empty arrays. When you divide by their length, you will divide by zero and get NaNs. Therefore, to run the validation code, you can either provide ground-truth annotations if you already have them, or modify the above-mentioned parts of the code so that you can test without the ground-truth annotations.
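As a rough sketch (not a drop-in patch; the variable names below follow valid.py only loosely and may differ in your copy), the reporting part could be guarded like this so that an image without labels does not crash the script:

# Sketch: guard the metric reporting in valid.py when no ground-truth labels exist.
# errs_2d, nts, px_threshold, testing_error_trans, ... follow the validation
# script loosely; adapt the names to your local copy.
if len(errs_2d) == 0 or nts == 0:
    logging('No ground-truth labels found; skipping accuracy metrics.')
else:
    acc_2d = len([e for e in errs_2d if e <= px_threshold]) * 100.0 / len(errs_2d)
    logging('Acc using %d px 2D Projection = %.2f%%' % (px_threshold, acc_2d))
    logging('Translation error: %f m, angle error: %f degree, pixel error: %f pix'
            % (testing_error_trans / nts, testing_error_angle / nts, testing_error_pixel / nts))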

@eyildiz-ugoe
Author

eyildiz-ugoe commented Sep 3, 2018

Well, I have now added the ground-truth mask (under masks/), but I am getting this error:

2018-09-03 14:44:15    Testing holepuncher...
2018-09-03 14:44:15    Number of test samples: 1
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 136, in valid
    output = model(data).data  
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/workspace/singleshotpose/darknet.py", line 91, in forward
    x = self.models[ind](x)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 89, in conv2d
    torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)

The command I am running is:

python3 valid.py cfg/holepuncher.data cfg/yolo-pose.cfg backup/holepuncher/model_backup.weights output/holepuncher

P.S.: I created a folder for the output files; the last argument points to it.

@eyildiz-ugoe
Author

I am quite curious whether anyone has actually experimented with the same objects but in a different environment (no markers, just a plain desk surface like in the picture). I would really like to see whether the results are actually reproducible.

@btekin
Collaborator

btekin commented Sep 4, 2018

Hi @eyildiz-ugoe, as mentioned in the README, the code has been tested with Python 2.7. You seem to be running it with Python 3, which has some incompatibilities, as was also discussed in #30. Also make sure that you have PyTorch 0.3 installed, as discussed in the earlier issues you opened: #10, #11 and #33
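For reference, the "padding must be tuple of int" error is a typical symptom of Python 3's true division: if the convolution padding in darknet.py is computed with "/", it becomes a float under Python 3. A minimal sketch of the kind of change that fixes this (assuming the padding line looks roughly like the first line below; check your local copy):

# darknet.py (sketch): under Python 2, '/' between ints is integer division;
# under Python 3 it returns a float, which nn.Conv2d rejects for padding.
pad = (kernel_size - 1) / 2 if is_pad else 0    # breaks under Python 3
pad = (kernel_size - 1) // 2 if is_pad else 0   # explicit integer division, works on both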

Another thing about testing the approach on random unconstrained images: ideally, the images you provide should not deviate too much from the training images. The object in your image covers almost the entire frame, and we do not have such cases in our training data. If you would like to test on images like this, you can train a model with more extensive data augmentation using random scaling.

We already demonstrate the "reproducibility" of the results of the paper on both the LINEMOD and Occlusion datasets by providing our full evaluation results together with this code; you just need to run the validation scripts to reproduce them. Thanks for your interest in our code.

@eyildiz-ugoe
Author

eyildiz-ugoe commented Sep 4, 2018

I did try with Python 2 as well, which threw another type of error.

2018-09-04 12:47:00    Testing holepuncher...
2018-09-04 12:47:00    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001187
         predict : 0.923070
get_region_boxes : 0.022662
            eval : 0.000039
           total : 0.946958
-----------------------------------
2018-09-04 12:47:01 Results of holepuncher
2018-09-04 12:47:01    Acc using 5 px 2D Projection = 0.00%
2018-09-04 12:47:01    Acc using 10% threshold - 0.0162 vx 3D Transformation = 0.00%
2018-09-04 12:47:01    Acc using 5 cm 5 degree metric = 0.00%
2018-09-04 12:47:01    Mean 2D pixel error is nan, Mean vertex error is nan, mean corner error is nan
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 280, in valid
    logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
ZeroDivisionError: float division by zero

PS: This time I do have a mask.

[image: sample]

What I meant by reproducibility is to see whether it actually works with the LINEMOD objects (such as the holepuncher I have) in a different environment. If this works, one can invest time and effort in extending it to other objects, training the network, etc. However, one would first like to see whether the whole thing works in a real environment (with the same objects), hence this issue.

Now, if the only problem is the camera height (i.e., how much of the image the object covers), that is easy to solve: I can increase the height and take another picture. I doubt it will solve the problem, though.

@btekin
Collaborator

btekin commented Sep 4, 2018

As also discussed in #22 (comment), this error comes from not having label files for the object pose (having ground-truth masks as mentioned in your previous comment would not be sufficient). If you don't have ground-truth annotations during validation (if num_gts is 0), you will not be able to process this part of the code and errs_2d, errs_3d and errs_corner2D will be empty arrays. When you divide by their length, you will divide by zero and get nans.

To solve this error, you can either annotate the image you have and provide a label file with these annotations, or write a separate test script yourself that dispenses with the need to iterate through the ground-truth objects (see this part); a rough sketch is included below. You can also re-train a model with more aggressive data augmentation, using a larger scaling factor for the objects, but before that you have to sort out the problem mentioned in the first paragraph.
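A very rough sketch of what such a label-free test script could look like (the model loading and the width/height attributes follow the repository code loosely; get_region_boxes and the corner extraction below are assumptions you would need to check against utils.py and adapt):

# Sketch of a minimal test script: forward pass + PnP, no ground-truth labels.
# get_region_boxes and the layout of its output are assumptions based on the
# validation code; adapt them to the actual utils.py in this repository.
import cv2
import numpy as np
import torch
from torch.autograd import Variable
from darknet import Darknet

model = Darknet('cfg/yolo-pose.cfg')
model.load_weights('backup/holepuncher/model_backup.weights')
model.cuda().eval()

# Read the image and resize it to the network input resolution (RGB, CHW, [0, 1]).
img = cv2.imread('my_holepuncher.jpg')
rgb = cv2.resize(img, (model.width, model.height))[:, :, ::-1]
chw = np.ascontiguousarray(rgb.transpose(2, 0, 1), dtype=np.float32) / 255.0
inp = torch.from_numpy(chw).unsqueeze(0)

output = model(Variable(inp.cuda())).data

# Hypothetical: extract the 2D projections of the centroid + 8 box corners of
# the most confident detection (9 points in normalized image coordinates):
# boxes = get_region_boxes(output, conf_thresh, num_classes)
# corners2D = np.array(best_box_points, dtype='float32') * [img.shape[1], img.shape[0]]

# With the 3D corners of the object model (from the .ply file) and the camera
# intrinsics K of your calibrated camera, recover the 6D pose with PnP:
# _, rvec, tvec = cv2.solvePnP(corners3D, corners2D, K, None)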

We provide code to demonstrate that this approach works on the de facto standard object pose estimation datasets, which also feature "real" environments (we do not test our approach on synthetic datasets); this can serve as a starting point for people to adapt the code to their own needs. You just need to modify the code for your particular use case rather than running the script directly and expecting it to solve all the problems. I hope the suggestions in the second paragraph help.

@eyildiz-ugoe
Author

I guess I am getting a bit lost. Why do I need masks, annotations and label files to run a test of "singleshotpose", which supposedly works with only a single RGB image? I am not going to re-train anything; I simply want to see whether it works out of the box with an RGB image, as written in the paper.

The label file that I apparently need to provide contains information that goes beyond "single shot":

9 0.514013 0.508339 0.563897 0.596026 0.566503 0.541087 0.456816 0.587022 0.450449 0.531784 0.568257 0.493188 0.571067 0.433076 0.466555 0.485384 0.461305 0.425124 0.120617 0.170902

So let me get this straight:

In order to make this work (not to train, just to test), one needs to provide MORE than a single RGB image: one also needs its mask and the so-called label file, which contains 21 values that have to be obtained somehow.
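For my own notes, this is how I currently read those 21 values (the field meanings below are my assumption, not something I have verified against any documentation):

# My reading of a label line (an assumption on my part, please correct me):
# [class id] [9 (x, y) pairs: centroid + 8 corners of the 3D bounding box
#  projected into the image, normalized by width/height] [x range] [y range]
line = ("9 0.514013 0.508339 0.563897 0.596026 0.566503 0.541087 0.456816 "
        "0.587022 0.450449 0.531784 0.568257 0.493188 0.571067 0.433076 "
        "0.466555 0.485384 0.461305 0.425124 0.120617 0.170902")
values = [float(v) for v in line.split()]
class_id = int(values[0])                           # presumably the class id (9 would match holepuncher)
points = list(zip(values[1:19:2], values[2:19:2]))  # 9 normalized (x, y) points
x_range, y_range = values[19], values[20]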

@btekin
Collaborator

btekin commented Sep 5, 2018

I think you are confusing the concept of "single shot" a bit: by "single shot" we mean a single-stage network (i.e., not a multi-stage pipeline consisting of 2D detection, pose estimation, etc.), not a single RGB image.

In our code, we provide a validation script, but not a test script for unconstrained images. For the validation code, you need the ground-truth labels to quantify the error. We provided a validation script so that people can reproduce the results of the paper on standard benchmarks.

You can also easily write a small test script to estimate the pose of objects in your own images; some minor modifications to the provided validation code would suffice for this purpose (also taking into account the suggestions in my previous comment, of course). If I find some time, I can try to write a test script that does not need labels and outputs a 6D pose on images other than the LINEMOD ones, but as I said, this is optional and something one can easily do oneself.

btekin closed this as completed Sep 5, 2018
@G-YY

G-YY commented Jul 29, 2019

How did you get the mask? I projected the 3D model with the GT pose, but the result is not correct. @eyildiz-ugoe

@ValiaVl

ValiaVl commented Sep 14, 2022

I think you are confusing the concept of "single shot" a bit: by "single shot" we mean a single-stage network (i.e., not a multi-stage pipeline consisting of 2D detection, pose estimation, etc.), not a single RGB image.

In our code, we provide a validation script, but not a test script for unconstrained images. For the validation code, you need the ground-truth labels to quantify the error. We provided a validation script so that people can reproduce the results of the paper on standard benchmarks.

You can also easily write a small test script to estimate the pose of objects in your own images; some minor modifications to the provided validation code would suffice for this purpose (also taking into account the suggestions in my previous comment, of course). If I find some time, I can try to write a test script that does not need labels and outputs a 6D pose on images other than the LINEMOD ones, but as I said, this is optional and something one can easily do oneself.

Hi @btekin. Did you have time to modify valid.py so that it can run inference on RGB images without the ground-truth labels? It would be really helpful if you could provide this.
