This repository has been archived by the owner on Jun 13, 2024. It is now read-only.

Experimenting with a real scene #22

Closed
eyildiz-ugoe opened this issue Jul 17, 2018 · 13 comments

@eyildiz-ugoe

eyildiz-ugoe commented Jul 17, 2018

I would like to try a real scene in which I can detect my holepuncher and fit its pose (it is exactly the same holepuncher as in LINEMOD). Since the network is already trained on LINEMOD, this should work out of the box.


Do I need to recreate the very same experimental environment, though? For example, ArUco markers and similar objects such as the ape, cat, benchvise and others? Or can I just take a picture of the same holepuncher with my (calibrated) camera and test on that image?

@btekin
Collaborator

btekin commented Jul 18, 2018

Since we replace the background of the objects in the LINEMOD dataset with random images from the PASCAL VOC dataset, the network should in principle generalize to images from unseen environments, given that you have the same (or a similar) object model.

@eyildiz-ugoe
Author

Okay, then I'll take a picture with a calibrated camera and try it out.

@eyildiz-ugoe
Author

So I experimented with a calibrated camera, feeding the image below and running the validation script:

[image: sample]

and I got the following output:

2018-08-27 10:02:21    Testing holepuncher...
2018-08-27 10:02:21    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001424
         predict : 0.912986
get_region_boxes : 0.024027
            eval : 0.000041
           total : 0.938478
-----------------------------------
2018-08-27 10:02:22 Results of holepuncher
2018-08-27 10:02:22    Acc using 5 px 2D Projection = 0.00%
2018-08-27 10:02:22    Acc using 10% threshold - 0.0162 vx 3D Transformation = 0.00%
2018-08-27 10:02:22    Acc using 5 cm 5 degree metric = 0.00%
2018-08-27 10:02:22    Mean 2D pixel error is nan, Mean vertex error is nan, mean corner error is nan
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 280, in valid
    logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
ZeroDivisionError: float division by zero

How should this output be interpreted? Since some values are found to be "nan", I guess it crashes due to float division by zero. Is there anything I am missing?

@btekin
Collaborator

btekin commented Sep 2, 2018

While evaluating the accuracy in valid.py, we iterate through each ground-truth object in L149. If you don't have ground-truth annotations during validation (if num_gts is 0), you will not be able to process this part of the code, and errs_2d, errs_3d and errs_corner2D will be empty arrays. When you divide by their length, you will divide by zero and get NaNs. Therefore, to run the validation code, you can either provide ground-truth annotations if you already have them, or modify the above-mentioned parts of the code so that you can test without the ground-truth annotations.
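As a rough sketch (not a drop-in patch; the variable names below follow valid.py only loosely and may differ in your copy), the reporting part could be guarded like this so that an image without labels does not crash the script:

# Sketch: guard the metric reporting in valid.py when no ground-truth labels exist.
# errs_2d, nts, px_threshold, testing_error_trans, ... follow the validation
# script loosely; adapt the names to your local copy.
if len(errs_2d) == 0 or nts == 0:
    logging('No ground-truth labels found; skipping accuracy metrics.')
else:
    acc_2d = len([e for e in errs_2d if e <= px_threshold]) * 100.0 / len(errs_2d)
    logging('Acc using %d px 2D Projection = %.2f%%' % (px_threshold, acc_2d))
    logging('Translation error: %f m, angle error: %f degree, pixel error: %f pix'
            % (testing_error_trans / nts, testing_error_angle / nts, testing_error_pixel / nts))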

@eyildiz-ugoe
Author

eyildiz-ugoe commented Sep 3, 2018

Well, I have now added the ground-truth mask (under masks/), but I am getting this error:

2018-09-03 14:44:15    Testing holepuncher...
2018-09-03 14:44:15    Number of test samples: 1
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 136, in valid
    output = model(data).data  
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/workspace/singleshotpose/darknet.py", line 91, in forward
    x = self.models[ind](x)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 89, in conv2d
    torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)

The command I am running is:

python3 valid.py cfg/holepuncher.data cfg/yolo-pose.cfg backup/holepuncher/model_backup.weights output/holepuncher

P.S.: I created a folder for the output files; the last argument points to it.

@eyildiz-ugoe
Author

I am quite curious whether anyone has actually experimented with the same objects but in a different environment (no markers, just a plain desk surface like in the picture). I would really like to see whether the results are actually reproducible.

@btekin
Collaborator

btekin commented Sep 4, 2018

Hi @eyildiz-ugoe, as mentioned in the README, the code has been tested with Python 2.7. You seem to be running it with Python 3, which has some incompatibilities, as was also discussed in #30. Also make sure that you have PyTorch 0.3 installed, as discussed in the earlier issues you opened: #10, #11 and #33
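For reference, the "padding must be tuple of int" error is a typical symptom of Python 3's true division: if the convolution padding in darknet.py is computed with "/", it becomes a float under Python 3. A minimal sketch of the kind of change that fixes this (assuming the padding line looks roughly like the first line below; check your local copy):

# darknet.py (sketch): under Python 2, '/' between ints is integer division;
# under Python 3 it returns a float, which nn.Conv2d rejects for padding.
pad = (kernel_size - 1) / 2 if is_pad else 0    # breaks under Python 3
pad = (kernel_size - 1) // 2 if is_pad else 0   # explicit integer division, works on both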

Another thing about testing the approach on random unconstrained images: ideally, the images you provide should not deviate too much from the training images. The object in your image covers almost the entire frame, and we do not have such cases in our training data. If you would like to test on images like this, you can train a model with more extensive data augmentation using random scaling.

We already demonstrate the "reproducibility" of the results of the paper on both the LINEMOD and Occlusion datasets by providing our full evaluation results together with this code; you just need to run the validation scripts to reproduce them. Thanks for your interest in our code.

@eyildiz-ugoe
Author

eyildiz-ugoe commented Sep 4, 2018

I did try with Python 2 as well, which threw another type of error.

2018-09-04 12:47:00    Testing holepuncher...
2018-09-04 12:47:00    Number of test samples: 1
-----------------------------------
  tensor to cuda : 0.001187
         predict : 0.923070
get_region_boxes : 0.022662
            eval : 0.000039
           total : 0.946958
-----------------------------------
2018-09-04 12:47:01 Results of holepuncher
2018-09-04 12:47:01    Acc using 5 px 2D Projection = 0.00%
2018-09-04 12:47:01    Acc using 10% threshold - 0.0162 vx 3D Transformation = 0.00%
2018-09-04 12:47:01    Acc using 5 cm 5 degree metric = 0.00%
2018-09-04 12:47:01    Mean 2D pixel error is nan, Mean vertex error is nan, mean corner error is nan
Traceback (most recent call last):
  File "valid.py", line 293, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 280, in valid
    logging('   Translation error: %f m, angle error: %f degree, pixel error: % f pix' % (testing_error_trans/nts, testing_error_angle/nts, testing_error_pixel/nts) )
ZeroDivisionError: float division by zero

PS: This time I do have a mask.

[image: sample]

What I meant by reproducibility is to see whether it actually works with the LINEMOD objects (such as the holepuncher I have) in a different environment. If this works, one can invest time and effort in extending it to other objects, training the network, etc. However, one would first like to see whether the whole thing works in a real environment (with the same objects), hence this issue.

Now, if the only problem is the camera height (i.e., how much of the image the object covers), that is easy to solve: I can increase the height and take another picture. I doubt it will solve the problem, though.

@btekin
Collaborator

btekin commented Sep 4, 2018

As also discussed in #22 (comment), this error comes from not having label files for the object pose (having ground-truth masks as mentioned in your previous comment would not be sufficient). If you don't have ground-truth annotations during validation (if num_gts is 0), you will not be able to process this part of the code and errs_2d, errs_3d and errs_corner2D will be empty arrays. When you divide by their length, you will divide by zero and get nans.

To solve this error, you can either annotate the image you have and provide a label file with these annotations, or write a separate test script yourself that dispenses with the need to iterate through the ground-truth objects (see this part); a rough sketch is included below. You can also re-train a model with more aggressive data augmentation, using a larger scaling factor for the objects, but before that you have to sort out the problem mentioned in the first paragraph.
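A very rough sketch of what such a label-free test script could look like (the model loading and the width/height attributes follow the repository code loosely; get_region_boxes and the corner extraction below are assumptions you would need to check against utils.py and adapt):

# Sketch of a minimal test script: forward pass + PnP, no ground-truth labels.
# get_region_boxes and the layout of its output are assumptions based on the
# validation code; adapt them to the actual utils.py in this repository.
import cv2
import numpy as np
import torch
from torch.autograd import Variable
from darknet import Darknet

model = Darknet('cfg/yolo-pose.cfg')
model.load_weights('backup/holepuncher/model_backup.weights')
model.cuda().eval()

# Read the image and resize it to the network input resolution (RGB, CHW, [0, 1]).
img = cv2.imread('my_holepuncher.jpg')
rgb = cv2.resize(img, (model.width, model.height))[:, :, ::-1]
chw = np.ascontiguousarray(rgb.transpose(2, 0, 1), dtype=np.float32) / 255.0
inp = torch.from_numpy(chw).unsqueeze(0)

output = model(Variable(inp.cuda())).data

# Hypothetical: extract the 2D projections of the centroid + 8 box corners of
# the most confident detection (9 points in normalized image coordinates):
# boxes = get_region_boxes(output, conf_thresh, num_classes)
# corners2D = np.array(best_box_points, dtype='float32') * [img.shape[1], img.shape[0]]

# With the 3D corners of the object model (from the .ply file) and the camera
# intrinsics K of your calibrated camera, recover the 6D pose with PnP:
# _, rvec, tvec = cv2.solvePnP(corners3D, corners2D, K, None)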

We provide code to demonstrate that this approach works on the de facto standard object pose estimation datasets, which also feature "real" environments (we do not test our approach on synthetic datasets); this can serve as a starting point for people to adapt the code to their own needs. You just need to modify the code for your particular use case rather than running the script directly and expecting it to solve all the problems. I hope the suggestions in the second paragraph help.

@eyildiz-ugoe
Author

I guess I am getting a bit lost. Why do I need masks, annotations and label files to run a test of "singleshotpose", which supposedly works with only a single RGB image? I am not going to re-train anything; I simply want to see whether it works out of the box with an RGB image, as written in the paper.

The label file that I apparently need to provide contains information that goes beyond "single shot":

9 0.514013 0.508339 0.563897 0.596026 0.566503 0.541087 0.456816 0.587022 0.450449 0.531784 0.568257 0.493188 0.571067 0.433076 0.466555 0.485384 0.461305 0.425124 0.120617 0.170902

So let me get this straight:

In order to make this work (not to train, just to test), one needs to provide MORE than a single RGB image: one also needs its mask and the so-called label file, which contains 21 values that have to be obtained somehow.
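For my own notes, this is how I currently read those 21 values (the field meanings below are my assumption, not something I have verified against any documentation):

# My reading of a label line (an assumption on my part, please correct me):
# [class id] [9 (x, y) pairs: centroid + 8 corners of the 3D bounding box
#  projected into the image, normalized by width/height] [x range] [y range]
line = ("9 0.514013 0.508339 0.563897 0.596026 0.566503 0.541087 0.456816 "
        "0.587022 0.450449 0.531784 0.568257 0.493188 0.571067 0.433076 "
        "0.466555 0.485384 0.461305 0.425124 0.120617 0.170902")
values = [float(v) for v in line.split()]
class_id = int(values[0])                           # presumably the class id (9 would match holepuncher)
points = list(zip(values[1:19:2], values[2:19:2]))  # 9 normalized (x, y) points
x_range, y_range = values[19], values[20]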

@btekin
Collaborator

btekin commented Sep 5, 2018

I think you are confusing the concept of "single shot" a bit: by "single shot" we mean a single-stage network (i.e., not a multi-stage pipeline consisting of 2D detection, pose estimation, etc.), not a single RGB image.

In our code, we provide a validation script, but not a test script for unconstrained images. For the validation code, you need the ground-truth labels to quantify the error. We provided a validation script so that people can reproduce the results of the paper on standard benchmarks.

You can also easily write a small test script to estimate the pose of objects in your own images; some minor modifications to the provided validation code would suffice for this purpose (also taking into account the suggestions in my previous comment, of course). If I find some time, I can try to write a test script that does not need labels and outputs a 6D pose on images other than the LINEMOD ones, but as I said, this is optional and something one can easily do oneself.

btekin closed this as completed Sep 5, 2018
@G-YY

G-YY commented Jul 29, 2019

How did you get the mask? I projected the 3D model with the GT pose, but the result is not correct. @eyildiz-ugoe

@ValiaVl

ValiaVl commented Sep 14, 2022

I think you are confusing the concept of "single shot" a bit: by "single shot" we mean a single-stage network (i.e., not a multi-stage pipeline consisting of 2D detection, pose estimation, etc.), not a single RGB image.

In our code, we provide a validation script, but not a test script for unconstrained images. For the validation code, you need the ground-truth labels to quantify the error. We provided a validation script so that people can reproduce the results of the paper on standard benchmarks.

You can also easily write a small test script to estimate the pose of objects in your own images; some minor modifications to the provided validation code would suffice for this purpose (also taking into account the suggestions in my previous comment, of course). If I find some time, I can try to write a test script that does not need labels and outputs a 6D pose on images other than the LINEMOD ones, but as I said, this is optional and something one can easily do oneself.

Hi @btekin. Did you have time to modify valid.py so that it can run inference on RGB images without the ground-truth labels? It would be really helpful if you could provide this.
