Data preprocessing issues #165

Open
kmatzen opened this issue Aug 19, 2024 · 1 comment

kmatzen commented Aug 19, 2024

I'm going to consolidate some data preprocessing problems here and then someone can break them out into separate issues as needed.

I'm attempting to go through the instructions as written and wanted to note some incompatibilities that have arisen.

  • The ScanNet++ SfM reconstructions appear to be missing some images that are in the selected image set.
  • HM3D, Gibson, and ReplicaCAD are missing some views.
    • I'm extending the skip-and-retry logic to handle missing files:
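              # load the view (and use the next one if this one is broken or missing)
              for ii in range(view_index, view_index + 5):
                  try:
                      image, depthmap, intrinsics, camera_pose = self._load_one_view(data_path, key, ii % 5, resolution, rng)
                  except Exception as exc:
                      # e.g. a missing image/depth file: log it and try the next view
                      print(exc)
                      continue
                  if np.isfinite(camera_pose).all():
                      break
              else:
                  # no candidate loaded with a finite pose; fail loudly instead of
                  # silently reusing variables left over from an earlier view
                  raise RuntimeError(f'no valid view found for {key}')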
    
  • BlendedMVS fails a sanity check on the principal point.
    • Error:
    [rank6]:     assert min_margin_x > W/5, f'Bad principal point in view={info}'
    [rank6]: AssertionError: Bad principal point in view=('data/blendedmvs_processed/000000000000000000000001', '00000036')
    
    • As a workaround, I commented out both the horizontal and the vertical principal-point asserts. The 1/5-of-width/height threshold looks like a heuristic; is that right?
  • Ran into issue #162 (preprocess_co3d.py can't work).
  • find_scenes.py doesn't seem to use the same validation-set size as the one advertised in the top-level README.
--- a/datasets_preprocess/habitat/find_scenes.py
+++ b/datasets_preprocess/habitat/find_scenes.py
@@ -49,8 +49,8 @@ def find_all_scenes(habitat_root, n_scenes=[100000]):
     print(f'from {len(list_scenes)} scenes in total')
 
     np.random.shuffle(list_scenes)
-    train_scenes = list_scenes[len(list_scenes)//10:]
-    val_scenes = list_scenes[:len(list_scenes)//10]
+    train_scenes = list_scenes[len(list_scenes)//1000:]
+    val_scenes = list_scenes[:len(list_scenes)//1000]
 
     def write_scene_list(scenes, n, fpath):
         sub_scenes = [os.path.join(scene, id) for scene, ids in scenes for id in ids]
@@ -65,7 +65,7 @@ def find_all_scenes(habitat_root, n_scenes=[100000]):
 
     for n in n_scenes:
         write_scene_list(train_scenes, n, os.path.join(habitat_root, f'Habitat_{n}_scenes_train.txt'))
-        write_scene_list(val_scenes, n//10, os.path.join(habitat_root, f'Habitat_{n//10}_scenes_val.txt'))
+        write_scene_list(val_scenes, n//1000, os.path.join(habitat_root, f'Habitat_{n//1000}_scenes_val.txt'))
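
The skip-and-retry idea from the HM3D/Gibson/ReplicaCAD item can be sketched in isolation. This is a minimal, self-contained sketch, not the repository's loader: `load_first_valid_view` and the `load_view` callback are hypothetical names, a camera pose is assumed to be a 4x4 nested list, and only `OSError`/`ValueError` are treated as "broken file" errors (a missing file raises `FileNotFoundError`, which is a subclass of `OSError`).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical loader type: takes a view index, returns (view_data, 4x4 camera pose).
Loader = Callable[[int], Tuple[object, Sequence[Sequence[float]]]]


def load_first_valid_view(load_view: Loader, view_index: int, n_views: int = 5):
    """Try view_index, view_index + 1, ... (wrapping modulo n_views) and return
    (index, view) for the first view that loads and has a finite camera pose."""
    errors: List[Exception] = []
    for ii in range(view_index, view_index + n_views):
        try:
            view, pose = load_view(ii % n_views)
        except (OSError, ValueError) as exc:
            # e.g. a missing image or depth file: remember it and try the next view
            errors.append(exc)
            continue
        if all(math.isfinite(x) for row in pose for x in row):
            return ii % n_views, view
    # every candidate failed: raise instead of returning stale or undefined data
    raise RuntimeError(f'no valid view among {n_views} candidates: {errors}')
```

Raising after the loop means a key whose five candidate views are all broken shows up as an explicit error rather than as a sample assembled from whatever the previous iteration left behind.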
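
On the BlendedMVS assert: judging only from the names in the error message (`min_margin_x`, `W`), the check appears to require the principal point to lie more than one fifth of the image width (and, for the vertical assert, height) away from the nearest border. The sketch below assumes `min_margin_x = min(cx, W - cx)`; that definition is an assumption, not confirmed from the repository, and the function names are hypothetical. Under that assumption, a crop centred on the principal point can be at most `2 * min_margin_x` wide, so the threshold guarantees such a crop keeps more than 2/5 of each dimension.

```python
from typing import Tuple


def principal_point_margins(cx: float, cy: float, W: float, H: float) -> Tuple[float, float]:
    """Assumed definition: distance from (cx, cy) to the nearest left/right
    and top/bottom image border."""
    return min(cx, W - cx), min(cy, H - cy)


def principal_point_ok(cx: float, cy: float, W: float, H: float, frac: float = 1 / 5) -> bool:
    """True when both margins exceed frac times the image size, mirroring the
    asserted `min_margin_x > W/5` and its (assumed) vertical counterpart."""
    mx, my = principal_point_margins(cx, cy, W, H)
    return mx > frac * W and my > frac * H


def centred_crop(cx: float, cy: float, W: float, H: float) -> Tuple[float, float, float, float]:
    """Largest (left, top, right, bottom) box centred on the principal point
    that stays inside the image; it is 2 * mx wide and 2 * my tall."""
    mx, my = principal_point_margins(cx, cy, W, H)
    return cx - mx, cy - my, cx + mx, cy + my
```

For a 640x480 image this means the principal point must sit strictly between x = 128 and x = 512 and between y = 96 and y = 384.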
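
The effect of the `find_scenes.py` change can be checked with a small stand-alone version of the split in the diff above. This is a sketch, not the script itself: `split_scenes` is a hypothetical helper, scenes are plain strings here (the script works with `(scene, ids)` pairs), and the fixed `seed` is an addition for reproducibility (no seed is visible in the hunk, which calls `np.random.shuffle`).

```python
import random
from typing import List, Sequence, Tuple


def split_scenes(scenes: Sequence[str], val_divisor: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the scenes, hold out the first len(scenes) // val_divisor of them
    for validation, and keep the rest for training. Returns (train, val)."""
    shuffled = list(scenes)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = len(shuffled) // val_divisor   # 0 if there are fewer scenes than val_divisor
    return shuffled[n_val:], shuffled[:n_val]
```

With 2000 scenes, a divisor of 10 holds out 200 scenes and a divisor of 1000 holds out 2. For the default `n_scenes=[100000]` in the hunk header's signature, the validation file name also changes from `Habitat_10000_scenes_val.txt` to `Habitat_100_scenes_val.txt`.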

yocabon (Contributor) commented Sep 20, 2024

Hi,
Thanks for the issue; I apologize for not looking into this earlier.

It's also related to #159

About ScanNet++:
We were using one of the early releases (not v1), and it seems the dataset has been updated since then (and may be updated again?).
We will eventually update the released pairs to include the new scenes.

The BlendedMVS, find_scenes, and CO3D errors should now be addressed.
