
Inconsistent image size between rgb images and fmasks. #17

Open
krahets opened this issue Sep 5, 2024 · 1 comment

Comments


krahets commented Sep 5, 2024

The sizes of the images and the foreground masks are inconsistent in some scenes. For instance, in 100681/:

  • Size of images is (1024, 1398).
  • Size of fmasks is (1500, 2048).

Does the intrinsic from camera_intrinsics.json correspond to the size of fmasks?
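
As a quick sanity check (a sketch using only the two sizes quoted above, with PIL's `(width, height)` convention), the two resolutions share the same aspect ratio up to rounding, which is what makes a single uniform scale factor on the intrinsics valid at all:

```python
# Sizes quoted in the issue, as (width, height):
img_w, img_h = 1024, 1398    # images in images_lr
mask_w, mask_h = 1500, 2048  # corresponding fmasks

# The aspect ratios agree to within rounding error...
assert abs(img_h / img_w - mask_h / mask_w) < 1e-2

# ...so one uniform factor maps fmask-resolution intrinsics to the
# image resolution (this is the 1398 / 2048 used in the script below).
scale = img_h / mask_h
print(f"scale ≈ {scale:.4f}")  # prints: scale ≈ 0.6826
```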

I'm trying to convert the data to the nerfstudio format as described in #14, but the results are unsatisfactory. Could you help me check what I did wrong? Here is my conversion script:

import json
import pickle
import numpy as np
from PIL import Image

scene_dir = "/mnt/data/Datasets/mvhumannet/data/100681"

# process the extrinsics
with open(f"{scene_dir}/camera_scale.pkl", "rb") as file:
    cam_scale = pickle.load(file)

with open(f"{scene_dir}/camera_extrinsics.json") as file:
    exts = json.load(file)
exts = sorted(exts.items(), key=lambda x: x[0])  # sort by spatial label

frames = []
for spa_label, ext in exts:
    spa_label = spa_label[2:-4]

    t = np.array(ext["translation"]) * cam_scale
    r = np.array(ext["rotation"])
    extrinsic = np.identity(4)
    extrinsic[:3, :3] = r
    extrinsic[:3, 3] = t

    pose = np.linalg.inv(extrinsic)  # extrinsic to pose
    pose[0:3, 1:3] *= -1  # OPENCV to OPENGL

    frames.append(
        {
            "file_path": f"images_lr/{spa_label}/0005_img.jpg",
            # "mask_path": f"fmask_lr/{spa_label}/0005_img_fmask.png",
            "transform_matrix": pose.tolist(),
        }
    )

# process the intrinsics
with open(f"{scene_dir}/camera_intrinsics.json") as file:
    cam_ints = json.load(file)
K = np.array(cam_ints["intrinsics"])
K *= 1398 / 2048  # TODO: scale the intrinsics to the image size due to inconsistency
K[2, 2] = 1.0
w, h = Image.open(f"{scene_dir}/{frames[0]['file_path']}").size

transforms = {
    "w": w,
    "h": h,
    "fl_x": K[0, 0],
    "fl_y": K[1, 1],
    "cx": K[0, 2],
    "cy": K[1, 2],
    "k1": 0.0,
    "k2": 0.0,
    "p1": 0.0,
    "p2": 0.0,
    "camera_model": "OPENCV",
    "frames": frames,
}

with open(f"{scene_dir}/transforms.json", "w") as f:
    json.dump(transforms, f, indent=4)
@antonzub99

Since the aspect ratio is the same, I would simply resize the images to [1500, 2048]; the camera intrinsics appear to correspond to [1500, 2048].
Or you could just drop this character (i.e., skip this subject).
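
Following that suggestion, here is a minimal sketch of the resize approach (the paths are placeholders, not actual dataset paths; note that Pillow's `resize` takes a `(width, height)` tuple). With images upsampled to the fmask resolution, the intrinsics from camera_intrinsics.json can be used without rescaling:

```python
from PIL import Image

FMASK_SIZE = (1500, 2048)  # (width, height) of the fmasks

def resize_to_fmask_resolution(img_path: str, out_path: str) -> tuple:
    """Upsample a low-res image to the fmask resolution and save it."""
    img = Image.open(img_path)
    # Bicubic is a reasonable default for upsampling RGB images.
    resized = img.resize(FMASK_SIZE, Image.BICUBIC)
    resized.save(out_path)
    return resized.size

# Placeholder usage, mirroring the layout in the script above:
# resize_to_fmask_resolution(
#     f"{scene_dir}/images_lr/<spa_label>/0005_img.jpg",
#     f"{scene_dir}/images_resized/<spa_label>/0005_img.jpg",
# )
```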
