Implement load_and_transform_depth_data #134

@OlafBraakman

Issues #122, #14, #69 and #121 report that the load_and_transform_depth function is not implemented.

I am raising this issue to propose implementing the following data preprocessing steps in a PR, as they yield the reported 35% zero-shot classification accuracy for SUN-RGBD depth-only.

Important details for the scene classification task for SUNRGBD:

Scene subset:
The classification task only considers the following classes:
SCENES = ['bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room', 'corridor', 'dining_area', 'dining_room', 'discussion_area', 'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre', 'library', 'living_room', 'office', 'rest_space', 'study_space' ]
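As a minimal sketch of how the subset could be applied, the snippet below maps each scene name to a class index and drops samples from other scenes. The `(depth_path, scene_name)` sample format, the example paths and the `filter_samples` helper are assumptions made for illustration; they are not part of the repository or of the SUN RGB-D toolbox.

```python
SCENES = ['bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room',
          'corridor', 'dining_area', 'dining_room', 'discussion_area',
          'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre',
          'library', 'living_room', 'office', 'rest_space', 'study_space']

# Class index follows the order of the SCENES list.
SCENE_TO_INDEX = {name: index for index, name in enumerate(SCENES)}


def filter_samples(samples):
    """Keep only samples whose scene is in SCENES and attach the class index.

    `samples` is a hypothetical iterable of (depth_path, scene_name) pairs.
    """
    return [
        (path, SCENE_TO_INDEX[scene])
        for path, scene in samples
        if scene in SCENE_TO_INDEX
    ]


# Hypothetical example input; the paths are placeholders.
example = [
    ("kv1/sample_a/depth/0001.png", "kitchen"),
    ("kv2/sample_b/depth/0002.png", "idk"),
    ("xtion/sample_c/depth/0003.png", "bathroom"),
]
kept = filter_samples(example)
```

Samples with a scene label outside the list are simply skipped here; whether they should be skipped or raise an error is a design choice for the PR.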

To reproduce the SUNRGBD results one has to convert the raw depth data to standardized disparity in the following steps:

  1. Convert raw depth (uint16) to meters, following read3dPoints.m from the official SUN RGBD toolbox:
import cv2
import numpy as np
depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)  # uint16 PNG
depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0  # rotate right by 3 bits, mm -> m
depth[depth > 8] = 8  # clip at 8 m
  2. Convert depth to disparity using the correct camera intrinsics, following the response of @imisra, with a separate baseline for each camera. The focal length for each sample can be obtained from its intrinsics.txt file:
from pathlib import Path  # optional; pathlib is just what I used

def get_baseline(path: str) -> float:
    """Return the stereo baseline in meters, chosen by camera name in the path."""
    if "kv1" in path:
        return 0.075
    elif "kv2" in path:
        return 0.075
    elif "realsense" in path:
        return 0.095
    elif "xtion" in path:
        return 0.095  # guessed based on length of 18cm for ASUS xtion v1
    else:
        raise ValueError(f"No baseline found for path: {path}")

focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
focal_length = float(focal_path.read_text().strip().split()[0])  # first entry of the intrinsics
baseline = get_baseline(str(depth_file))
disparity = baseline * focal_length / depth  # note: pixels with depth == 0 become inf
  3. Standardize the disparity using the mean and std of the disparity values across the training split. I computed these values with the compute_depth_mean_std implementation from RGBD-Seg dataset_base.py.

This yields the following mean and std values:
mean: 24.82968
std: 14.40078
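The three steps can be sketched end to end on an in-memory array. This is only an illustration under stated assumptions: the helper names `decode_sunrgbd_depth`, `depth_to_disparity` and `RunningMeanStd` are hypothetical, invalid pixels (depth 0) are kept at disparity 0 and excluded from the statistics, and I have not checked that this matches compute_depth_mean_std exactly. The focal length of 500 px in the example is a placeholder, not a value from intrinsics.txt.

```python
import numpy as np

MAX_DEPTH_M = 8.0  # clipping value used in step 1


def decode_sunrgbd_depth(raw: np.ndarray) -> np.ndarray:
    """Rotate each uint16 value right by 3 bits, convert mm to m, clip at 8 m."""
    if raw.dtype != np.uint16:
        raise TypeError("expected a uint16 depth image")
    rotated = (raw >> 3) | (raw << 13)  # the left shift wraps around in uint16
    depth_m = rotated.astype(np.float32) / 1000.0
    return np.minimum(depth_m, MAX_DEPTH_M)


def depth_to_disparity(depth_m, focal_length_px, baseline_m):
    """Compute baseline * focal / depth; pixels with depth 0 stay 0 (assumption)."""
    disparity = np.zeros_like(depth_m, dtype=np.float32)
    valid = depth_m > 0
    disparity[valid] = baseline_m * focal_length_px / depth_m[valid]
    return disparity


class RunningMeanStd:
    """Accumulate sum and sum of squares so the split never has to fit in memory."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, values: np.ndarray) -> None:
        v = values.astype(np.float64).ravel()
        self.count += v.size
        self.total += float(v.sum())
        self.total_sq += float(np.square(v).sum())

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def std(self) -> float:
        variance = self.total_sq / self.count - self.mean ** 2
        return max(variance, 0.0) ** 0.5


# Synthetic 2x2 "image": 8000 and 16000 decode to 1 m and 2 m, 0 is invalid,
# and 65535 decodes to more than 8 m, so it is clipped.
raw = np.array([[8000, 0], [65535, 16000]], dtype=np.uint16)
depth = decode_sunrgbd_depth(raw)
disparity = depth_to_disparity(depth, focal_length_px=500.0, baseline_m=0.075)

stats = RunningMeanStd()
stats.update(disparity[disparity > 0])  # only valid pixels contribute
```

In a real run, `stats.update` would be called once per training image before reading `stats.mean` and `stats.std`.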

These values can be used to normalize the disparity (depending on the raw or refined mode) as follows, based on Normalize in preprocessing.py:

if self._depth_mode == 'raw':
    depth_0 = depth == 0
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
    # set invalid values back to zero again
    depth[depth_0] = 0
else:
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)

Evaluating on the test split with the approach above yields 35.2% depth accuracy.
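For reference, the same normalization can be written without torchvision. The sketch below is a NumPy version under the assumption that a value of 0 marks an invalid pixel in raw mode; the function name `normalize_disparity` and the `depth_mode` argument are hypothetical and only mirror the snippet above.

```python
import numpy as np

DISPARITY_MEAN = 24.82968  # training-split mean reported in this issue
DISPARITY_STD = 14.40078   # training-split std reported in this issue


def normalize_disparity(disparity: np.ndarray, depth_mode: str = "raw") -> np.ndarray:
    """Standardize disparity; in raw mode, pixels that were 0 are reset to 0."""
    normalized = (disparity.astype(np.float32) - DISPARITY_MEAN) / DISPARITY_STD
    if depth_mode == "raw":
        # set invalid values back to zero again
        normalized[disparity == 0] = 0.0
    return normalized


# Example values: one invalid pixel, the mean, and mean +/- one std.
example = np.array([[0.0, 24.82968], [39.23046, 10.42890]], dtype=np.float32)
raw_mode = normalize_disparity(example, "raw")
refined_mode = normalize_disparity(example, "refined")
```

In raw mode the invalid pixel stays at 0, while in refined mode it is standardized like any other value and ends up negative.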

TODO: Create a PR
