DROID Dataset Status

The full DROID dataset (95,600 Franka robot episodes, 27.6M frames) is being downloaded for large-scale PARA pretraining. The download currently stands at 151 GB of ~321 GB (47% of the parquet data plus ext1/ext2 videos), with 56,461 parquets and ~98K videos on disk. 13 of 96 chunks are fully complete; the remaining chunks are partially downloaded. The training pipeline has been verified end-to-end on a 2-episode test run, with correct pixel projection and loss convergence.

Dataset Overview

DROID dataset properties
Property	Value
Source	cadene/droid_1.0.1 (HuggingFace LeRobot format)
Robot	Franka Emika Panda (7-DOF)
Total episodes	95,600
Total frames	27.6M (15 fps)
Cameras	3 per episode: ext1, ext2, wrist (downloading ext1+ext2 only)
Image resolution	320 x 180 (resized to 448x448 for training)
Video codec	AV1 (re-encoded to H.264 for viewing)
State data	Joint positions (7), cartesian position (6), gripper (1), camera extrinsics (6)
Camera extrinsics	[x,y,z,rx,ry,rz] per episode — camera pose in robot base frame
Camera intrinsics	Not included — estimated fy=130 (ZED 2 wide mode at 320x180)
Tasks	49,607 unique task descriptions (many episodes unlabeled)
Labs	13 research labs worldwide (diverse setups, lighting, backgrounds)

Dataset Statistics (sampled from 200 episodes)

Statistics from a random sample of 200 downloaded episodes
Metric	Mean	Min	Max	Median
Frames per episode	294	21	1,352	212
EEF height Z (m)	0.317	-0.147	0.982	—
Gripper position	0.386	0.000	1.000	—

Sample task descriptions from the dataset:

- Put the purple plush toy in the white bowl
- Move the black tape to the left of the measuring tape
- Push down the tap faucet
- Pick the blue shirt on the sofa and put it on the black chair
- Close the blinds of the window
- Fold the towel
- Empty the cup into the bowl
- Push the oven wire back into the oven

Sample Demonstrations

Four diverse episodes from different DROID labs/chunks, showing the range of tasks and environments. All videos are from exterior camera 2 (ext2), a third-person view.

Episode 10 — "Put the purple plush toy in the white bowl" (149 frames, lab table)

Episode 5000 — "Move the black tape to the left of the measuring tape" (152 frames, workbench)

Episode 49000 — "Push down the tap faucet" (209 frames, kitchen)

Episode 91000 — "Pick the blue shirt and put it on the black chair" (264 frames, bedroom)

Camera Projection Debug Visualization

Camera extrinsics and the PARA-compatible projection were verified with a debug overlay. Each frame shows: green dot = EEF position projected to pixel, yellow line → cyan ring = height drop from EEF to base plane (z=0), RGB axis lines = EEF rotation frame (x=red, y=green, z=blue). Camera intrinsics are estimated at fy=130 (ZED 2 wide mode). A projection roundtrip agrees with the robosuite convention to <1px error.
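
The projection behind the green dot can be sketched with a plain pinhole model. This is an illustrative sketch, not the code in debug_droid_projection.py. Several things are assumptions rather than facts from the dataset: the Euler convention (R = Rz·Ry·Rx), fx equal to fy at 130, the principal point at the image center (160, 90), and a camera frame with +z pointing forward. The function names are hypothetical.

```python
import math

def euler_xyz_to_matrix(rx, ry, rz):
    """Rotation matrix R = Rz @ Ry @ Rx (assumed Euler convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def project_point(p_base, extrinsics, fx=130.0, fy=130.0, cx=160.0, cy=90.0):
    """Project a 3D point in the robot base frame to (u, v) pixels.

    extrinsics is [x, y, z, rx, ry, rz], the camera pose in the base frame.
    Returns None when the point lies behind the camera.
    """
    tx, ty, tz, rx, ry, rz = extrinsics
    R = euler_xyz_to_matrix(rx, ry, rz)  # camera-to-base rotation
    d = (p_base[0] - tx, p_base[1] - ty, p_base[2] - tz)
    # Base-to-camera transform: p_cam = R^T (p_base - t)
    p_cam = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
    if p_cam[2] <= 1e-6:
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return u, v
```

Under the same assumptions, the cyan ring corresponds to calling project_point on (x, y, 0) for the EEF's x and y, and the yellow line joins that pixel to the EEF pixel.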

Debug Projection — Episode 10, ext2 (first / middle / last frame)

Episode 10, ext2 camera — EEF keypoint + height line + rotation axes. Green dot lands on actual gripper, confirming extrinsics and intrinsics are correct.

Full Episode Projection Video

Episode 10, ext2 — debug projection overlay tracking EEF throughout the full episode

Episode 10, ext1 — same episode from the other exterior camera

Dual Camera Views

Each episode has two exterior cameras. Here is episode 10 raw footage from ext1:

Episode 10 — Exterior camera 1 (ext1) raw footage

Data Structure

/data/cameron/droid/
├── meta/
│   ├── info.json          # Dataset metadata (95,600 eps, 15fps, Franka)
│   └── tasks.jsonl        # 49,607 task descriptions
├── data/
│   ├── chunk-000/         # Episodes 0-999
│   │   ├── episode_000000.parquet   # State + actions per frame
│   │   └── ...
│   └── chunk-095/         # Episodes 95000-95599
└── videos/
    ├── chunk-000/
    │   ├── observation.images.exterior_1_left/  # Ext cam 1 videos
    │   │   ├── episode_000000.mp4
    │   │   └── ...
    │   └── observation.images.exterior_2_left/  # Ext cam 2 videos
    └── chunk-095/

Parquet columns per episode: joint_position (7), cartesian_position (6: xyz + euler), gripper_position (1), camera_extrinsics for each camera (6: xyz + euler), language_instruction, timestamp, frame_index, episode_index
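
The layout above implies a simple mapping from episode index to file paths. A minimal sketch, assuming every chunk holds 1,000 episodes (as the chunk-000 and chunk-095 comments suggest); the helper name and the short camera keys are hypothetical:

```python
from pathlib import Path

EPISODES_PER_CHUNK = 1000  # inferred from the tree: chunk-000 holds episodes 0-999

# Short camera names used in this report, mapped to the video directory names.
CAMERA_DIRS = {
    "ext1": "observation.images.exterior_1_left",
    "ext2": "observation.images.exterior_2_left",
}

def episode_paths(root, episode_index, camera="ext2"):
    """Return (parquet_path, video_path) for one episode and one camera."""
    chunk = f"chunk-{episode_index // EPISODES_PER_CHUNK:03d}"
    name = f"episode_{episode_index:06d}"
    root = Path(root)
    parquet = root / "data" / chunk / f"{name}.parquet"
    video = root / "videos" / chunk / CAMERA_DIRS[camera] / f"{name}.mp4"
    return parquet, video
```

For example, episode_paths("/data/cameron/droid", 95599) resolves to files under chunk-095, the last chunk.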

Download Status

Download progress as of 2026-04-02
Component	Downloaded	Total	Progress
Parquet files	56,461	95,600	59%
Ext1 videos	49,049	95,600	51%
Ext2 videos	49,202	95,600	51%
Disk usage	151 GB	~321 GB	47%
Complete chunks	13	96	14%

Download method: aria2c with direct HuggingFace URLs (16 concurrent downloads). Initial attempts with huggingface-cli hit API rate limits (1000 req/5min), so the script now constructs file URLs directly, which requires no API calls. The download is resumable and is running in the tmux session droid_download.
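
The URL construction can be sketched as follows. This is not the contents of download_aria2.py, which is not reproduced here. It assumes the remote repository mirrors the local directory layout, and it uses the standard Hub pattern https://huggingface.co/datasets/<repo>/resolve/main/<path>. The function names are hypothetical.

```python
REPO = "cadene/droid_1.0.1"
BASE_URL = f"https://huggingface.co/datasets/{REPO}/resolve/main"
CAMERA_DIRS = (
    "observation.images.exterior_1_left",
    "observation.images.exterior_2_left",
)

def episode_files(episode_index):
    """Relative paths of the parquet and the two exterior videos for one episode."""
    chunk = f"chunk-{episode_index // 1000:03d}"
    name = f"episode_{episode_index:06d}"
    files = [f"data/{chunk}/{name}.parquet"]
    files += [f"videos/{chunk}/{cam}/{name}.mp4" for cam in CAMERA_DIRS]
    return files

def aria2_input_lines(episode_indices, out_dir):
    """Lines for an aria2c input file: a URL followed by indented per-download options."""
    lines = []
    for ep in episode_indices:
        for rel in episode_files(ep):
            lines.append(f"{BASE_URL}/{rel}")
            lines.append(f"  dir={out_dir}")
            lines.append(f"  out={rel}")
    return lines
```

Writing these lines to a file such as urls.txt and running aria2c -i urls.txt -j 16 -c would give 16 concurrent downloads with resume support; the file name is illustrative.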

Next Steps

Immediate:
- Complete download (~3-6h remaining at current rate)
- Verify data integrity: spot-check parquet/video pairs across all 96 chunks
- Compute full dataset statistics (height range, gripper distribution, frame counts)
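
The pair check in the second item could look roughly like this. It is a sketch under the directory layout shown in Data Structure: for every downloaded parquet it reports exterior videos that are missing, empty, or still accompanied by an aria2 control file (aria2c keeps a sibling .aria2 file while a download is incomplete). The function name is hypothetical.

```python
from pathlib import Path

CAMERA_DIRS = (
    "observation.images.exterior_1_left",
    "observation.images.exterior_2_left",
)

def find_missing_videos(root):
    """Return relative paths of videos that lack a complete file for a downloaded parquet."""
    root = Path(root)
    missing = []
    for parquet in sorted(root.glob("data/chunk-*/episode_*.parquet")):
        chunk, stem = parquet.parent.name, parquet.stem
        for cam in CAMERA_DIRS:
            video = root / "videos" / chunk / cam / f"{stem}.mp4"
            partial = video.with_name(video.name + ".aria2")  # aria2 control file
            if not video.is_file() or video.stat().st_size == 0 or partial.exists():
                missing.append(video.relative_to(root).as_posix())
    return missing
```

An empty return value means every parquet on disk has both exterior videos; it does not confirm that the videos decode, which needs a separate check.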

Preprocessing for PARA training:
- DroidLocalDataset class ready at /data/cameron/para_droid_pretrain/libero/data_droid.py
- Reads parquet metadata into RAM, decodes video frames lazily via PyAV
- Projects EEF positions to pixels using estimated camera intrinsics (fy=130)
- Non-uniform resize 320x180 → 448x448 with properly scaled intrinsics
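
Because the 320x180 → 448x448 resize stretches the two axes by different factors, the focal lengths and principal point have to be scaled per axis. A minimal sketch, not the DroidLocalDataset code itself; fx = 130 and a principal point at the image center (160, 90) are assumptions, since only fy=130 is stated above:

```python
def scale_intrinsics(fx, fy, cx, cy, src_size=(320, 180), dst_size=(448, 448)):
    """Scale pinhole intrinsics for a non-uniform resize; sizes are (width, height)."""
    sx = dst_size[0] / src_size[0]  # horizontal scale, 448 / 320 = 1.4
    sy = dst_size[1] / src_size[1]  # vertical scale, 448 / 180
    return fx * sx, fy * sy, cx * sx, cy * sy

def scale_pixel(u, v, src_size=(320, 180), dst_size=(448, 448)):
    """Map a pixel from the source image into the resized image."""
    return u * dst_size[0] / src_size[0], v * dst_size[1] / src_size[1]
```

Projecting with the scaled intrinsics gives the same pixel as projecting at 320x180 and then applying scale_pixel, so either order works as long as it is applied consistently. After scaling, fx and fy differ (about 182 vs. 324 under these assumptions), so code that assumes a single focal length will misplace keypoints vertically.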

Pretraining plan:
- Train PARA on full 95K episodes, ext2 camera, skip rotation
- bs=48 on RTX 6000 Ada (~23GB VRAM used, ~1.6 it/s)
- 20 epochs, lr=1e-4, wandb logging
- Evaluate: does DROID-pretrained PARA transfer better to LIBERO than training from scratch?

Reproducibility

# Download dataset
python /data/cameron/droid/download_aria2.py

# Generate debug projection visualization
MUJOCO_GL=egl python droid_testing/debug_droid_projection.py \
  --episode 10 --camera ext2 --out-video output.mp4

# Train on full DROID
cd /data/cameron/para_droid_pretrain/libero
MUJOCO_GL=egl CUDA_VISIBLE_DEVICES=5 \
  DINO_REPO_DIR=/data/cameron/keygrip/volume_dino_tracks \
  DINO_WEIGHTS_PATH=/data/cameron/keygrip/dinov3/weights/dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth \
  python train.py \
    --droid --droid_data_root /data/cameron/droid --droid_camera ext2 \
    --batch_size 48 --epochs 20 --lr 1e-4 \
    --run_name droid_pretrain_full_ext2 \
    --wandb_project para_droid --skip_rotation