Downloading the full DROID dataset (95,600 Franka robot episodes, 27.6M frames) for large-scale PARA pretraining. The download currently stands at 151 GB of ~321 GB (47% by size, covering parquet data plus ext1/ext2 videos): 56,461 parquets and ~98K videos. 13 of 96 chunks are fully complete; the remaining chunks are partially downloaded. The training pipeline has been verified end-to-end on a 2-episode test run, with correct pixel projection and loss convergence.
Dataset Overview
| Property | Value |
|---|---|
| Source | cadene/droid_1.0.1 (HuggingFace LeRobot format) |
| Robot | Franka Emika Panda (7-DOF) |
| Total episodes | 95,600 |
| Total frames | 27.6M (15 fps) |
| Cameras | 3 per episode: ext1, ext2, wrist (downloading ext1+ext2 only) |
| Image resolution | 320 x 180 (resized to 448x448 for training) |
| Video codec | AV1 (re-encoded to H.264 for viewing) |
| State data | Joint positions (7), cartesian position (6), gripper (1), camera extrinsics (6) |
| Camera extrinsics | [x,y,z,rx,ry,rz] per episode — camera pose in robot base frame |
| Camera intrinsics | Not included — estimated fy=130 (ZED 2 wide mode at 320x180) |
| Tasks | 49,607 unique task descriptions (many episodes unlabeled) |
| Labs | 13 research labs worldwide (diverse setups, lighting, backgrounds) |
Dataset Statistics (sampled from 200 episodes)
| Metric | Mean | Min | Max | Median |
|---|---|---|---|---|
| Frames per episode | 294 | 21 | 1,352 | 212 |
| EEF height Z (m) | 0.317 | -0.147 | 0.982 | — |
| Gripper position | 0.386 | 0.000 | 1.000 | — |
Sample task descriptions from the dataset:
- Put the purple plush toy in the white bowl
- Move the black tape to the left of the measuring tape
- Push down the tap faucet
- Pick the blue shirt on the sofa and put it on the black chair
- Close the blinds of the window
- Fold the towel
- Empty the cup into the bowl
- Push the oven wire back into the oven
Sample Demonstrations
Four diverse episodes from different DROID labs/chunks, showing the range of tasks and environments. All videos are from the exterior camera 2 (third-person view).
Episode 10 — "Put the purple plush toy in the white bowl" (149 frames, lab table)
Episode 5000 — "Move the black tape to the left of the measuring tape" (152 frames, workbench)
Episode 49000 — "Push down the tap faucet" (209 frames, kitchen)
Episode 91000 — "Pick the blue shirt and put it on the black chair" (264 frames, bedroom)
Camera Projection Debug Visualization
Camera extrinsics and the PARA-compatible projection were verified with a debug overlay. Each frame shows: green dot = EEF position projected to pixel, yellow line → cyan ring = height drop from EEF to base plane (z=0), RGB axis lines = EEF rotation frame (x=red, y=green, z=blue). Camera intrinsics are estimated at fy=130 (ZED 2 wide mode). The projection round trip agrees with the robosuite convention to within 1 px.
Debug Projection — Episode 10, ext2 (first / middle / last frame)

Episode 10, ext2 camera — EEF keypoint + height line + rotation axes. Green dot lands on actual gripper, confirming extrinsics and intrinsics are correct.
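The overlay described above comes down to a pinhole projection of base-frame points through the per-episode camera pose. The following is a minimal sketch, not the code in debug_droid_projection.py. It assumes (none of this is confirmed by the dataset) that the Euler angles compose as R = Rz·Ry·Rx, that the camera looks along +z with x right and y down, that fx equals the estimated fy=130, and that the principal point sits at the image centre. The function names are illustrative.

```python
import math

def euler_xyz_to_matrix(rx, ry, rz):
    """Rotation matrix R = Rz @ Ry @ Rx. ASSUMED Euler convention."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def project_to_pixel(p_base, cam_pose, fx=130.0, fy=130.0, width=320, height=180):
    """Project a 3D point in the robot base frame to (u, v) pixels.

    cam_pose is [x, y, z, rx, ry, rz]: the camera pose in the base frame.
    Returns None when the point is behind the camera.
    """
    x, y, z, rx, ry, rz = cam_pose
    R = euler_xyz_to_matrix(rx, ry, rz)
    d = (p_base[0] - x, p_base[1] - y, p_base[2] - z)
    # Base frame -> camera frame: p_cam = R^T (p_base - t)
    p_cam = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
    if p_cam[2] <= 0:
        return None
    u = fx * p_cam[0] / p_cam[2] + width / 2   # ASSUMED principal point: image centre
    v = fy * p_cam[1] / p_cam[2] + height / 2
    return u, v

def height_line(eef_base, cam_pose):
    """Pixel endpoints of the drop from the EEF to the base plane (z=0)."""
    top = project_to_pixel(eef_base, cam_pose)
    bottom = project_to_pixel((eef_base[0], eef_base[1], 0.0), cam_pose)
    return top, bottom
```

If the green dot drifts off the gripper under these assumptions, the Euler order and the camera axis convention are the first things to vary.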
Full Episode Projection Video
Episode 10, ext2 — debug projection overlay tracking EEF throughout the full episode
Episode 10, ext1 — same episode from the other exterior camera
Dual Camera Views
Each episode has two exterior cameras. Here is episode 10 raw footage from ext1:
Episode 10 — Exterior camera 1 (ext1) raw footage
Data Structure
/data/cameron/droid/
├── meta/
│   ├── info.json                                 # Dataset metadata (95,600 eps, 15fps, Franka)
│   └── tasks.jsonl                               # 49,607 task descriptions
├── data/
│   ├── chunk-000/                                # Episodes 0-999
│   │   ├── episode_000000.parquet                # State + actions per frame
│   │   └── ...
│   └── chunk-095/                                # Episodes 95000-95599
└── videos/
    ├── chunk-000/
    │   ├── observation.images.exterior_1_left/   # Ext cam 1 videos
    │   │   ├── episode_000000.mp4
    │   │   └── ...
    │   └── observation.images.exterior_2_left/   # Ext cam 2 videos
    └── chunk-095/
Parquet columns per episode: joint_position (7), cartesian_position (6: xyz + euler), gripper_position (1), camera_extrinsics for each camera (6: xyz + euler), language_instruction, timestamp, frame_index, episode_index
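The layout above implies a direct mapping from episode index to file paths. A small sketch of that mapping follows. The 1,000-episodes-per-chunk constant is inferred from the tree (chunk-000 holds episodes 0-999), and the helper name and the short camera aliases are illustrative, not part of the dataset tooling.

```python
from pathlib import Path

EPISODES_PER_CHUNK = 1000  # inferred from "chunk-000 = Episodes 0-999"
CAMERA_KEYS = {
    "ext1": "observation.images.exterior_1_left",
    "ext2": "observation.images.exterior_2_left",
}

def episode_paths(root, episode_index, camera="ext2"):
    """Return (parquet_path, video_path) for one episode and one camera."""
    chunk = f"chunk-{episode_index // EPISODES_PER_CHUNK:03d}"
    name = f"episode_{episode_index:06d}"
    root = Path(root)
    parquet = root / "data" / chunk / f"{name}.parquet"
    video = root / "videos" / chunk / CAMERA_KEYS[camera] / f"{name}.mp4"
    return parquet, video
```

For example, `episode_paths("/data/cameron/droid", 10)` points at `data/chunk-000/episode_000010.parquet` and the matching ext2 video.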
Download Status
| Component | Downloaded | Total | Progress |
|---|---|---|---|
| Parquet files | 56,461 | 95,600 | 59% |
| Ext1 videos | 49,049 | 95,600 | 51% |
| Ext2 videos | 49,202 | 95,600 | 51% |
| Disk usage | 151 GB | ~321 GB | 47% |
| Complete chunks | 13 | 96 | 14% |
Download method: aria2c with direct HuggingFace URLs (16 concurrent downloads). Initial attempts used huggingface-cli but hit API rate limits (1000 req/5min). Switched to constructing URLs directly — no API calls needed. Download is resumable and running in tmux session droid_download.
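The direct-URL approach can be sketched as follows. This is not the contents of download_aria2.py; it only illustrates the idea, assuming the files live on the `main` revision and are reachable through the Hub's standard `resolve` URL pattern. The function names and the output file name are hypothetical.

```python
def hf_resolve_url(repo_id, rel_path, revision="main"):
    """Direct file URL for a Hub dataset repo (ASSUMES the file is on `revision`)."""
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{rel_path}"

def aria2_input_lines(repo_id, rel_paths, dest_root):
    """Build aria2c input-file lines: one URL per file, followed by indented options."""
    lines = []
    for rel in rel_paths:
        lines.append(hf_resolve_url(repo_id, rel))
        lines.append(f"  dir={dest_root}")  # download root
        lines.append(f"  out={rel}")        # keep the repo-relative layout under dir
    return lines

if __name__ == "__main__":
    rel_paths = [
        "data/chunk-000/episode_000000.parquet",
        "videos/chunk-000/observation.images.exterior_2_left/episode_000000.mp4",
    ]
    lines = aria2_input_lines("cadene/droid_1.0.1", rel_paths, "/data/cameron/droid")
    with open("droid_urls.txt", "w") as f:
        f.write("\n".join(lines) + "\n")
    # Then, for example: aria2c -i droid_urls.txt -j 16 -c
```

Here `-j 16` matches the 16 concurrent downloads mentioned above and `-c` resumes partially downloaded files.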
Next Steps
Immediate:
- Complete download (~3-6h remaining at current rate)
- Verify data integrity: spot-check parquet/video pairs across all 96 chunks
- Compute full dataset statistics (height range, gripper distribution, frame counts)
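One possible shape for the integrity check is a scan that flags episodes missing a parquet or either exterior video. The sketch below only tests for presence: it treats zero-byte files and files with a sibling `.aria2` control file (which aria2 keeps while a download is unfinished) as incomplete, and it does not decode anything. The function names are illustrative.

```python
from pathlib import Path

CAMERA_DIRS = (
    "observation.images.exterior_1_left",
    "observation.images.exterior_2_left",
)

def _complete_stems(directory, suffix):
    """Episode names in `directory` whose files look fully downloaded."""
    if not directory.is_dir():
        return set()
    stems = set()
    for p in directory.iterdir():
        if p.suffix != suffix or p.stat().st_size == 0:
            continue
        if p.with_name(p.name + ".aria2").exists():  # download still in progress
            continue
        stems.add(p.stem)
    return stems

def find_incomplete_episodes(root):
    """Return {chunk: [episodes present somewhere but missing a parquet or a video]}."""
    root = Path(root)
    chunk_names = sorted({
        d.name
        for sub in ("data", "videos")
        for d in (root / sub).glob("chunk-*")
        if d.is_dir()
    })
    report = {}
    for chunk in chunk_names:
        groups = [_complete_stems(root / "data" / chunk, ".parquet")]
        for cam in CAMERA_DIRS:
            groups.append(_complete_stems(root / "videos" / chunk / cam, ".mp4"))
        seen = set().union(*groups)
        complete = set.intersection(*groups)
        missing = sorted(seen - complete)
        if missing:
            report[chunk] = missing
    return report
```

An episode that is absent from every directory is invisible to this scan, so chunk totals still need to be compared against the expected episode count.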
Preprocessing for PARA training:
- DroidLocalDataset class ready at /data/cameron/para_droid_pretrain/libero/data_droid.py
- Reads parquet metadata into RAM, decodes video frames lazily via PyAV
- Projects EEF positions to pixels using estimated camera intrinsics (fy=130)
- Non-uniform resize 320x180 → 448x448 with properly scaled intrinsics
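Because the 320x180 → 448x448 resize stretches the two axes by different factors, the focal lengths and principal point have to be scaled per axis. A minimal sketch of that scaling follows, not the DroidLocalDataset code itself. It assumes fx equals the estimated fy=130 and a principal point at the image centre, and it ignores half-pixel centre offsets.

```python
def scale_intrinsics(fx, fy, cx, cy, src_size, dst_size):
    """Scale pinhole intrinsics for a (possibly non-uniform) image resize.

    src_size and dst_size are (width, height) tuples.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return (
        fx * dst_w / src_w,  # horizontal focal length scales with width
        fy * dst_h / src_h,  # vertical focal length scales with height
        cx * dst_w / src_w,
        cy * dst_h / src_h,
    )

# ASSUMED source intrinsics: fx = fy = 130, principal point at the centre of 320x180
fx, fy, cx, cy = scale_intrinsics(130.0, 130.0, 160.0, 90.0, (320, 180), (448, 448))
```

With these inputs the horizontal factor is 1.4 and the vertical factor is about 2.49, so fx and fy differ after the resize even though they started equal.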
Pretraining plan:
- Train PARA on full 95K episodes, ext2 camera, skip rotation
- bs=48 on RTX 6000 Ada (~23 GB VRAM used, ~1.6 it/s)
- 20 epochs, lr=1e-4, wandb logging
- Evaluate: does DROID-pretrained PARA transfer better to LIBERO than training from scratch?
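A rough wall-clock estimate for this plan can be derived from the numbers above. The sketch assumes every one of the 27.6M frames is a separate training sample for the single ext2 camera and that throughput stays at ~1.6 it/s. Neither is confirmed here, so the result is an upper-bound style estimate, not a measurement.

```python
import math

def estimate_training_hours(total_samples, batch_size, its_per_sec, epochs):
    """Return (iterations per epoch, total wall-clock hours)."""
    iters_per_epoch = math.ceil(total_samples / batch_size)
    total_hours = iters_per_epoch * epochs / its_per_sec / 3600
    return iters_per_epoch, total_hours

# ASSUMPTION: one sample per frame, 27.6M frames in total
iters, hours = estimate_training_hours(27_600_000, 48, 1.6, 20)
print(f"{iters} it/epoch, ~{hours:.0f} h total (~{hours / 24:.0f} days)")
```

Under these assumptions one epoch is 575,000 iterations, which at 1.6 it/s is roughly 100 hours. Frame subsampling or fewer epochs would reduce this proportionally.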
Reproducibility
# Download dataset
python /data/cameron/droid/download_aria2.py
# Generate debug projection visualization
MUJOCO_GL=egl python droid_testing/debug_droid_projection.py \
    --episode 10 --camera ext2 --out-video output.mp4
# Train on full DROID
cd /data/cameron/para_droid_pretrain/libero
MUJOCO_GL=egl CUDA_VISIBLE_DEVICES=5 \
DINO_REPO_DIR=/data/cameron/keygrip/volume_dino_tracks \
DINO_WEIGHTS_PATH=/data/cameron/keygrip/dinov3/weights/dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth \
python train.py \
    --droid --droid_data_root /data/cameron/droid --droid_camera ext2 \
    --batch_size 48 --epochs 20 --lr 1e-4 \
    --run_name droid_pretrain_full_ext2 \
    --wandb_project para_droid --skip_rotation