# DROID + UWM setup (vidgen)

## Done

- **Repo**: Cloned at `/data/cameron/vidgen/unified-world-model`
- **Conda env**: `uwm` (Python 3.10) with `pip install -e .` and deps from `requirements.txt`
- **Launch script**: `scripts/launch_droid_pretrain.sh` updated for a minimal run and your paths
- **Package fix**: Added `datasets/__init__.py` and `datasets/utils/__init__.py` so Python imports the repo's local `datasets` package instead of the installed HuggingFace `datasets` library
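
The package fix relies on standard import-path shadowing. A standalone sketch of the mechanism (temp directory stands in for the repo root): the first matching package on `PYTHONPATH` wins, so a repo-local `datasets/` with an `__init__.py` shadows any installed library of the same name.

```bash
# Demonstration only: a local `datasets` package on PYTHONPATH shadows an
# installed one. The temp dir plays the role of the repo root.
tmp=$(mktemp -d)
mkdir -p "$tmp/datasets/utils"
touch "$tmp/datasets/__init__.py" "$tmp/datasets/utils/__init__.py"
PYTHONPATH="$tmp" python3 -c "import datasets; print(datasets.__file__)"
# prints a path under the temp dir, not site-packages
rm -rf "$tmp"
```

The same check run from the repo root (with `PYTHONPATH` set to it) is a quick way to confirm the fix took effect.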

## Activate and run

```bash
conda activate uwm
cd /data/cameron/vidgen/unified-world-model
bash scripts/launch_droid_pretrain.sh
```

The script sets `PYTHONPATH` to the repo root, builds a Zarr buffer from the DROID TFDS data if one does not already exist, and then runs UWM training.
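
That control flow can be sketched as follows; this is a hedged outline, not the script's actual code, and `BUFFER_PATH` plus the echoed steps are illustrative names:

```bash
# Sketch of the launch script's flow (illustrative names, not the real script):
# build the Zarr buffer only when it is missing, then start training.
BUFFER_PATH="${BUFFER_PATH:-data/droid_buffer.zarr}"
if [ ! -e "$BUFFER_PATH" ]; then
  echo "buffer missing: would convert DROID TFDS -> Zarr at $BUFFER_PATH"
fi
echo "would start UWM training against $BUFFER_PATH"
```

Because the buffer step is skipped when the path exists, re-running the launch script after an interrupted training run does not redo the (slow) TFDS-to-Zarr conversion.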

## DROID data source

The script uses **DROID in TFDS format**:

- **Default**: `DATA_DIR=gs://gresearch/robotics` (official bucket). Requires `gcloud` authentication and network access.
- **Your own data**: If you have DROID as TFDS under a directory (e.g. after building with [droid_dataset_builder](https://github.com/kpertsch/droid_dataset_builder)), set:
  ```bash
  export DROID_DATA_DIR=/path/to/tfds/droid
  bash scripts/launch_droid_pretrain.sh
  ```
- **Raw DROID only** (e.g. `/data/weiduoyuan/droid_raw/1.0.1`): You must build the TFDS dataset first and point `DROID_DATA_DIR` at the resulting TFDS data dir. The UWM converter does not read raw MP4s directly.
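
The default-vs-override behavior above amounts to a standard env-var fallback. A hedged sketch (the real script's variable names may differ): use `DROID_DATA_DIR` when set, otherwise fall back to the official bucket.

```bash
# Sketch of the data-source selection (variable names illustrative):
# an exported DROID_DATA_DIR overrides the official GCS bucket default.
DATA_DIR="${DROID_DATA_DIR:-gs://gresearch/robotics}"
echo "DATA_DIR=$DATA_DIR"
```

So leaving `DROID_DATA_DIR` unset pulls from GCS, and exporting it (as in the snippet above) points the converter at your local TFDS build instead.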

## Checkpoints

DROID/LIBERO checkpoints are on Google Drive:  
https://drive.google.com/drive/folders/1M4AuVLMRpSwOf_YAp56bV9AqyZI9ul6g

To download into the repo:

```bash
conda activate uwm
pip install gdown
bash scripts/download_checkpoints.sh
```

Or download manually and put files in `unified-world-model/checkpoints/`. Use `pretrain_checkpoint_path` in the training config to finetune from a checkpoint.
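
Before pointing `pretrain_checkpoint_path` at a file, it is worth a quick existence check, since a missing checkpoint only fails once training starts. A small sketch (the `models.pt` path is the example filename used below, not a guaranteed name in the Drive folder):

```bash
# Sanity-check the checkpoint file before finetuning (path illustrative).
CKPT=checkpoints/models.pt
if [ -f "$CKPT" ]; then
  echo "found $CKPT"
else
  echo "missing $CKPT: download from the Drive folder first"
fi
```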

## Image-to-video inference (minimal)

Run the pretrained UWM on a single start frame to sample the next observation (short video) and action:

```bash
conda activate uwm
cd /data/cameron/vidgen/unified-world-model
export PYTHONPATH=$(pwd)

# With a pretrained checkpoint (download from the Drive link above, e.g. models.pt into checkpoints/)
python experiments/uwm/run_image_to_video.py \
  --checkpoint checkpoints/models.pt \
  --image /path/to/start_frame.png \
  --output uwm_out.mp4

# Without --image: uses a random dummy image (for testing)
python experiments/uwm/run_image_to_video.py --checkpoint checkpoints/models.pt --output out.mp4
```

The script loads the DROID-configured UWM and builds a 3-view observation from your single image (the same frame repeated across the 3 cameras and 2 timesteps). It then runs `sample_joint`, decodes the next-observation latent to RGB, and saves the result as an MP4 (or as a folder of frames if PyAV is not installed). Use the **pretrained** `models.pt` from the Drive folder for meaningful video; the repo's `checkpoints/dummy_uwm.pt` contains random weights and is only for testing the pipeline.
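
The "same frame repeated for 3 cameras and 2 timesteps" step can be illustrated with a tiling one-liner. This is not the script's actual code, and the `(cameras, timesteps, H, W, C)` layout and 128x128 size here are assumptions for illustration:

```bash
python3 - <<'PY'
# Illustration only: repeat one frame across 3 views and 2 timesteps.
# The real script's tensor layout may differ.
import numpy as np
frame = np.zeros((128, 128, 3), dtype=np.uint8)  # stand-in for the start frame
obs = np.tile(frame, (3, 2, 1, 1, 1))            # 3 cameras x 2 timesteps
print(obs.shape)                                 # (3, 2, 128, 128, 3)
PY
```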

## Minimal run

The script is configured for a small run: 50 episodes for the buffer and `exp_id=minimal`. To change this:

- Edit `scripts/launch_droid_pretrain.sh`: `--num_episodes`, `exp_id`, or pass Hydra overrides, e.g.  
  `dataset.buffer_path=$BUFFER_PATH num_steps=1000`
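
Launch scripts typically forward extra command-line arguments (`"$@"`) straight to the Hydra entry point, which is what makes appending overrides work. A hedged sketch of that pattern (`train.py` and the baked-in `exp_id` are illustrative, not the repo's actual entry point):

```bash
# Sketch of override forwarding (entry-point name illustrative): extra args
# passed to the launch script land on the Hydra command line unchanged.
launch() { echo "python train.py exp_id=minimal $*"; }
launch dataset.buffer_path=/data/buffers/droid.zarr num_steps=1000
# -> python train.py exp_id=minimal dataset.buffer_path=/data/buffers/droid.zarr num_steps=1000
```

Overrides appended later on the command line win, so invocation-time settings take precedence over the script's defaults.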
