# Ctrl-World setup (vidgen clone)

## Done for you

- **Repo:** Cloned to `/data/cameron/vidgen/Ctrl-World`
- **Conda env:** `ctrl-world` (Python 3.11, dependencies installed with `pip install -r requirements.txt`)
- **Checkpoints** (in `Ctrl-World/checkpoints/`):
  - `stable-video-diffusion-img2vid` – SVD
  - `clip-vit-base-patch32` – CLIP
  - `ctrl-world/checkpoint-10000.pt` – Ctrl-World DROID ckpt
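
To confirm the three checkpoint paths listed above are in place before running anything, a small preflight check can help. This is a minimal sketch: `check_checkpoints` is a hypothetical helper, not something shipped with the repo, and it only tests that each path exists, not that the weights are complete.

```bash
# check_checkpoints [ROOT]: report any expected checkpoint path missing under
# ROOT (defaults to ./checkpoints). Hypothetical helper, not part of the repo.
check_checkpoints() {
  local root="${1:-checkpoints}"
  local missing=0
  local rel
  for rel in \
    stable-video-diffusion-img2vid \
    clip-vit-base-patch32 \
    ctrl-world/checkpoint-10000.pt
  do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      missing=1
    fi
  done
  # Exit status 0 only when every path exists.
  return "$missing"
}
```

From the repo root, `check_checkpoints checkpoints` prints nothing and returns 0 when all three are present.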

## Run minimal replay (bundled droid_subset)

Activate the env and run on a **free GPU** (roughly 12 GB or more of free memory):

```bash
cd /data/cameron/vidgen/Ctrl-World
conda activate ctrl-world

# Use a GPU with enough free memory (e.g. 7 if 0/9 are busy)
export CUDA_VISIBLE_DEVICES=7
bash run_replay_example.sh
```
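
Instead of hard-coding the GPU index, it can be picked from `nvidia-smi` output. The sketch below is one possible approach, not part of the repo: `pick_gpu` is a hypothetical helper that reads `index, free_MiB` lines on stdin and prints the index with the most free memory, as long as it meets the threshold.

```bash
# pick_gpu MIN_MIB: read "index, free_mib" lines on stdin and print the index
# of the GPU with the most free memory, provided it has at least MIN_MIB free.
# Returns non-zero when no GPU qualifies. Hypothetical helper, not in the repo.
pick_gpu() {
  awk -F', *' -v min="$1" '
    $2 + 0 >= min + 0 && (!found || $2 + 0 > best) {
      best = $2 + 0; idx = $1; found = 1
    }
    END { if (found) print idx; else exit 1 }
  '
}
```

Usage, taking 12000 MiB as a stand-in for the ~12 GB figure above: `export CUDA_VISIBLE_DEVICES="$(nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | pick_gpu 12000)"`. Free memory can change between the query and the launch, so treat the result as a hint.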

Or call the script directly from the repo root (the dataset paths are relative):

```bash
CUDA_VISIBLE_DEVICES=7 python scripts/rollout_replay_traj.py \
  --dataset_root_path dataset_example \
  --dataset_meta_info_path dataset_meta_info \
  --dataset_names droid_subset \
  --data_stat_path dataset_meta_info/droid_subset/stat.json \
  --svd_model_path "$(pwd)/checkpoints/stable-video-diffusion-img2vid" \
  --clip_model_path "$(pwd)/checkpoints/clip-vit-base-patch32" \
  --ckpt_path "$(pwd)/checkpoints/ctrl-world/checkpoint-10000.pt"
```

Output videos are written under `synthetic_traj/Rollouts_replay/video/`.

## Using your DROID data at `/data/weiduoyuan/droid`

Ctrl-World expects **preprocessed** data: video latents extracted with the SVD VAE, stored in the same folder and annotation layout as `dataset_example/droid_subset`.

1. **Extract latents** (requires the Hugging Face DROID layout: `meta/episodes.jsonl`, `data/chunk-XXX/`, `videos/chunk-XXX/.../`). If `/data/weiduoyuan/droid` matches that layout, run this from the repo root, since the SVD path is resolved via `$(pwd)`:

   ```bash
   accelerate launch dataset_example/extract_latent.py \
     --droid_hf_path /data/weiduoyuan/droid \
     --droid_output_path /data/cameron/vidgen/Ctrl-World/dataset_example/droid_local \
     --svd_path "$(pwd)/checkpoints/stable-video-diffusion-img2vid"
   ```

2. **Build meta info:**

   ```bash
   python dataset_meta_info/create_meta_info.py \
     --droid_output_path /data/cameron/vidgen/Ctrl-World/dataset_example/droid_local \
     --dataset_name droid_local
   ```

3. **Run replay** with the same command as in the minimal replay section, changing only `--dataset_names droid_local` and `--data_stat_path dataset_meta_info/droid_local/stat.json`.
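
Step 3 can be sketched as a small wrapper around the direct invocation shown earlier, with only the two dataset-specific flags changed. `run_replay` and the `DRY_RUN` switch are hypothetical conveniences, not part of the repo. The sketch assumes `--dataset_root_path dataset_example` stays as-is because the latents were written to `dataset_example/droid_local`.

```bash
# run_replay NAME: the replay command with the dataset flags pointed at NAME.
# Hypothetical wrapper; run it from the repo root. Set DRY_RUN=1 to print the
# command instead of executing it.
run_replay() {
  local name="$1"
  # Rebuild the positional parameters as the full command line.
  set -- python scripts/rollout_replay_traj.py \
    --dataset_root_path dataset_example \
    --dataset_meta_info_path dataset_meta_info \
    --dataset_names "$name" \
    --data_stat_path "dataset_meta_info/$name/stat.json" \
    --svd_model_path "$(pwd)/checkpoints/stable-video-diffusion-img2vid" \
    --clip_model_path "$(pwd)/checkpoints/clip-vit-base-patch32" \
    --ckpt_path "$(pwd)/checkpoints/ctrl-world/checkpoint-10000.pt"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_replay droid_local` prints the command for inspection, and `CUDA_VISIBLE_DEVICES=7 run_replay droid_local` runs it.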

If your DROID dir has a different structure (e.g. only raw videos), you’ll need to adapt `extract_latent.py` or the dataset loader to match it.
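
Before running step 1, a quick check along these lines can show whether the directory has the layout that step describes. This is a rough sketch: `check_droid_layout` is a hypothetical helper, it assumes chunk directories match the glob `chunk-*`, and it only looks for the three paths named in step 1 rather than validating file contents.

```bash
# check_droid_layout DIR: verify DIR has meta/episodes.jsonl plus at least one
# chunk-* directory under data/ and videos/. Hypothetical helper, not in the repo.
check_droid_layout() {
  local root="$1"
  local missing=0
  local d
  if [ ! -f "$root/meta/episodes.jsonl" ]; then
    echo "missing: meta/episodes.jsonl" >&2
    missing=1
  fi
  for d in data videos; do
    # Expand the glob into positional parameters; $1 is the first match,
    # or the unexpanded pattern when nothing matches.
    set -- "$root/$d"/chunk-*
    if [ ! -d "$1" ]; then
      echo "missing: $d/chunk-XXX/" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_droid_layout /data/weiduoyuan/droid && echo "layout looks OK"`. A non-zero exit with `missing:` lines suggests the directory needs converting, or `extract_latent.py` needs adapting, first.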