# 567 Project Code — Augmentation for ACT Viewpoint Generalisation

This folder contains the code for the experiments reported in the final
write-up. It trains an ACT policy on a LIBERO bowl-on-plate task using
multi-viewpoint translation data under three augmentation regimes
(`--augment none`, `crop`, or `all`), then evaluates each model on a 5x5
translation grid and an 8x8 rotation grid.

## File overview

```
train.py                              # Training entry point (--augment flag)
model.py, model_act.py                # Policy architectures (ACT + variants)
data.py                               # Dataset + augmentation pipeline
utils.py                              # Misc helpers

eval.py                               # Single-config rollout eval
eval_multistage.py                    # Rollout eval with miss/grasp/place scoring
eval_full_grid.py                     # 8x8 rotation grid eval
eval_translation_grid.py              # 5x5 translation grid eval
eval_translation_grid_fast.py         # 3x3 fast translation eval
eval_translation_multistage_5x5.py    # 5x5 translation grid + miss/grasp/place
eval_translation_multistage_fast.py   # 3x3 fast multistage eval

generate_ood_viewpoint.py             # Generate 640-demo rotation dataset
generate_ood_translation.py           # Generate 50-demo translation dataset
create_viewpoint_splits.py            # Build train/test viewpoint splits

viz_augmentations.py                  # Render the augmentation parameter sweep figure
gen_distribution_viz.py               # Train/test viewpoint distribution figure
gen_rollout_grid.py                   # 5x5 video grid from eval rollouts
```

## Setup

```bash
export PYTHONPATH=/path/to/LIBERO:$PYTHONPATH
export LIBERO_DATA_PATH=/path/to/libero/data
export DINO_REPO_DIR=/path/to/dinov3
export DINO_WEIGHTS_PATH=/path/to/dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth
```

Python deps: `torch`, `torchvision`, `opencv-python`, `numpy`, `scipy`,
`h5py`, `imageio`, `wandb`, `matplotlib`, plus the LIBERO simulator
(MuJoCo + robosuite).

## Generate the multi-viewpoint translation dataset

```bash
python generate_ood_translation.py \
    --demos_per_view 10 \
    --out_root /data/libero/ood_translation_v1
```

With `--demos_per_view 10`, this produces 50 demos across 5 camera positions
(10 per position): the centre view plus 4 corners offset by ±10 cm
horizontally and ±7.5 cm vertically.
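
As a rough illustration of the viewpoint layout described above, the sketch below enumerates the centre view and the four corner offsets and checks the demo count. It is not taken from `generate_ood_translation.py`: the names `H_OFFSET_M`, `V_OFFSET_M`, and `camera_offsets` are hypothetical, and how "horizontal" and "vertical" map onto simulator axes is an assumption.

```python
from itertools import product

# Offsets in metres relative to the default camera position.
# Assumption: "horizontal"/"vertical" follow the README wording; their mapping
# onto simulator axes is not specified here.
H_OFFSET_M = 0.10     # +/-10 cm horizontal
V_OFFSET_M = 0.075    # +/-7.5 cm vertical
DEMOS_PER_VIEW = 10   # mirrors --demos_per_view 10


def camera_offsets():
    """Return the centre view followed by the four corner views."""
    corners = [
        (sign_h * H_OFFSET_M, sign_v * V_OFFSET_M)
        for sign_h, sign_v in product((-1, 1), repeat=2)
    ]
    return [(0.0, 0.0)] + corners


views = camera_offsets()
total_demos = len(views) * DEMOS_PER_VIEW

for dh, dv in views:
    print(f"horizontal {dh:+.3f} m, vertical {dv:+.3f} m")
print("total demos:", total_demos)
```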

## Train the three models

The three configurations reported in the Results section share every flag
except `--run_name` and `--augment`:

```bash
# No-aug baseline
python train.py --model_type act --run_name multivp_noaug \
    --benchmark libero_spatial --task_id 0 \
    --cache_root /data/libero/ood_translation_v1 \
    --batch_size 8 --lr 1e-4 --max_minutes 30 \
    --skip_rotation --backbone resnet \
    --augment none \
    --wandb_project 567_viewpoint --wandb_mode online

# Crop only (the winner on translation OOD)
python train.py --model_type act --run_name multivp_crop \
    --benchmark libero_spatial --task_id 0 \
    --cache_root /data/libero/ood_translation_v1 \
    --batch_size 8 --lr 1e-4 --max_minutes 30 \
    --skip_rotation --backbone resnet \
    --augment crop \
    --wandb_project 567_viewpoint --wandb_mode online

# All augmentations composite
python train.py --model_type act --run_name multivp_allaug \
    --benchmark libero_spatial --task_id 0 \
    --cache_root /data/libero/ood_translation_v1 \
    --batch_size 8 --lr 1e-4 --max_minutes 30 \
    --skip_rotation --backbone resnet \
    --augment all \
    --wandb_project 567_viewpoint --wandb_mode online
```
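
Because the three commands above share almost every flag, they can also be generated from one template. The following is a minimal sketch of such a launcher, not part of the repository: `SHARED_ARGS`, `RUNS`, and `build_command` are hypothetical names, and it assumes `train.py` does not depend on the order of its flags.

```python
import shlex

# Flags common to all three runs, copied from the commands above.
SHARED_ARGS = [
    "--model_type", "act",
    "--benchmark", "libero_spatial", "--task_id", "0",
    "--cache_root", "/data/libero/ood_translation_v1",
    "--batch_size", "8", "--lr", "1e-4", "--max_minutes", "30",
    "--skip_rotation", "--backbone", "resnet",
    "--wandb_project", "567_viewpoint", "--wandb_mode", "online",
]

# run_name -> value passed to --augment
RUNS = {
    "multivp_noaug": "none",
    "multivp_crop": "crop",
    "multivp_allaug": "all",
}


def build_command(run_name, augment):
    """Build the argv list for one training run."""
    return ["python", "train.py", "--run_name", run_name,
            "--augment", augment, *SHARED_ARGS]


commands = [build_command(name, aug) for name, aug in RUNS.items()]

for cmd in commands:
    # Print the command; to launch it instead, import subprocess and call
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without the simulator installed.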

By default, augmentation is applied to 50% of samples. Checkpoints are
written to `checkpoints/<run_name>/`.
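
The 50% figure can be read as a per-sample coin flip. The sketch below shows one common way to gate an augmentation by probability; it is an assumption about the general pattern, not the code in `data.py`, and `maybe_augment` and its parameters are hypothetical names.

```python
import random


def maybe_augment(sample, augment_fn, p=0.5, rng=random):
    """Apply augment_fn with probability p; otherwise return sample unchanged."""
    if rng.random() < p:
        return augment_fn(sample)
    return sample


# Empirical check with a toy "augmentation" that turns 0 into 1,
# so summing the outputs counts how often the augmentation fired.
rng = random.Random(0)
n_samples = 10_000
fired = sum(
    maybe_augment(0, lambda x: x + 1, p=0.5, rng=rng) for _ in range(n_samples)
)
fraction_fired = fired / n_samples
print(f"augmentation fired on {fraction_fired:.1%} of samples")
```

Leaving half of the samples untouched means the policy still sees unaugmented frames during training.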

## Evaluate

### Translation grid (5x5, three-stage scoring)

```bash
python eval_translation_multistage_5x5.py \
    checkpoints/multivp_crop/best.pth ms_5x5_multivp_crop 0
```

Positional arguments: `<checkpoint> <output_tag> <gpu_id>`. Results go to
`results/<output_tag>/`; the command above writes
`results/ms_5x5_multivp_crop/grid_results.json` plus per-cell rollout
videos.
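
To make the miss/grasp/place labels concrete, here is a minimal sketch of how an episode might be assigned to one of the three stages and tallied. The real criteria live in the multistage eval scripts and are not documented here; the function `score_episode`, its boolean inputs, and the example episodes are all hypothetical.

```python
from collections import Counter


def score_episode(grasped, placed):
    """Label an episode by the furthest stage it reached.

    Assumption: "place" means the bowl ended on the plate, "grasp" means it
    was grasped but not placed, and "miss" means it was never grasped.
    """
    if placed:
        return "place"
    if grasped:
        return "grasp"
    return "miss"


# Made-up (grasped, placed) outcomes for illustration only, not real results.
episodes = [(False, False), (True, False), (True, True), (True, True)]

tally = Counter(score_episode(g, p) for g, p in episodes)
for stage in ("miss", "grasp", "place"):
    print(stage, tally[stage])
```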

### Rotation grid (8x8)

```bash
python eval_full_grid.py \
    checkpoints/multivp_crop/best.pth multivp_crop_rot 0
```

Writes `results/multivp_crop_rot/grid_results.json` with per-viewpoint
success rates over the 8x8 (theta, phi) grid.
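
Per-viewpoint success rates are easiest to inspect as an 8x8 table. The sketch below arranges rates keyed by grid index into rows and columns and computes an overall mean. The schema of `grid_results.json` is not documented here, so the `(theta_index, phi_index)` keys, the helper names, and the placeholder values are assumptions rather than the file's actual layout.

```python
GRID_SIZE = 8


def to_grid(rates, size=GRID_SIZE):
    """Arrange {(theta_index, phi_index): rate} into a size x size nested list."""
    grid = [[None] * size for _ in range(size)]
    for (i, j), rate in rates.items():
        grid[i][j] = rate
    return grid


def mean_success(grid):
    """Mean over all filled cells; NaN if the grid is empty."""
    values = [v for row in grid for v in row if v is not None]
    return sum(values) / len(values) if values else float("nan")


# Placeholder values for illustration only (1.0 on the diagonal, 0.0 elsewhere);
# these are not experimental results.
rates = {
    (i, j): 1.0 if i == j else 0.0
    for i in range(GRID_SIZE)
    for j in range(GRID_SIZE)
}

grid = to_grid(rates)
overall = mean_success(grid)

for row in grid:
    print(" ".join(f"{v:.1f}" for v in row))
print("mean success:", overall)
```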

### Single-viewpoint eval (debug / spot-check)

```bash
python eval.py --model_type act \
    --checkpoint checkpoints/multivp_crop/best.pth \
    --benchmark libero_spatial --task_id 0 --n_episodes 5 \
    --teleport --zero_rotation --clean_scene --max_steps 600 \
    --shift_dx 0.0509 --shift_dy -0.2063 \
    --cam_theta 10 --cam_phi 90 \
    --out_dir eval_output/spot_check --save_video
```

## Reproducing the figures

```bash
# Augmentation parameter sweep (the row-of-augs figure)
python viz_augmentations.py

# Train/test viewpoint distribution polar plot + sample frames
python gen_distribution_viz.py

# 5x5 video grid from a completed eval run
python gen_rollout_grid.py results/ms_5x5_multivp_crop
```

## Credits

Starter code (ACT skeleton, LIBERO env wrapper, dataset cacher) was
provided by the lab. The augmentation pipeline, three-stage scoring,
camera-translation eval infrastructure, multi-viewpoint translation
dataset generator, and all experiments were implemented for this project.
External libraries: LIBERO benchmark, DINOv3 (ViT-S+/16), OpenAI CLIP,
PyTorch, OpenCV.