# PARA Paper Visualization Generation Instructions

## Scripts

All scripts live in `/data/cameron/para/ood_libero/`. Run them from the repo root, `/data/cameron/para/`.

### Environment Setup (required for all scripts)

```bash
export PYTHONPATH=/data/cameron/LIBERO:/data/cameron/para_normalized_losses/libero:$PYTHONPATH
export DINO_REPO_DIR=/data/cameron/keygrip/dinov3
export DINO_WEIGHTS_PATH=/data/cameron/keygrip/dinov3/weights/dinov3_vits16plus_pretrain_lvd1689m-4057cbaa.pth
```
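
Since every script depends on these three variables, a quick pre-flight check can catch a missing export before a long rollout starts. The snippet below is a minimal sketch; `missing_env_vars` is a hypothetical helper and is not part of the repo.

```python
import os

# Variable names taken from the export block above.
REQUIRED_VARS = ("PYTHONPATH", "DINO_REPO_DIR", "DINO_WEIGHTS_PATH")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```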

---

## 1. DINO PCA Feature Consistency Video

**Script:** `ood_libero/generate_dino_pca_video.py`

**What it shows:** Frozen DINO PCA features stay consistent as the object slides across the table (phase 1) and the camera orbits (phase 2). Side-by-side RGB | pure PCA.

**Output:**
- `dino_pca_consistency.mp4` — side-by-side RGB + PCA
- `dino_pca_only.mp4` — PCA only

```bash
python ood_libero/generate_dino_pca_video.py \
    --n_frames 90 \
    --fps 15 \
    --output_dir /data/cameron/para/.agents/reports/project_site/media/
```

**Key parameters:**
- `--n_frames`: total frames (split evenly: half object motion, half camera orbit)
- `--fps`: output framerate (15 = 6 seconds total at 90 frames)
- `--clean_scene`: removes furniture/distractors (default: True)
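
To make the frame split and duration arithmetic concrete, here is a rough sketch of how a per-frame schedule could be built from `--n_frames` and `--fps`. The helper names (`linspace`, `build_schedule`) and the dictionary layout are assumptions for illustration; the sweep ranges are the ones this document lists for the two phases, and linear interpolation is assumed.

```python
def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def build_schedule(n_frames=90, fps=15):
    """Split frames evenly between object motion and camera orbit.

    Assumption: an odd n_frames gives the extra frame to phase 2.
    Returns (list of per-frame settings, duration in seconds).
    """
    half = n_frames // 2
    rest = n_frames - half
    phase1 = [{"phase": 1, "dy": dy} for dy in linspace(-0.12, 0.12, half)]
    phis = linspace(-35.0, 35.0, rest)
    thetas = linspace(3.0, 18.0, rest)
    phase2 = [{"phase": 2, "phi": p, "theta": t} for p, t in zip(phis, thetas)]
    return phase1 + phase2, n_frames / fps


if __name__ == "__main__":
    frames, seconds = build_schedule()
    print(len(frames), "frames,", seconds, "seconds")
```

With the defaults this yields 45 frames per phase and a 6-second video.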

**How it works:**
1. Initializes LIBERO task 0 environment
2. Phase 1: slides bowl left→right (dy=-0.12 to +0.12) at default camera
3. Phase 2: freezes object, orbits camera (phi=-35° to +35°, theta=3° to 18°)
4. Extracts frozen DINO ViT-S/16 features at each frame
5. Computes joint PCA across ALL frames (so colors are globally consistent)
6. Renders side-by-side video
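
Step 5 is what keeps colors stable across the video: the PCA basis is fit once on the patches of every frame instead of per frame. A minimal NumPy sketch of that idea follows. The function name, the `(T, H, W, C)` input layout, and the joint min-max normalization to RGB are assumptions; the script may normalize differently.

```python
import numpy as np


def joint_pca_rgb(features):
    """Project patch features onto 3 principal components shared by all frames.

    features: array of shape (T, H, W, C), patch features for T frames.
    Returns an array of shape (T, H, W, 3) with values in [0, 1].
    """
    t, h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)  # center over ALL frames
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T
    # One min/max per component, shared by every frame, so colors are comparable.
    lo = proj.min(axis=0, keepdims=True)
    hi = proj.max(axis=0, keepdims=True)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)
    return rgb.reshape(t, h, w, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(4, 28, 28, 384))  # random stand-in for DINO features
    print(joint_pca_rgb(demo).shape)
```

Fitting PCA per frame instead would let the component signs and ordering flip between frames, which shows up as flickering colors.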

---

## 2. ACT vs PARA Comparison Rollouts

**Script:** `ood_libero/generate_act_vs_para_comparison.sh`

**What it shows:** ACT succeeds in-distribution but fails at OOD position and OOD viewpoint. PARA succeeds at both.

```bash
bash ood_libero/generate_act_vs_para_comparison.sh
```

**Output:** Individual rollout videos in `ood_libero/comparison_video_clips_v2/`

**Conditions tested:**
- In-dist position (dx=-0.08, dy=0.0) — ACT should succeed
- OOD position (dx=-0.08, dy=0.18) — ACT fails, PARA succeeds
- OOD viewpoint (theta=18°, phi=30°) — ACT fails, PARA succeeds

**Checkpoints used:**
- ACT: `/data/cameron/para_normalized_losses/libero/checkpoints/act_v2_exp4_n64/best.pth`
- PARA: `/data/cameron/para_normalized_losses/libero/checkpoints/para_v2_exp4_n64/best.pth`

### Stitched Comparison Video

**Script:** `ood_libero/stitch_comparison_video.py`

**What it shows:** Stitches the individual clips into a single video with transition cards and success/failure badges.

```bash
python ood_libero/stitch_comparison_video.py
```

**Output:** `act_vs_para_comparison.mp4` in the project site media directory (`/data/cameron/para/.agents/reports/project_site/media/`).

---

## 3. Feature PCA Comparison (2x2 Grid Video)

**Script:** `ood_libero/generate_feature_comparison.py`

**What it shows:** 2x2 grid: ACT rollout + ACT DINO PCA | PARA rollout + PARA DINO PCA. Both rollouts use the OOD position. The backbone features look similar, but the rollout outcomes differ.

```bash
python ood_libero/generate_feature_comparison.py \
    --max_steps 600 \
    --fps 4 \
    --episode_idx 4 \
    --shift_dy 0.18
```

**Output:**
- `feature_pca_comparison.mp4` — 2x2 grid video
- `attention_map_grid.png` — self-attention heads from both models

**Key parameters:**
- `--shift_dy`: object position shift (0.18 = moderate OOD, outside N=64 training grid)
- `--episode_idx`: which demo init state to use (PARA succeeds from init state 4)
- `--max_steps`: max environment steps (600 for teleport mode)
- `--fps`: output framerate (4 = one frame per replan step)

**How it works:**
1. Loads both ACT and PARA checkpoints (N=64 models)
2. Runs ACT rollout with teleport + zero_rotation servo execution
3. Resets env, runs PARA rollout with same settings
4. Extracts DINO backbone patch features at each replan step from both models
5. Computes joint PCA across ALL features from BOTH runs (consistent colors)
6. Composites 2x2 grid video: [ACT RGB, ACT PCA; PARA RGB, PARA PCA]
7. Also generates attention map grid (see below)
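
The grid layout in step 6 amounts to two concatenations per frame. The sketch below assumes all four tiles are already resized to the same `(H, W, 3)` shape; `compose_grid` is a hypothetical name, not necessarily what the script uses.

```python
import numpy as np


def compose_grid(act_rgb, act_pca, para_rgb, para_pca):
    """Stack four equally sized (H, W, 3) tiles into one (2H, 2W, 3) frame.

    Layout: [ACT RGB, ACT PCA; PARA RGB, PARA PCA].
    """
    tiles = (act_rgb, act_pca, para_rgb, para_pca)
    if len({tile.shape for tile in tiles}) != 1:
        raise ValueError("all four tiles must share the same shape")
    top = np.concatenate([act_rgb, act_pca], axis=1)       # side by side
    bottom = np.concatenate([para_rgb, para_pca], axis=1)
    return np.concatenate([top, bottom], axis=0)           # ACT row above PARA row


if __name__ == "__main__":
    tile = np.zeros((448, 448, 3), dtype=np.uint8)
    print(compose_grid(tile, tile, tile, tile).shape)
```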

**Teleport servo execution:**
- For each predicted 3D target, servo toward it for at most 25 steps, stopping once within a 5 mm threshold, with zero rotation
- After reaching the target, apply the gripper command in a separate step
- This matches the eval protocol used for all reported results
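
The servo loop described above can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: `servo_to_target`, the proportional `gain`, the 6-element `[dx, dy, dz, 0, 0, 0]` action layout, and the `ToyPoint` stand-in for the simulator are all hypothetical. Only the 25-step limit, the 5 mm threshold, and the zero-rotation command come from this document.

```python
import math


def servo_to_target(get_pos, step_fn, target, max_steps=25, tol=0.005, gain=1.0):
    """Drive the end effector toward target; return (reached, steps_used)."""
    for steps in range(max_steps):
        pos = get_pos()
        if math.dist(pos, target) < tol:  # within 5 mm: stop early
            return True, steps
        delta = [gain * (t - p) for t, p in zip(target, pos)]
        step_fn(delta + [0.0, 0.0, 0.0])  # translation only, zero rotation
    return math.dist(get_pos(), target) < tol, max_steps


class ToyPoint:
    """Hypothetical stand-in for the robot: moves at most max_move per step."""

    def __init__(self, pos, max_move=0.02):
        self.pos = list(pos)
        self.max_move = max_move

    def get_pos(self):
        return tuple(self.pos)

    def step(self, action):
        move = action[:3]
        norm = math.sqrt(sum(m * m for m in move))
        scale = min(1.0, self.max_move / norm) if norm > 0 else 0.0
        self.pos = [p + scale * m for p, m in zip(self.pos, move)]


if __name__ == "__main__":
    env = ToyPoint((0.0, 0.0, 0.0))
    reached, steps = servo_to_target(env.get_pos, env.step, (0.10, 0.0, 0.05))
    print("reached:", reached, "steps:", steps)
    # The gripper command would be sent here, as a separate step after servoing.
```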

---

## 4. Attention Map Grid (PNG)

Generated automatically by `generate_feature_comparison.py` (see above).

**What it shows:** CLS-to-patch self-attention from all 6 heads of the last DINO block, for both ACT and PARA backbones on the same OOD frame.

**Layout:** `RGB | ACT head 1..6 | PARA head 1..6`

**Details:**
- DINO ViT-S/16 has 6 attention heads (384 embed dim / 64 head dim)
- 12 transformer blocks total; we show attention from the **last block** only
- Each heatmap shows CLS token attention weights to all 28x28 patch tokens
- Upsampled to 448x448 with INFERNO colormap
- CLS-to-patch attention = "what spatial locations does the CLS token attend to?"
  - For ACT: this is directly what the action MLP sees (CLS → MLP → coordinates)
  - For PARA: the CLS token isn't used for actions (PARA uses spatial features), but its attention still shows how fine-tuning changed the backbone
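
For reference, CLS-to-patch attention for one block can be computed from that block's per-head queries and keys, as in the sketch below. How `q` and `k` are pulled out of the backbone (for example with a forward hook) is left out, and the token layout is an assumption: CLS at index 0, patch tokens last, with any extra tokens (such as register tokens) in between. Upsampling to 448x448 and the colormap are omitted.

```python
import numpy as np


def cls_to_patch_attention(q, k, grid=28):
    """Attention weights from the CLS query to every patch key, per head.

    q, k: arrays of shape (heads, tokens, head_dim) from one attention block.
    Returns an array of shape (heads, grid, grid).
    """
    heads, tokens, head_dim = q.shape
    n_patches = grid * grid
    if tokens < n_patches + 1:
        raise ValueError("expected a CLS token plus grid*grid patch tokens")
    # Scaled dot product of the CLS query with all keys: (heads, 1, tokens).
    logits = q[:, 0:1, :] @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Keep only the patch columns (assumed to be the last grid*grid tokens).
    return weights[:, 0, -n_patches:].reshape(heads, grid, grid)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 1 + 28 * 28, 64))  # random stand-in for real queries
    k = rng.normal(size=(6, 1 + 28 * 28, 64))
    print(cls_to_patch_attention(q, k).shape)
```

Because the softmax runs over all tokens, each head's patch weights sum to slightly less than 1 whenever the CLS token or other non-patch tokens receive attention.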

**To generate standalone (without running full rollouts):**
Call `generate_attention_map_grid()` from the script directly. You first need to reproduce the setup that produces the first frame (environment reset and model loading), since the function operates on that frame.

---

## Output File Locations

Final outputs go to `/data/cameron/para/.agents/reports/project_site/media/` (the individual rollout clips from section 2 stay in `ood_libero/comparison_video_clips_v2/`):

| File | Description |
|---|---|
| `dino_pca_consistency.mp4` | RGB + DINO PCA side-by-side, object motion then camera orbit |
| `dino_pca_only.mp4` | PCA-only version |
| `act_vs_para_comparison.mp4` | Stitched ACT fail / PARA succeed comparison with transitions |
| `feature_pca_comparison.mp4` | 2x2 grid: ACT vs PARA rollout + feature PCA |
| `attention_map_grid.png` | Self-attention heads from both models |

All videos are H.264 encoded for web playback.

Viewable at: `https://omidlab.net/para_website/media/<filename>`

---

## Model Details

| | ACT | PARA |
|---|---|---|
| Backbone | DINOv3 ViT-S/16 (384 dim, 6 heads, 12 blocks), per the `dinov3_vits16plus` weights in Environment Setup | Same |
| Action head | CLS token → MLP → (x,y,z) coordinates | Spatial features → 1x1 conv → pixel heatmap + height bins |
| N_WINDOW | 4 timesteps | 4 timesteps |
| Checkpoint | `act_v2_exp4_n64/best.pth` | `para_v2_exp4_n64/best.pth` |
| Training | 64 positions, default viewpoint, 10 min | Same |