# ood_libero/ — Out-of-Distribution Generalization Study

## Goal
Study viewpoint and object-position generalization of PARA vs ACT on LIBERO. We generate OOD variants of the LIBERO dataset by (1) rendering from novel camera viewpoints and (2) shifting pick/place object positions, then retargeting the demonstration trajectories accordingly.

## Scripts

| File | Purpose |
|---|---|
| `viewpoint_distribution.py` | Sample camera viewpoints on a spherical cap and render frame 0 |
| `object_removal_test.py` | Remove distractors/furniture + render grid of shifted object positions |
| `replay_shifted_trajectory.py` | Replay a single demo trajectory with shifted objects via servo teleport (shifts entire trajectory — robot starts offset) |
| `replay_natural_start.py` | **Preferred.** Natural-start replay: robot starts at home, interpolates to shifted pre-grasp, then executes grasp/lift/place |
| `object_position_demos.py` | 3x3 grid video of full trajectory replays at different object positions |
| `debug_execution.py` | Debug a single shifted trajectory with tunable z_offset |
| `overlay_first_frames.py` | Overlay all demo first frames to visualize object position variance |

## State Array Layout

LIBERO state arrays (from HDF5 `data/demo_*/states`) have a **1-element prefix** before qpos:
```
state[0]     = extra scalar
state[1:49]  = qpos[0:48]
state[49:92] = qvel[0:43]
```
**All state index references must add +1 to the qpos index.** For example, bowl qpos[9] = state[10].

### Task 0 object layout
| Object | qpos slice | state slice | DOF (qvel) | Role |
|---|---|---|---|---|
| `akita_black_bowl_1` | `qpos[9:16]` | `state[10:17]` | `qvel[9:15]` | **pick** |
| `akita_black_bowl_2` | `qpos[16:23]` | `state[17:24]` | `qvel[15:21]` | distractor |
| `cookies_1` | `qpos[23:30]` | `state[24:31]` | `qvel[21:27]` | distractor |
| `glazed_rim_porcelain_ramekin_1` | `qpos[30:37]` | `state[31:38]` | `qvel[27:33]` | distractor |
| `plate_1` | `qpos[37:44]` | `state[38:45]` | `qvel[33:39]` | **place** |

Each free joint contributes 7 values to qpos (`[x, y, z, qw, qx, qy, qz]`) and 6 DOF to qvel (3 linear + 3 angular velocities).
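The +1 prefix rule can be captured in a small helper. This is an illustrative sketch; the function names are not from the scripts:

```python
import numpy as np

# Hypothetical helpers illustrating the +1 prefix rule from the table above.
def qpos_to_state_slice(start, stop):
    """Map a qpos index range to the corresponding state-array range."""
    return slice(start + 1, stop + 1)

def object_pose(state, qpos_start):
    """Return (xyz, quat_wxyz) for a free-joint object given its qpos start index."""
    s = state[qpos_to_state_slice(qpos_start, qpos_start + 7)]
    return s[:3], s[3:7]

# Example: a synthetic state vector where the bowl (qpos[9:16]) sits at (0.1, 0.2, 0.9).
state = np.zeros(92)
state[10:13] = [0.1, 0.2, 0.9]        # xyz lands at state[10:13] == qpos[9:12]
state[13:17] = [1.0, 0.0, 0.0, 0.0]   # identity quaternion (w, x, y, z)
xyz, quat = object_pose(state, 9)
```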

### Fixed furniture (not free joints — moved via `sim.model.body_pos`)
- `wooden_cabinet_1_main` — moved to `(0, 0, -5)` underground
- `flat_stove_1_main` — moved to `(0, 0, -5)` underground

---

## OOD Viewpoint Dataset Generation

### How it works
1. Extracts the default `agentview` camera position and forward direction
2. Computes the look-at point by tracing the camera ray to the table plane (z=0.85)
3. Samples `n_views` uniformly spaced points on a spherical cap of half-angle `theta_max` centered on the default camera direction (Fibonacci method)
4. For each viewpoint, sets `sim.model.cam_pos` / `sim.model.cam_quat`, renders, and restores
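Step 3 can be sketched as follows. This is an assumed implementation of Fibonacci sampling on a cap around the +z axis; rotating the cap onto the actual default camera direction is omitted:

```python
import numpy as np

# Sketch (assumed): Fibonacci-spaced, area-uniform points on a spherical cap
# of half-angle theta_max around +z. Rotation onto the real camera axis omitted.
def sample_cap(n_views, theta_max_deg):
    golden = np.pi * (3.0 - np.sqrt(5.0))   # golden angle for azimuth spacing
    i = np.arange(n_views)
    # Uniform in cos(theta) over [cos(theta_max), 1] gives uniform surface area.
    cos_max = np.cos(np.radians(theta_max_deg))
    cos_t = 1.0 - (i + 0.5) / n_views * (1.0 - cos_max)
    sin_t = np.sqrt(1.0 - cos_t**2)
    phi = golden * i
    return np.stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t], axis=1)

pts = sample_cap(16, 30.0)   # 16 unit vectors, all within 30 degrees of +z
```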

### Recommended parameters
| Parameter | Value | Notes |
|---|---|---|
| `theta_max` | 10–30 degrees | 10 = subtle perturbations, 30 = significant viewpoint diversity |
| `n_views` | 16 | Good coverage for a spherical cap |
| `image_size` | 256 (preview) or 448 (training) | |

### Generate viewpoint samples
```bash
# Preview: 16 views, 30-degree cap
python ood_libero/viewpoint_distribution.py \
    --n_views 16 --theta_max 30 --image_size 256

# Tighter: 10-degree cap
python ood_libero/viewpoint_distribution.py \
    --n_views 16 --theta_max 10 --image_size 256
```

### All options
```
--n_views      Number of viewpoints to sample (default: 16)
--theta_max    Max angle in degrees from default view (default: 30)
--image_size   Render resolution (default: 256)
--benchmark    LIBERO benchmark name (default: libero_spatial)
--task_id      Task index (default: 0)
--demo_id      Demo index (default: 0)
--camera       Camera name (default: agentview)
--table_z      Table height for look-at computation (default: 0.85)
--out_dir      Output directory (default: ood_libero/out/)
```

### Outputs
```
ood_libero/out/
  viewpoints_3d.png      — 3D plot of camera positions on the spherical cap
  renders_grid.png       — grid of rendered images (view_0 default + sampled)
  renders/view_*.png     — individual rendered frames
  viewpoint_meta.npz     — positions, quaternions, look-at, radius for downstream use
```

### Camera modification mechanism
MuJoCo camera parameters are writable numpy arrays on the compiled model:
```python
sim.model.cam_pos[cam_id] = new_pos    # (3,) world position
sim.model.cam_quat[cam_id] = new_quat  # (4,) quaternion (w, x, y, z)
sim.forward()                           # recompute derived quantities
```
The `agentview` camera (cam_id=2) is attached to body 0 (world), so it's a static camera that can be freely repositioned. Camera matrices (intrinsics, extrinsics, world-to-cam) must be recomputed after repositioning.
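One way to re-aim a repositioned camera is to build `cam_quat` from a look-at direction. The sketch below assumes MuJoCo's convention that a camera looks along its local -z axis with +y up and that quaternions are ordered (w, x, y, z); the function name is hypothetical:

```python
import numpy as np

# Hedged sketch: construct a cam_quat aiming the camera at `lookat`.
# Assumes the MuJoCo camera frame convention (-z forward, +y up, quat = w,x,y,z).
def lookat_quat(cam_pos, lookat, up=np.array([0.0, 0.0, 1.0])):
    forward = lookat - cam_pos
    forward /= np.linalg.norm(forward)
    z = -forward                          # camera z-axis points away from target
    x = np.cross(up, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=1)       # columns are camera axes in world frame
    # Standard rotation-matrix -> quaternion conversion (valid for w != 0)
    w = np.sqrt(max(0.0, 1.0 + R[0, 0] + R[1, 1] + R[2, 2])) / 2.0
    qx = (R[2, 1] - R[1, 2]) / (4 * w)
    qy = (R[0, 2] - R[2, 0]) / (4 * w)
    qz = (R[1, 0] - R[0, 1]) / (4 * w)
    return np.array([w, qx, qy, qz])
```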

---

## OOD Object Position Dataset Generation

### Overview
1. **Remove distractors** — make non-essential objects invisible (alpha=0), move them off-screen in the state, and freeze their qpos/qvel every step to prevent physics instability
2. **Remove furniture** — move cabinet and stove underground via `sim.model.body_pos`
3. **Center pick/place objects** — shift the bowl to the world origin `(0, 0)` so it is centered in the scene and aligned with the robot base; the plate keeps its offset relative to the bowl
4. **Apply grid of (dx, dy) shifts** — move both pick and place objects together by each offset
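Steps 3 and 4 amount to a simple transform on the state array. This sketch uses the Task 0 state indices from the table above; the function name is illustrative:

```python
import numpy as np

# Sketch of centering + shifting (Task 0 indices): the bowl is placed at
# (dx, dy) relative to the world origin and the plate keeps its relative offset.
BOWL_XY = slice(10, 12)    # state indices for akita_black_bowl_1 x, y
PLATE_XY = slice(38, 40)   # state indices for plate_1 x, y

def shift_objects(state, dx, dy):
    s = state.copy()
    offset = s[PLATE_XY] - s[BOWL_XY]   # plate's offset relative to the bowl
    s[BOWL_XY] = [dx, dy]               # bowl centered at origin, then shifted
    s[PLATE_XY] = s[BOWL_XY] + offset   # plate moves rigidly with the bowl
    return s
```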

### Recommended parameters
| Parameter | Value | Notes |
|---|---|---|
| `dx` range | `[-0.15, 0.0]` | World X: negative = toward robot, 0 = centered. `dx > 0` is outside workspace for zero-rotation grasps. |
| `dy` range | `[-0.1, +0.1]` | World Y: horizontal in camera view. Symmetric. |
| `z_offset` | `-0.015` | **Always use.** Lowers EEF target by 15mm during grasp phase. Without this, grasps are marginal and often fail. |
| `image_size` | 256 (video grid) or 448 (training) | |
| `frame_stride` | 3 | Every 3rd demo frame as a servo waypoint |
| `max_servo` | 50 | Max OSC steps per waypoint |

### Distractor handling (critical for physics stability)
When running trajectories with `env.step()`, distractor objects must NOT be moved underground: they enter infinite free-fall, which eventually produces NaN physics state. Instead, use a three-pronged approach:
1. **Invisible**: set geom `rgba` alpha to 0 via `sim.model.geom_rgba[geom_id][3] = 0.0`
2. **Off-screen**: move to `(10, 10, 0.9)` in the state array
3. **Frozen**: after every `env.step()`, reset their qpos position and zero their qvel

For static renders (no `env.step()`), moving objects underground `(0, 0, -5)` is fine.
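The "frozen" step can be sketched in pure numpy. The qpos/qvel slices follow the Task 0 table and the frozen pose matches the off-screen placement above; the constant and function names are not from the scripts:

```python
import numpy as np

# Sketch of prong 3: after each env.step(), restore distractor free joints
# in-place. Slices follow the Task 0 table; pose matches the off-screen (10, 10, 0.9).
DISTRACTOR_QPOS = [slice(16, 23), slice(23, 30), slice(30, 37)]  # qpos slices
DISTRACTOR_QVEL = [slice(15, 21), slice(21, 27), slice(27, 33)]  # qvel slices
FROZEN_POSE = np.array([10.0, 10.0, 0.9, 1.0, 0.0, 0.0, 0.0])   # xyz + identity quat

def freeze_distractors(qpos, qvel):
    """Reset distractor positions and zero their velocities after a physics step."""
    for qp, qv in zip(DISTRACTOR_QPOS, DISTRACTOR_QVEL):
        qpos[qp] = FROZEN_POSE
        qvel[qv] = 0.0

qpos = np.random.randn(48)
qvel = np.random.randn(43)
freeze_distractors(qpos, qvel)
```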

### Render static grid of object positions (frame 0 only)
```bash
python ood_libero/object_removal_test.py \
    --image_size 256 --n_shifts 5 --shift_range 0.1
```
Outputs `object_removal.png` (with/without distractors) and `object_positions.png` (5x5 grid).

### Generate 3x3 grid video of retargeted trajectories
```bash
python ood_libero/object_position_demos.py \
    --image_size 256 --frame_stride 3 --fps 10 \
    --shift_range 0.1 --dx_min -0.15 --dx_max 0.0 \
    --z_offset -0.015
```
This runs 9 full trajectory replays (one per grid cell) and tiles them into a single video.

### All options for `object_position_demos.py`
```
--image_size    Per-cell render resolution (default: 256)
--frame_stride  Use every Nth demo frame as waypoint (default: 3)
--max_servo     Max OSC steps per waypoint (default: 50)
--shift_range   Symmetric range for dy; also dx if dx_min/dx_max not set (default: 0.1)
--dx_min        Min dx, toward robot (default: -shift_range)
--dx_max        Max dx, away from robot (default: +shift_range)
--z_offset      Lower EEF during grasp phase (default: -0.015). Always use.
--fps           Video frame rate (default: 10)
--out_dir       Output directory (default: ood_libero/out/)
```

### Outputs
```
ood_libero/out/
  object_position_demos.mp4          — 3x3 tiled video of all trajectories
  object_position_demos_preview.png  — first frame of the grid video
```

---

## Trajectory Retargeting (Servo Teleport)

### How it works
The original LIBERO demos store robot joint angles, not Cartesian EEF positions. To replay a demo with shifted objects, we extract the EEF trajectory, shift it, then servo the robot to each shifted waypoint.

### Two approaches

**1. `replay_shifted_trajectory.py` — Whole-trajectory shift (legacy)**
Shifts every EEF waypoint by (dx, dy). Simple, but the robot starts from an offset pose because the initial EEF waypoint is shifted along with the rest of the trajectory.

**2. `replay_natural_start.py` — Natural start (preferred)**
Robot starts at its natural home pose, then:
1. **Extract** the original EEF trajectory from demo states
2. **Find grasp point** — first timestep where gripper transitions from open (-1) to close (+1). For task 0 demo 0 this is t=36.
3. **Pre-grasp** — a few timesteps before grasp (default: 6 steps before, so t=30)
4. **Phase 1 (approach)**: linearly interpolate from the robot's home EEF position to the shifted pre-grasp position (default: 8 interpolated waypoints). Gripper stays open.
5. **Phase 2 (execute)**: servo through the shifted trajectory from pre-grasp onward, applying gripper actions from the demo.
6. **z_offset** — when gripper is closing, lower EEF target by -0.015m for a secure grasp.
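Phase 1 (approach) is a straight-line interpolation in EEF space. A minimal sketch, assuming 3-vector EEF positions; the function name is illustrative:

```python
import numpy as np

# Sketch of Phase 1: interpolate from the home EEF position to the shifted
# pre-grasp waypoint. The gripper stays open throughout this phase.
def approach_waypoints(home_eef, pregrasp_eef, dx, dy, interp_steps=8):
    target = pregrasp_eef + np.array([dx, dy, 0.0])        # shifted pre-grasp
    alphas = np.linspace(0.0, 1.0, interp_steps + 1)[1:]   # exclude the start pose
    return [(1 - a) * home_eef + a * target for a in alphas]
```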

### Run the preferred natural-start replay
```bash
python ood_libero/replay_natural_start.py \
    --dx -0.075 --dy 0.0 \
    --frame_stride 3 --fps 10 --image_size 448
```

### All options for `replay_natural_start.py`
```
--dx              X shift from centered position (default: 0.0)
--dy              Y shift from centered position (default: 0.0)
--z_offset        Lower EEF during grasp (default: -0.015). Always use.
--pregrasp_lead   Timesteps before grasp to start shifted approach (default: 6)
--interp_steps    Interpolated waypoints from home to pre-grasp (default: 8)
--frame_stride    Use every Nth demo frame as waypoint (default: 3)
--max_servo       Max OSC steps per waypoint (default: 50)
--image_size      Render resolution (default: 448)
--fps             Video frame rate (default: 10)
```

### Run legacy whole-trajectory shift
```bash
python ood_libero/replay_shifted_trajectory.py \
    --image_size 448 --frame_stride 3 --fps 10
```

### Debug a specific shift
```bash
python ood_libero/debug_execution.py \
    --dx -0.15 --dy 0.0 --z_offset -0.015 \
    --frame_stride 3 --fps 10
```

### Important implementation details
- **Horizon**: robosuite terminates episodes after a fixed number of steps. The scripts set `env.env.horizon = 100000` to prevent premature termination during servo replay.
- **Done flag**: between trajectory replays, call `env.reset()` to clear the done flag, then re-apply `hide_furniture()` and `hide_distractors_visual()` since reset recreates the sim.
- **Timestep reset**: set `env.env.timestep = 0` and `env.env.done = False` at the start of each replay.
- **Gripper transition**: for task 0, gripper closes at t=36 and opens at t=85. The `find_grasp_timestep()` function detects this automatically from the actions array.

---

## Visualizing Object Position Variance

### Overlay first frames from all demos
```bash
python ood_libero/overlay_first_frames.py
```
Loads `000000.png` from each demo in the parsed dataset (`/data/libero/parsed_libero/libero_spatial/task_0/demo_*/frames/`), computes the mean image and per-pixel variance, and saves:
- `first_frame_overlay.png` — 3-panel: mean, variance heatmap, overlay
- `first_frame_mean.png` — just the mean
- `first_frame_overlay_only.png` — variance heatmap blended on mean
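The core computation reduces to per-pixel statistics over the stacked first frames. A minimal sketch, with file loading omitted and the function name assumed:

```python
import numpy as np

# Sketch of the overlay statistics: stack the first frames of all demos,
# then compute the per-pixel mean image and per-pixel variance heatmap.
def frame_stats(frames):
    stack = np.stack(frames).astype(np.float32)   # (n_demos, H, W, 3)
    mean = stack.mean(axis=0)                     # mean image, (H, W, 3)
    var = stack.var(axis=0).mean(axis=-1)         # variance averaged over RGB, (H, W)
    return mean, var
```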

---

## Environment Notes

- **Conda env:** `conda activate /data2/cameron/miniconda3/envs/torch124` (or `uva`)
- **LIBERO path:** set `LIBERO_DATA_PATH=/data/libero` or it resolves via `~/.libero/`
- **MuJoCo:** uses the `mujoco` (DeepMind) bindings via robosuite 1.4.1
- **LIBERO source:** `/data/cameron/LIBERO` (added to `sys.path` in scripts)
- **Parsed dataset:** `/data/libero/parsed_libero/libero_spatial/task_0/demo_*/`

## Current Status
- [x] Viewpoint sampling on spherical cap + rendering
- [x] Object removal (distractors + furniture)
- [x] Object position shifting with grid of (dx, dy) offsets
- [x] Trajectory retargeting via servo teleport with z_offset
- [x] 3x3 grid video of retargeted trajectories
- [x] First-frame variance overlay visualization
- [ ] Full dataset generation (all demos × all viewpoints/positions)
- [ ] PARA vs ACT evaluation under viewpoint/position shift
