# Phase A — OOD Viewpoint: 1view vs 2view

**Date:** 2026-05-24
**Task:** libero_spatial t0 (pick black bowl, place on plate)
**Eval:** 10 episodes per cell, `--teleport --zero_rotation --clean_scene --max_steps 600`

## Setup
- **Training data:** `vp_train` (400 demos): phi ∈ {0°, 45°, 90°, 270°, 315°} × theta ∈ {0°, 3.6°, 7.1°, 10.7°, 14.3°, 17.9°, 21.4°, 25°} (5 × 8 = 40 viewpoints), with varied object position (dx ∈ [-0.4, -0.01], dy ∈ [-0.3, 0.3] from default).
- **Model fix:** added `build_bev_world_xyz_table_batched` in `model_dino_volume_query_2view.py` so each demo gets its own BEV→world XYZ table (BEV camera differs per demo).
- **Eval-time:** `--cam_theta`/`--cam_phi` reposition agentview before the episode starts; eval then rebuilds `bev_xyz_table` from the actual env's BEV params after the reposition.
- **Training:** 20 epochs, `lr=5e-5`, `batch_size=8`, `--vis_every_steps 100000`. Final losses: 1v vol=0.156, 2v vol=0.025.
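
The viewpoint grid above can be written out explicitly to check which eval cells fall inside the training distribution. This is a minimal sketch, not the generator's code: it assumes the listed theta values are 8 evenly spaced points on [0°, 25°] (they match `np.linspace(0, 25, 8)` after rounding to one decimal), and `label_viewpoint` is a hypothetical helper that calls a viewpoint in-dist when its phi is one of the training phis and its theta lies within the training range.

```python
import itertools
import numpy as np

# Training azimuths from the Setup list (degrees).
TRAIN_PHIS_DEG = (0, 45, 90, 270, 315)
# Assumption: the 8 listed thetas are evenly spaced on [0, 25] degrees.
TRAIN_THETAS_DEG = np.linspace(0.0, 25.0, 8)


def training_viewpoints():
    """Return every (theta, phi) pair in the training grid."""
    return [
        (float(theta), float(phi))
        for phi, theta in itertools.product(TRAIN_PHIS_DEG, TRAIN_THETAS_DEG)
    ]


def label_viewpoint(theta_deg, phi_deg):
    """Hypothetical rule: in-dist if phi is a training phi and theta is in range."""
    theta_ok = TRAIN_THETAS_DEG.min() <= theta_deg <= TRAIN_THETAS_DEG.max()
    phi_ok = (phi_deg % 360) in TRAIN_PHIS_DEG
    return "in-dist" if (theta_ok and phi_ok) else "OOD"


grid = training_viewpoints()
eval_cells = [(0, 0), (14, 45), (10, 180), (14, 225)]
labels = {cell: label_viewpoint(*cell) for cell in eval_cells}
```

Under this rule the two eval cells with phi ∈ {180°, 225°} come out as OOD, matching the labels in the results table. If the 400 demos are spread evenly over the grid (an assumption, not stated above), that is 10 demos per viewpoint.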

## Results

| Model | Viewpoint (θ°, φ°) | Eval label | Success (n=10) | avg_steps |
|---|---|---|---|---|
| **1v query-MLP** | (0, 0) — default     | in-dist default | 10/10 = **100%** | 124 |
| **1v query-MLP** | (14, 45)             | in-dist left    | 7/10 = **70%**   | 265 |
| **1v query-MLP** | (10, 180)            | OOD back        | 10/10 = **100%** | 114 |
| **1v query-MLP** | (14, 225)            | OOD back-left   | 8/10 = **80%**   | 222 |
| **2v query-MLP** | (0, 0) — default     | in-dist default | 6/10 = **60%**   | 315 |
| **2v query-MLP** | (14, 45)             | in-dist left    | hung (killed at episode 3+) | — |
| **2v query-MLP** | (10, 180)            | OOD back        | 5/10 = **50%**   | 367 |
| **2v query-MLP** | (14, 225)            | OOD back-left   | 8/10 = **80%**   | 225 |

**Aggregate (excluding hung cell):**
- 1v avg: (100 + 70 + 100 + 80) / 4 = **88%**
- 2v avg: (60 + 50 + 80) / 3 = **63%**
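- **2v is ~25pp WORSE than 1v across viewpoints, including in-dist default.** The 1v average above includes the (14, 45) cell that 2v never completed; on the three cells both models finished, 1v averages (100 + 100 + 80) / 3 = 93%, so the matched gap is ~30pp.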
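
The aggregate above averages 1v over four cells and 2v over three. The sketch below recomputes the averages from the table's success counts, both over all completed cells and over only the cells that both models finished. The dictionary layout and the `mean_success` helper are illustrative, not part of the eval scripts.

```python
N_EPISODES = 10  # episodes per cell, from the eval setup

# Successes per (theta, phi) cell, copied from the results table.
# None marks the 2v cell that hung and was killed.
RESULTS = {
    "1v": {(0, 0): 10, (14, 45): 7, (10, 180): 10, (14, 225): 8},
    "2v": {(0, 0): 6, (14, 45): None, (10, 180): 5, (14, 225): 8},
}


def mean_success(model, cells):
    """Mean success rate in percent over the given cells."""
    total = sum(RESULTS[model][cell] for cell in cells)
    return 100.0 * total / (N_EPISODES * len(cells))


# Cells each model completed, and cells that both completed.
done = {m: [c for c, s in cells.items() if s is not None] for m, cells in RESULTS.items()}
matched = [c for c in done["1v"] if c in done["2v"]]

avg_1v_all = mean_success("1v", done["1v"])      # four cells
avg_2v_all = mean_success("2v", done["2v"])      # three cells
avg_1v_matched = mean_success("1v", matched)     # same three cells as 2v
gap_matched = avg_1v_matched - avg_2v_all
```

Comparing on matched cells removes the one cell where 1v scored lowest, so the 1v average rises and the gap to 2v widens relative to the unmatched comparison.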

## Finding

**The Phase A objpos pattern holds: the wrist view does not help, and here it hurts.** Across object position (Phase A part 1) and viewpoint (Phase A part 2), the conclusion is the same: when 1v is trained on enough environmental variation (positions or viewpoints), adding the wrist view does not improve OOD generalization and can degrade performance.

**Why 2v hurts here, not just ties (vs objpos where 2v ≈ 1v):**
- For object position: 2v ≈ 1v (both ~85%) — wrist neither helps nor hurts.
- For viewpoint: 2v < 1v (~63% vs ~88%) — wrist actively hurts.
- Hypothesis: the per-demo varying BEV camera makes the BEV→wrist projection step in the 2v fusion much noisier. The wrist features are projected into BEV space using a `bev_xyz_table` that now differs per demo within each batch, so the cross-view alignment is harder to learn. The wrist projection then adds noise without compensating signal, because the gripper-anchored wrist view shows the same content as the BEV view, just from a different angle.
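
To make "per-demo table" concrete, here is a hedged sketch of what a batched pixel→world lookup could look like. It is not the actual `build_bev_world_xyz_table_batched`; the function name, the pinhole (OpenCV-style) camera model, and the choice to intersect pixel rays with a flat plane `z = plane_z` are all assumptions made for illustration.

```python
import numpy as np


def build_plane_xyz_table_batched(K, cam_to_world, height, width, plane_z=0.0):
    """Hypothetical stand-in: per-demo pixel -> world XYZ on the plane z = plane_z.

    K:            (B, 3, 3) pinhole intrinsics, one per demo
    cam_to_world: (B, 4, 4) camera-to-world transforms, one per demo
    returns:      (B, height, width, 3) world coordinates, indexed [b, v, u]
    """
    batch = K.shape[0]
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Back-project every pixel to a ray in each demo's camera frame.
    rays_cam = np.einsum("bij,nj->bni", np.linalg.inv(K), pix)
    rot = cam_to_world[:, :3, :3]
    origin = cam_to_world[:, :3, 3]
    rays_world = np.einsum("bij,bnj->bni", rot, rays_cam)
    # Scale each ray so that it lands on the plane z = plane_z.
    scale = (plane_z - origin[:, None, 2]) / rays_world[..., 2]
    xyz = origin[:, None, :] + scale[..., None] * rays_world
    return xyz.reshape(batch, height, width, 3)


def look_down_pose(x, y, z):
    """Camera at (x, y, z) looking straight down (x right, y down, z forward)."""
    pose = np.eye(4)
    pose[:3, :3] = np.diag([1.0, -1.0, -1.0])
    pose[:3, 3] = [x, y, z]
    return pose


# Two demos with different camera poses give two different tables.
K = np.tile(np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]]), (2, 1, 1))
poses = np.stack([look_down_pose(0.0, 0.0, 1.0), look_down_pose(0.5, 0.0, 2.0)])
table = build_plane_xyz_table_batched(K, poses, height=5, width=5)
```

The point of the sketch is only that the same pixel maps to a different world point in each demo, so anything downstream that relies on the table sees a different geometric mapping for every sample in a batch.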

**Caveats:**
- Only 10 episodes per cell (binomial SE ≈ 11pp at ~85% success, rising to ≈ 16pp at 50%), so this is a small sample; take it with a grain of salt.
- The hung 2v cell at (14°, 45°) suggests inference may be slow/divergent in some configurations; a plausible but unverified cause is the cross-view projection generating degenerate scores.
- I only tested 4 (theta, phi) pairs — denser eval would be more thorough.
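
To put numbers on the small-sample caveat, the sketch below computes the binomial standard error and a 95% Wilson score interval for a single n=10 cell. These are standard formulas; the helper names are illustrative, and the Wilson interval is used here because the plain SE collapses to zero at 10/10.

```python
import math


def binomial_se(successes, n):
    """Standard error of a success rate estimated from n Bernoulli trials."""
    p = successes / n
    return math.sqrt(p * (1.0 - p) / n)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Per-cell uncertainty for the success counts that appear in the table.
for successes in (5, 6, 7, 8, 10):
    se = binomial_se(successes, 10)
    low, high = wilson_interval(successes, 10)
    print(f"{successes}/10: SE={100 * se:.1f}pp, 95% Wilson=[{100 * low:.0f}%, {100 * high:.0f}%]")
```

With intervals this wide, individual cells cannot be separated reliably; the aggregate comparison across cells carries more weight than any single row.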

## Bottom line for Cameron

**Both Phase A objpos and Phase A viewpoint show 2view doesn't help OOD generalization once 1view has enough environmental variation in training.** The wrist-anchored frame's theoretical advantage (invariant to BEV shifts) doesn't translate to better OOD success — possibly because BEV alone, when trained on diverse scenes, already learns to handle the variation, and the wrist-projection fusion introduces noise.

This may be specific to the current query-MLP projection-based fusion. Alternative 2view architectures (e.g. independent per-view heads + voting, or a learned modal-router) might behave differently. But for the current architecture, 2view is not a clear win.

## Files
- `/data/libero/ood_viewpoint_v3_2view/` — 400 rendered demos
- `/data/libero/ood_viewpoint_v3_splits_2view_qmlp/vp_train/` — train split (2v naming)
- `/data/libero/ood_viewpoint_v3_splits_2view_1v/vp_train/` — train split (1v naming)
- `/data/cameron/para/libero/checkpoints/libero_query_v0/latest.pth` — 1v ckpt (overwrote prior objpos run)
- `/data/cameron/para/libero/checkpoints/libero_2view_v0/latest.pth` — 2v ckpt
- `/data/cameron/para/libero/eval_libero_2view_ood.py`, `eval_libero_query_ood.py` — eval with `--cam_theta`/`--cam_phi`
- `/data/cameron/para_normalized_losses/libero/generate_ood_viewpoint_2view.py` — generator