# Phase A — OOD Object Position: 1view vs 2view

**Date:** 2026-05-23 → 2026-05-24
**Task:** libero_spatial, task_0 (pick black bowl, place on plate)
**Eval:** 20 positions × 5 episodes per cell = 100 episodes, `--teleport --zero_rotation --clean_scene --max_steps 600`

## Setup
- **Train splits:**
  - `default`: original 50 demos at fixed object position (existing checkpoints)
  - `exp3_left`: 128 demos covering left half of 16×16 dx/dy grid (j ∈ [0, 7])
- **Test positions (physical (dx, dy)):**
  - `left`: 20 random in-dist (j ∈ [0, 7]) — same range as train
  - `right`: 20 random OOD (j ∈ [8, 15]) — held-out right half
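
The left/right split described above can be sketched as follows. This is a minimal illustration, not the generator actually used: it assumes test positions are drawn from the same discrete 16×16 grid cells as the training demos, that `j` is the column index that separates the halves, and the names `split_cells` and `sample_test_positions` as well as the seeds are hypothetical.

```python
import random

GRID = 16  # 16x16 dx/dy grid, per the setup above


def split_cells(grid=GRID):
    """Partition grid cells (i, j) into left (j in [0, 7]) and right (j in [8, 15])."""
    cells = [(i, j) for i in range(grid) for j in range(grid)]
    left = [c for c in cells if c[1] <= 7]
    right = [c for c in cells if c[1] >= 8]
    return left, right


def sample_test_positions(cells, n=20, seed=0):
    """Draw n distinct cells without replacement (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(cells, n)


left, right = split_cells()
test_left = sample_test_positions(left, seed=0)    # in-distribution positions
test_right = sample_test_positions(right, seed=1)  # held-out OOD positions
print(len(left), len(right), len(test_left), len(test_right))
```

Each half contains 16 × 8 = 128 cells, which matches the 128 demos in `exp3_left`.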

## Final Results

| # | Arch                    | Training        | Eval left | Eval right | OOD drop |
|---|-------------------------|-----------------|-----------|------------|----------|
| 1 | PARA (dense conv)       | exp3_left, 60min| 61/100 = 61% | 51/100 = 51% | −10pp |
| 2 | DinoVolumeQuery (1v)    | default 50 demos | 50/100 = 50% | 79/100 = 79% | +29pp (default closer to right) |
| 3 | DinoVolumeQuery2View    | default 50 demos | 35/90 = 39%* | 44/95 = 46%* | +7pp (2v hurts OOD) |
| 4 | **DinoVolumeQuery (1v)**| **exp3_left, 30ep** | **86/100 = 86%** | **84/100 = 84%** | **−2pp** |
| 5 | **DinoVolumeQuery2View**| **exp3_left, 30ep** | **87/100 = 87%** | **83/100 = 83%** | **−4pp** |

\* Eval-parser failures dropped episodes from both cells (10 of 100 on left, 5 of 100 on right), so the denominators are below 100.
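
The percentages and the right-minus-left deltas in the table can be recomputed from the raw counts. The sketch below only re-derives numbers already in the table; the dictionary keys and helper names are illustrative, not taken from the eval scripts.

```python
# (successes, episodes) for (left, right), copied from the table above.
results = {
    "PARA / exp3_left": ((61, 100), (51, 100)),
    "1v / default":     ((50, 100), (79, 100)),
    "2v / default":     ((35, 90),  (44, 95)),   # reduced denominators (parser failures)
    "1v / exp3_left":   ((86, 100), (84, 100)),
    "2v / exp3_left":   ((87, 100), (83, 100)),
}


def rate_pct(successes, episodes):
    """Success rate in percent."""
    return 100.0 * successes / episodes


def delta_pp(left, right):
    """Right-half rate minus left-half rate, in percentage points."""
    return rate_pct(*right) - rate_pct(*left)


for name, (left, right) in results.items():
    print(f"{name:18s} left={rate_pct(*left):5.1f}%  "
          f"right={rate_pct(*right):5.1f}%  delta={delta_pp(left, right):+5.1f}pp")
```

A negative delta is an OOD drop for the `exp3_left` rows. For the default-trained rows both halves are off-distribution, so the sign only reflects which half lies closer to the default position.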

## Findings

**1. Architecture dominates view count.**
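The factored KV volume head with EEF+CLS query (DinoVolumeQuery) is much stronger than the dense conv head (PARA) when both are trained on `exp3_left`: +25pp in-distribution (86% vs 61%) and +33pp OOD (84% vs 51%). This gap is far larger than the 1v→2v effect. Note that the training budgets are reported in different units (60 min for PARA, 30 epochs for DinoVolumeQuery), so the comparison matches data but not necessarily compute.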

**2. Training-data distribution matters more than wrist view.**
- Default-trained 1v: 50% (left, far OOD) / 79% (right, close to default). The model only generalizes where positions are close to its training distribution.
- Left-trained 1v: 86% / 84%. Almost no OOD drop. Position variation in training is sufficient to handle the right half.

**3. 2view does NOT clearly help OOD position when training already varies positions.**
- Left-trained 1v vs 2v: 86%/84% vs 87%/83%. Essentially tied within noise (binomial SE ≈ 3.5pp per cell at n = 100, so ≈ 5pp on the difference between two cells).
- The earlier "+20pp at default position" finding does not predict OOD-position generalization once 1v gets enough position variation in training.
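
The "within noise" claim can be checked with a simple binomial standard error on each cell and on the 2v minus 1v difference. This is a rough sketch that assumes all 100 episodes in a cell are independent; since each position contributes 5 episodes, the true uncertainty is likely somewhat larger. The helper names are illustrative.

```python
import math


def binomial_se(successes, n):
    """Standard error of a success rate under an i.i.d. Bernoulli assumption."""
    p = successes / n
    return math.sqrt(p * (1.0 - p) / n)


def compare(a, b):
    """Return (difference, SE of difference, z-score) for two (successes, n) cells."""
    (s1, n1), (s2, n2) = a, b
    diff = s1 / n1 - s2 / n2
    se = math.sqrt(binomial_se(s1, n1) ** 2 + binomial_se(s2, n2) ** 2)
    return diff, se, diff / se


# Left-trained (exp3_left) cells from the results table.
one_view = {"left": (86, 100), "right": (84, 100)}
two_view = {"left": (87, 100), "right": (83, 100)}

for split in ("left", "right"):
    diff, se, z = compare(two_view[split], one_view[split])
    print(f"{split}: 2v - 1v = {100 * diff:+.1f}pp, SE = {100 * se:.1f}pp, z = {z:+.2f}")
```

Both differences are 1pp in magnitude against an SE of roughly 5pp, well below one standard error.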

**4. 2view at default training actually HURTS OOD.**
Default-trained 2v reaches 39%/46%, worse than default-trained 1v (50%/79%) on both halves (−11pp left, −33pp right). One untested explanation is that the wrist features overfit to the gripper's default approach geometry.
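
Because the default-trained 2v cells have reduced denominators (90 and 95), a Wilson score interval is one reasonable way to compare them against the 1v cells. The sketch below uses the standard Wilson formula with z = 1.96 (about 95% coverage) and again assumes independent episodes, which is an approximation given the 5-episodes-per-position design; the cell labels are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Default-trained cells from the results table: (successes, episodes).
cells = {
    "1v left":  (50, 100),
    "2v left":  (35, 90),
    "1v right": (79, 100),
    "2v right": (44, 95),
}

intervals = {name: wilson_interval(s, n) for name, (s, n) in cells.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:9s} {100 * lo:5.1f}% .. {100 * hi:5.1f}%")
```

Under these assumptions the right-half intervals for 1v and 2v do not overlap, while the left-half intervals do, so the 2v deficit is much better supported on the right half than on the left.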

## Files Produced
- `/data/libero/ood_objpos_task0_2view/` — 128 left-half demos rendered with BEV+wrist views (osmesa, 8-way parallel, ~35 min wall)
- `/data/libero/ood_objpos_splits_2view_qmlp/exp3_left_train/` — 128 symlinks in /data/cameron/para/libero naming
- `/data/cameron/para/libero/checkpoints/libero_query_v0/latest.pth` — 1v query-MLP, 30 epochs, exp3_left
- `/data/cameron/para/libero/checkpoints/libero_2view_v0/latest.pth` — 2v query-MLP, 30 epochs, exp3_left
- `/data/cameron/para/libero/eval_libero_2view_ood.py`, `eval_libero_query_ood.py` — OOD eval scripts with shift/clean/teleport flags
- `/data/cameron/para_normalized_losses/libero/generate_ood_objpos_2view.py` — 2view OOD data generator

## Original "dual_para" arm — BLOCKED
The `dual_para` architecture in `/data/cameron/para_normalized_losses/libero/` is broken at inference. Both the freshly trained `dual_para_exp3_left_2v` and the pre-existing `dual_para_alldemo_t0_noclip` checkpoint produce 0% success across all evals. Because a checkpoint that predates this experiment fails the same way, the fault most likely lies in the decoder (`decode_dual_window_actions`) rather than in training. This arm was abandoned in favor of the query-MLP 2view model above.

## Next
- Phase A viewpoint (1v vs 2v under shifted camera position) — pending
- Phase B (DA3 latent depth fusion) — pending
- Phase C (VLM trunk swap) — pending
