# Concat & Dual-Frustum 2view experiments

Both new variants were trained for 20 epochs on `vp_train` (400 demos, viewpoint varied) and evaluated at the same 4 viewpoints as Phase A.

## Results summary

| Cell | 1v | sum 2v | max 2v | concat | dualfrustum |
|---|---|---|---|---|---|
| (0, 0) default | 100% | 60% | 90% | 60% | 70% |
| (14, 45) in-dist left | 70% | hung | 80% | 60% | 50% |
| (10, 180) OOD back | 100% | 50% | 70% | 70% | 60% |
| (14, 225) OOD back-left | 80% | 80% | 50% | **90%** | **90%** |
| **average** | **87.5%** | 63%* | 72.5% | 70% | 67.5% |

\* sum 2v hung at (14, 45); its average is over the remaining three cells.
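
As a quick check on the average row, here is a minimal sketch that recomputes it from the per-cell numbers in the table. The dictionary layout and the `average` helper are illustrative only (not part of the eval scripts), and the hung run is assumed to be excluded from its column's mean rather than counted as 0%.

```python
# Success rates (%) copied from the results table, in row order:
# (0, 0), (14, 45), (10, 180), (14, 225). None marks the hung run.
results = {
    "1v":          [100, 70, 100, 80],
    "sum 2v":      [60, None, 50, 80],
    "max 2v":      [90, 80, 70, 50],
    "concat":      [60, 60, 70, 90],
    "dualfrustum": [70, 50, 60, 90],
}


def average(rates):
    """Mean over completed cells; returns (mean, number of cells used)."""
    done = [r for r in rates if r is not None]
    return sum(done) / len(done), len(done)


averages = {name: average(rates) for name, rates in results.items()}
for name, (mean, n) in averages.items():
    print(f"{name:12s} {mean:5.1f}%  (n={n})")
```

Because the sum 2v mean uses only three cells, it is not directly comparable to the four-cell means in the other columns.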

## Architectures

- **concat**: BEV and wrist images side-by-side as (3, 448, 896). DINO's self-attention crosses both views natively. Volume head outputs over BEV-half features only.
- **dualfrustum**: two volumes of world points — BEV-anchored (standard) + wrist-anchored (new). Each scored against both views' features via projection. Volume logits stacked along anchor axis (B, T, Z, 2, H, W); CE loss flattens; at inference, argmax determines which anchor to use for 3D recovery.
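
The dualfrustum decode step can be sketched roughly as follows: flatten the (Z, 2, H, W) logits for one (batch, timestep) entry, take the argmax, and unravel it to learn which anchor's volume the winning cell belongs to. This is a stdlib-only illustration, not the code in `model_dino_volume_query_dualfrustum.py`; the helper names, the row-major flattening, and the anchor ordering (index 0 = BEV, index 1 = wrist) are all assumptions.

```python
def unravel(flat_index, shape):
    """Convert a row-major flat index into a tuple of per-axis indices."""
    coords = []
    for size in reversed(shape):
        flat_index, c = divmod(flat_index, size)
        coords.append(c)
    return tuple(reversed(coords))


def decode_argmax(flat_logits, shape):
    """Return (z, anchor, h, w) of the highest-scoring cell.

    flat_logits holds Z*2*H*W scores for one (batch, timestep) entry,
    assumed to be flattened row-major from a (Z, 2, H, W) volume.
    """
    best = max(range(len(flat_logits)), key=flat_logits.__getitem__)
    return unravel(best, shape)


# Assumed ordering of the anchor axis; the real model may differ.
ANCHOR_NAMES = ("bev", "wrist")

# Toy volume: Z=2 depth bins, 2 anchors, 3x4 spatial grid.
shape = (2, 2, 3, 4)
logits = [0.0] * (2 * 2 * 3 * 4)

# Place the peak at z=0, anchor=1, h=2, w=1 using row-major indexing.
peak = ((0 * 2 + 1) * 3 + 2) * 4 + 1
logits[peak] = 5.0

z, anchor, h, w = decode_argmax(logits, shape)
print(f"winning cell: z={z}, h={h}, w={w}, anchor={ANCHOR_NAMES[anchor]}")
```

The recovered anchor index then selects which frustum's world points to use when mapping (z, h, w) back to a 3D position.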

## Findings

1. **Both new variants beat all others at (14, 225) OOD back-left**: 90% vs 1v 80%, sum 2v 80%, max-fusion 50%. This is the viewpoint where BEV is most rotated from training, and the wrist anchor (gripper-fixed) provides a stable reference. The +10pp margin over the best alternatives (1v and sum 2v) suggests the wrist does add value when BEV is heavily perturbed.

2. **They lose 30-40pp at the default viewpoint and 10-20pp at (14, 45)**: relative to 1v, concat drops 40pp and 10pp, dualfrustum 30pp and 20pp. The new architectures add complexity (more capacity to fit, more ways to make mistakes), and in default/in-dist scenes where 1v already gets 100%/70%, that complexity isn't repaid.

3. **No 2view architecture so far beats 1v on average for libero_spatial t0.** Concat at 70%, dualfrustum at 67.5%, max-fusion 72.5%, vs 1v at 87.5%.

4. **The win at back-left is consistent across the two new architectures**: concat and dualfrustum differ architecturally but both reach 90% there, while max-fusion drops to 50% and sum 2v only matches 1v at 80%. This suggests the wrist view has a real geometric advantage at large BEV rotations that some fusions can exploit, but the advantage is narrow.
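
The per-cell gaps behind these findings can be recomputed with a small sketch like the one below. The numbers are copied from the results table; the variable names are illustrative and not taken from the eval scripts.

```python
cells = ["(0, 0)", "(14, 45)", "(10, 180)", "(14, 225)"]
baseline = [100, 70, 100, 80]  # 1v success rates (%), in cell order
variants = {
    "concat":      [60, 60, 70, 90],
    "dualfrustum": [70, 50, 60, 90],
}

# Percentage-point difference of each new variant against 1v, per cell.
deltas = {
    name: [r - b for r, b in zip(rates, baseline)]
    for name, rates in variants.items()
}

for name, ds in deltas.items():
    row = ", ".join(f"{c}: {d:+d}pp" for c, d in zip(cells, ds))
    print(f"{name}: {row}")
```

Only the (14, 225) cell shows a positive delta for either variant; every other cell is negative, including (10, 180) OOD back.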

## Implication

For libero_spatial t0, the BEV is already a great viewpoint for the task. The wrist only meaningfully helps in specific OOD rotations. Whether to ship 2v depends on the deployment regime: if the policy is expected to operate at large viewpoint perturbations, the 90% at back-left matters; if it operates near default, 1v is strictly better.

For paper figures, "2view helps OOD-far viewpoints" might still be a publishable claim — limited to specific viewpoint regimes. The negative finding on average is also interesting.

## Files

- `/data/cameron/para/libero/checkpoints/libero_concat_v0/latest.pth`
- `/data/cameron/para/libero/checkpoints/libero_dualfrustum_v0/latest.pth`
- `/data/cameron/para/libero/model_dino_volume_query_concat.py`
- `/data/cameron/para/libero/model_dino_volume_query_dualfrustum.py`
- `/data/cameron/para/libero/train_libero_concat.py`, `train_libero_dualfrustum.py`
- `/data/cameron/para/libero/eval_libero_concat_ood.py`, `eval_libero_dualfrustum_ood.py`
- `/data/cameron/para/libero/logs/concat_dualfrustum_eval/` — per-cell logs
