Same eval scene (libero_spatial task 0, demo 0 init state). 1 episode, max_steps=600, --teleport --zero_rotation. All three models: 0% success rate (SR); every rollout hit the 600-step timeout.
Pred jerk in px/step (mean of |ΔΔpos|, i.e. the absolute second difference of position, along the predicted target trajectory):
| Model | val_px_err (train) | closed-loop SR | pred jerk (px) | Notes |
|---|---|---|---|---|
| (A) 2D AR — model_autoregressive_v2 | 8.0 (3 ep, stride=5) | 0% | 1.89 | Largest jerk → policy moves the most but still in wrong direction |
| (B) Voxel + abs xyz — pilot | 12.4 (2 ep, stride=5) | 0% | 0.25 | Frozen — voxel features overpowering EEF history |
| (C) Voxel + EEF-rel xyz — pilot | 13.3 (2 ep, stride=5) | 0% | 0.06 | Most frozen of all — predictions barely change step-to-step |
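For reference, the pred-jerk column can be reproduced along these lines. This is a minimal sketch, not the script used for the table: the function name `mean_abs_second_difference` is hypothetical, and it assumes |ΔΔpos| means the Euclidean norm of the per-step second difference (the notes do not say whether a norm or a per-axis absolute value was used).

```python
import numpy as np


def mean_abs_second_difference(traj_px):
    """Mean magnitude of the second difference of a (T, D) pixel trajectory.

    Assumption: |ΔΔpos| is the Euclidean norm of each second-difference
    vector. A per-axis absolute value would give different numbers.
    """
    traj = np.asarray(traj_px, dtype=float)
    if traj.ndim != 2 or traj.shape[0] < 3:
        raise ValueError("need a (T, D) trajectory with T >= 3")
    # ΔΔpos[t] = pos[t+2] - 2*pos[t+1] + pos[t], shape (T-2, D)
    second_diff = np.diff(traj, n=2, axis=0)
    return float(np.linalg.norm(second_diff, axis=1).mean())


# A constant-velocity trajectory has zero second difference.
straight = [[0, 0], [1, 0], [2, 0], [3, 0]]
# A one-pixel zigzag flips direction every step.
zigzag = [[0, 0], [1, 0], [0, 0], [1, 0]]
print(mean_abs_second_difference(straight), mean_abs_second_difference(zigzag))
```

Under this definition a prediction that never moves and one that moves at constant velocity both score 0, so a low value alone does not distinguish "frozen" from "smooth"; it has to be read together with the videos.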
Videos (same init state)
White dot = current EEF projected. Green crosshair = model predicted next-EEF cell. Gripper sign printed as g=+1 (close) / g=-1 (open).
(A) 2D AR
2D AR at val_px_err=8.0 — jerk 1.89, EEF wanders but never finds the bowl.
(B) Voxel + abs xyz
Voxel-abs pilot — jerk 0.25, essentially frozen.
(C) Voxel + EEF-relative xyz
Voxel-rel pilot — jerk 0.06, most frozen. May also be hit by the rel-PE unit-mismatch bug I flagged earlier.
Takeaways from the videos
All 3 confirm the under-training pathology: the predicted target stays near the current EEF cell, so action deltas are 1-2 mm and nothing happens before max_steps. The voxel models are more frozen than the 2D AR (jerk 0.25 and 0.06 vs 1.89), consistent with their being trained for 2 epochs rather than 3 and, for (C), with a unit mismatch in the geometry PE input.
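The "predicted target stays near current EEF cell" observation can be measured directly rather than read off the videos. The sketch below is one way to do that, assuming per-step arrays of the projected current EEF pixel and the predicted next-EEF pixel have been logged during the rollout. The function name `freeze_stats` and the `near_px` threshold are hypothetical and not part of the eval code.

```python
import numpy as np


def freeze_stats(current_px, predicted_px, near_px=2.0):
    """Summarise how far the predicted target sits from the current EEF.

    current_px, predicted_px: (T, 2) arrays of pixel coordinates per step.
    near_px: hypothetical threshold below which a step counts as "frozen".
    """
    cur = np.asarray(current_px, dtype=float)
    pred = np.asarray(predicted_px, dtype=float)
    if cur.shape != pred.shape or cur.ndim != 2:
        raise ValueError("expected two (T, D) arrays of equal shape")
    # Per-step distance between the predicted target and the current EEF.
    offsets = np.linalg.norm(pred - cur, axis=1)
    return {
        "mean_offset_px": float(offsets.mean()),
        "frac_near": float((offsets <= near_px).mean()),
    }


# Toy rollout: two steps with almost no offset, one with a real move.
cur = [[10, 10], [11, 10], [12, 10]]
pred = [[10, 11], [11, 10], [20, 10]]
print(freeze_stats(cur, pred))
```

A `frac_near` close to 1 over the full 600 steps would support the frozen-policy reading. Comparing it across (A), (B), and (C) on the same init state would give a second number alongside pred jerk.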
Concrete next move: retrain (A) for 10+ epochs (the architecture works, the budget was too tight). Voxel models need the rel-PE fix + matching training.