Single-scene closed-loop — 3 model variants

All three variants start from the same init state (libero_spatial task 0, demo 0) and run 1 episode with max_steps=600, --teleport --zero_rotation. All three hit max_steps, so the success rate (SR) is 0%.

Model	val_px_err (px)	SR	pred jerk (px)
(A) 2D AR	8.0 (3 ep)	0%	1.89
(B) Voxel + abs xyz	12.4 (2 ep pilot)	0%	0.25
(C) Voxel + EEF-rel xyz	13.3 (2 ep pilot)	0%	0.06
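
For reference, here is a minimal sketch of how a single closed-loop episode with a step cap might be scored. Everything here is hypothetical: `run_episode`, `ToyReach`, and the proportional policy are illustrative stand-ins and are not the LIBERO API or the actual eval script. The point is only that a policy that never moves its target runs out the full max_steps and counts as a failure.

```python
def run_episode(reset, step, policy, max_steps=600):
    """Roll a policy closed-loop; return (success, steps_taken)."""
    obs = reset()
    for t in range(1, max_steps + 1):
        action = policy(obs)
        obs, done = step(action)
        if done:
            return True, t
    # Hitting the cap without `done` is scored as a failure.
    return False, max_steps


class ToyReach:
    """Hypothetical 1-D reach task (metres); a stand-in for the real env."""

    def __init__(self, goal=0.30, tol=0.01):
        self.goal = goal
        self.tol = tol
        self.x = 0.0

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        self.x += action
        return self.x, abs(self.goal - self.x) <= self.tol


env = ToyReach()
# "Frozen" policy: always predicts zero motion.
frozen = run_episode(env.reset, env.step, lambda obs: 0.0)
# Proportional policy: closes 10% of the remaining gap each step.
moving = run_episode(env.reset, env.step, lambda obs: 0.1 * (env.goal - obs))
print(frozen, moving)
```

With 1 episode per model, SR can only be 0% or 100%, so the table's SR column is a pass/fail flag rather than a rate estimate.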

Video overlay: white dot = current EEF (end-effector) position projected into the image; green crosshair = predicted next-EEF cell. The label shows the step index and gripper sign.
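
A minimal sketch of how the overlay quantities relate, assuming a pinhole camera and the 8-px grid cell mentioned in the takeaways. The intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the assumption that the EEF position is already in the camera frame are all illustrative, not taken from the actual pipeline.

```python
CELL_PX = 8  # grid cell size in pixels (from the takeaways); assumed square


def project_to_pixel(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def pixel_to_cell(uv, cell_px=CELL_PX):
    """Map a pixel coordinate to its (column, row) grid cell index."""
    u, v = uv
    return (int(u // cell_px), int(v // cell_px))


def target_is_frozen(current_uv, predicted_cell, cell_px=CELL_PX):
    """True if the predicted next-EEF cell is the cell the EEF is already in."""
    return pixel_to_cell(current_uv, cell_px) == predicted_cell


# Hypothetical intrinsics and EEF position, chosen for exact arithmetic.
uv = project_to_pixel((0.125, -0.0625, 1.0), fx=128.0, fy=128.0, cx=64.0, cy=64.0)
cell = pixel_to_cell(uv)
print(uv, cell, target_is_frozen(uv, cell))
```

Counting the fraction of steps where `target_is_frozen` holds would give a direct number for how often the crosshair sits on top of the white dot.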

Takeaways

All three confirm the under-training pathology: the predicted target stays in the current 8-px grid cell, action deltas are 1–2 mm/step, and nothing happens before max_steps. The voxel models are more frozen than the 2D AR model (pred jerk about 7.6× lower for B and 31.5× lower for C), consistent with them having trained for 2 epochs versus 3 and, for C, hitting the rel-PE unit-mismatch bug I flagged.
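
The jerk ratios can be checked directly from the table values. The sketch below also includes one plausible definition of "pred jerk": the mean norm of the third finite difference of the predicted pixel trajectory. That definition is an assumption for illustration; the metric actually used for the table may differ.

```python
import math


def mean_jerk(traj):
    """Mean norm of the third finite difference of a 2-D pixel trajectory.

    ASSUMED definition of "pred jerk"; not confirmed by the source.
    """
    if len(traj) < 4:
        raise ValueError("need at least 4 points for a third difference")
    total = 0.0
    for i in range(len(traj) - 3):
        ju = traj[i + 3][0] - 3 * traj[i + 2][0] + 3 * traj[i + 1][0] - traj[i][0]
        jv = traj[i + 3][1] - 3 * traj[i + 2][1] + 3 * traj[i + 1][1] - traj[i][1]
        total += math.hypot(ju, jv)
    return total / (len(traj) - 3)


# Pred jerk (px) per model, copied from the table above.
JERK_PX = {"A": 1.89, "B": 0.25, "C": 0.06}

# How many times lower each voxel model's jerk is relative to the 2D AR model.
ratios = {name: JERK_PX["A"] / jerk for name, jerk in JERK_PX.items() if name != "A"}
print({name: round(r, 2) for name, r in ratios.items()})
```

A trajectory that moves at constant velocity has zero jerk under this definition, so low jerk alone does not prove the target is frozen; it needs to be read together with the 1–2 mm/step action deltas.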