Single-Scene Closed-Loop — 3 model variants (v2 fixed videos)

All three models use the same eval scene (libero_spatial task 0, demo 0 init state): 1 episode, max_steps=600, --teleport --zero_rotation. All three score 0% success rate (SR): every rollout hit the 600-step timeout.

Pred jerk is reported in px/step: the mean of |ΔΔpos| (the second finite difference of position) over the predicted target trajectory.
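The jerk metric described above can be sketched as follows. This is a minimal sketch, not the actual eval code: the function name `pred_jerk` and the `(T, 2)` input layout are hypothetical, and it assumes |ΔΔpos| means the Euclidean norm of the second difference (a per-axis absolute value would give different numbers).

```python
import numpy as np


def pred_jerk(pred_xy):
    """Mean |second difference| of a predicted target trajectory, in pixels.

    pred_xy: array-like of shape (T, 2), the predicted target position (px)
             at each rollout step. Name and layout are assumptions.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    if pred_xy.ndim != 2 or pred_xy.shape[0] < 3:
        raise ValueError("need a (T, D) trajectory with at least 3 steps")
    # Second finite difference: p[t+2] - 2*p[t+1] + p[t], shape (T-2, D).
    second_diff = np.diff(pred_xy, n=2, axis=0)
    # Assumption: magnitude is the Euclidean norm across the x/y axes.
    return float(np.linalg.norm(second_diff, axis=1).mean())
```

Under this definition a target that moves at constant velocity scores 0, so a low value can mean either a smooth trajectory or a prediction that barely moves at all.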

Model	val_px_err (train)	closed-loop SR	pred jerk (px)	Notes
(A) 2D AR — model_autoregressive_v2	8.0 (3 ep, stride=5)	0%	1.89	Largest jerk → policy moves the most but still in the wrong direction
(B) Voxel + abs xyz — pilot	12.4 (2 ep, stride=5)	0%	0.25	Frozen — voxel features overpowering EEF history
(C) Voxel + EEF-rel xyz — pilot	13.3 (2 ep, stride=5)	0%	0.06	Most frozen of all — predictions barely change step-to-step

Videos (same init state)

White dot = projected current EEF. Green crosshair = model-predicted next-EEF cell. Gripper sign printed as g=+1 (close) / g=-1 (open).

(A) 2D AR

2D AR at val_px_err=8.0 — jerk 1.89, EEF wanders but never finds the bowl.

(B) Voxel + abs xyz

Voxel-abs pilot — jerk 0.25, essentially frozen.

(C) Voxel + EEF-relative xyz

Voxel-rel pilot — jerk 0.06, most frozen. May also be hit by the rel-PE unit-mismatch bug I flagged earlier.

Takeaways from the videos

All three videos show the under-training pathology: the predicted target stays near the current EEF cell, so action deltas are only 1-2 mm and nothing happens before max_steps. The voxel models are more frozen than the 2D AR model, which is consistent with their having trained for 2 epochs rather than 3 and, for (C), possibly with the unit mismatch in the geometry PE input.
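One way to put a number on "predicted target stays near current EEF" is the mean pixel distance between the predicted target and the projected EEF over a rollout. The sketch below is illustrative only: `mean_target_offset`, `is_frozen`, and the 1 px threshold are hypothetical and not part of the eval script.

```python
import numpy as np


def mean_target_offset(pred_xy, eef_xy):
    """Mean pixel distance between predicted target and current EEF.

    pred_xy, eef_xy: array-likes of shape (T, 2), one row per rollout step.
    Both names and the layout are assumptions for illustration.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    eef_xy = np.asarray(eef_xy, dtype=float)
    if pred_xy.shape != eef_xy.shape:
        raise ValueError("pred_xy and eef_xy must have the same shape")
    return float(np.linalg.norm(pred_xy - eef_xy, axis=1).mean())


def is_frozen(pred_xy, eef_xy, threshold_px=1.0):
    """Flag a rollout whose predicted target hugs the EEF.

    threshold_px is an arbitrary placeholder, not a tuned value.
    """
    return mean_target_offset(pred_xy, eef_xy) < threshold_px
```

Reporting this offset next to pred jerk would help separate a frozen policy (small offset, small jerk) from one that moves steadily toward a wrong target (large offset, small jerk).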