[Figure, panel (a) Two-Stage Training: Video UNet, 4K pretrain steps; Joint Fine-tune, 3K joint steps; PARA Heatmap Head; Video Generation.]
[Figure, panel (b) Rollout Comparison: SVD + PARA, 92% (same backbone, pixel-aligned head); SVD + Global Regression, 0% (same backbone, CLS → MLP head).]