"""
UVA-style video prediction on the Droid dataset.

This package implements a simplified version of the Unified Video Action (UVA)
model for pure video prediction:

- Uses a MAR-style VAE tokenizer to map RGB frames to latent tokens.
- Uses a transformer over latent tokens to predict future frames.
- Always masks *all* future frames during training (no flexible masking).
- Does not model actions or use diffusion heads; prediction is trained as
  deterministic regression.

See `uva/train.py` for a minimal training script.
"""

