# video_training/ — Video Pretraining Pipeline

## Purpose
Large-scale pretraining on robot video data to improve PARA's visual representations before task-specific behavior cloning (BC) fine-tuning.

## Status
- [ ] Not yet started — placeholder for future work

## Notes
- Key idea: pretrain pixel-aligned representations on unlabeled robot video, using point tracks from an off-the-shelf tracker (e.g. CoTracker, TAPIR) as self-supervision
- Point tracks give a dense 2D correspondence signal: each tracked point has a known pixel location in later frames, which is natural supervision for pixel-aligned heatmap predictions
- Could pretrain the volume head to predict future point locations before introducing the height/3D component
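
As a rough illustration of the notes above, the sketch below turns tracked point locations into per-point target heatmaps and scores predicted heatmap logits against them with a spatial cross-entropy. Nothing here is implemented in this directory yet: the function names (`tracks_to_heatmaps`, `heatmap_cross_entropy`), the Gaussian target with `sigma=2.0`, and the choice of loss are all assumptions made for illustration, and the track coordinates are taken to be already available as arrays rather than produced by any specific CoTracker or TAPIR call.

```python
import numpy as np


def tracks_to_heatmaps(points_xy, visible, height, width, sigma=2.0):
    """Render one Gaussian target heatmap per tracked point (hypothetical helper).

    points_xy: (N, 2) float array of (x, y) pixel coordinates in a future frame.
    visible:   (N,) boolean array; occluded points get an all-zero map.
    Returns:   (N, height, width) array; each visible map sums to 1.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dx = xs[None] - points_xy[:, 0, None, None]
    dy = ys[None] - points_xy[:, 1, None, None]
    heat = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    # Normalize so each map is a distribution over pixels; guard against
    # points far outside the frame, whose Gaussian mass underflows to zero.
    heat /= np.maximum(heat.sum(axis=(1, 2), keepdims=True), 1e-12)
    heat *= visible[:, None, None]
    return heat


def heatmap_cross_entropy(logits, targets, visible):
    """Spatial softmax cross-entropy, averaged over visible points (assumed loss).

    logits:  (N, H, W) raw per-pixel scores from the prediction head.
    targets: (N, H, W) target heatmaps, e.g. from tracks_to_heatmaps.
    visible: (N,) boolean array marking which points contribute to the loss.
    """
    n = logits.shape[0]
    flat = logits.reshape(n, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    log_probs = flat - np.log(np.exp(flat).sum(axis=1, keepdims=True))
    per_point = -(targets.reshape(n, -1) * log_probs).sum(axis=1)
    num_visible = max(int(visible.sum()), 1)
    return float((per_point * visible).sum() / num_visible)


# Example: two tracked points in an 8x16 frame, the second one occluded.
points = np.array([[4.0, 3.0], [10.0, 6.0]])
visible = np.array([True, False])
targets = tracks_to_heatmaps(points, visible, height=8, width=16)
loss = heatmap_cross_entropy(np.zeros((2, 8, 16)), targets, visible)
```

Masking occluded points keeps the loss from penalizing the model at locations the tracker could not observe. A real pipeline would likely express the same computation in the training framework's tensor library so that gradients flow through the logits.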