# vlm/ — VLM Integration

## Purpose
Integrating vision-language models with PARA for language-conditioned robot policies.

## Status
- [ ] Not yet started (placeholder for future work)

## Notes
- Goal: condition PARA's pixel-aligned predictions on language instructions
- Option: inject language embeddings into the DINOv3 feature space, either via cross-attention or via feature modulation
- Language conditioning may also help with task-level generalization across embodiments
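The notes name two injection options without fixing a design. Below is a minimal NumPy sketch of both. It assumes DINOv3 patch features arrive as an `(H, W, C)` grid and that some text encoder provides either a pooled language vector or a sequence of language tokens. `film_modulate` is FiLM-style modulation, where language predicts a per-channel scale and shift. `cross_attend` is single-head cross-attention, where each patch queries the language tokens. The helper names, weight shapes, toy sizes, and the `1 + gamma` parameterization are illustrative assumptions, not PARA code. Both variants return one feature vector per patch, so a downstream pixel-aligned head would see the same spatial layout as before.

```python
import numpy as np


def film_modulate(patch_feats, lang_emb, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style feature modulation (hypothetical helper, not PARA code).

    patch_feats: (H, W, C) patch-feature grid (assumed layout)
    lang_emb:    (D,) pooled language embedding
    w_gamma, w_beta: (D, C); b_gamma, b_beta: (C,)
    """
    gamma = lang_emb @ w_gamma + b_gamma  # (C,) per-channel scale offset
    beta = lang_emb @ w_beta + b_beta  # (C,) per-channel shift
    # The same affine transform is applied at every (h, w) location,
    # so the spatial layout of the features is untouched.
    return patch_feats * (1.0 + gamma) + beta


def cross_attend(patch_feats, lang_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: patches attend to language tokens.

    Hypothetical helper. lang_tokens: (T, D); w_q: (C, d); w_k: (D, d); w_v: (D, C)
    """
    h, w, c = patch_feats.shape
    q = patch_feats.reshape(h * w, c) @ w_q  # (N, d), one query per patch
    k = lang_tokens @ w_k  # (T, d)
    v = lang_tokens @ w_v  # (T, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, T) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over language tokens
    # Residual add keeps exactly one output vector per input patch.
    return patch_feats + (attn @ v).reshape(h, w, c)


rng = np.random.default_rng(0)
H, W, C, D, T, d = 4, 4, 8, 16, 5, 8  # toy sizes, not DINOv3's real dims
feats = rng.standard_normal((H, W, C))
lang_vec = rng.standard_normal(D)
lang_tok = rng.standard_normal((T, D))

film_out = film_modulate(feats, lang_vec,
                         rng.standard_normal((D, C)) * 0.1, np.zeros(C),
                         rng.standard_normal((D, C)) * 0.1, np.zeros(C))
xattn_out = cross_attend(feats, lang_tok,
                         rng.standard_normal((C, d)), rng.standard_normal((D, d)),
                         rng.standard_normal((D, C)) * 0.1)
print(film_out.shape, xattn_out.shape)  # (4, 4, 8) (4, 4, 8)
```

The `1 + gamma` form means zero-initialized modulation weights start as the identity, so pretrained features pass through unchanged until training moves them. FiLM as originally proposed uses `gamma * F + beta` directly. Cross-attention is more expressive, since each patch can weight different words, but it costs roughly `N * T` attention scores per image instead of one affine transform.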