[![License](https://img.shields.io/github/license/tri-ml/vla_foundry?color=blue)](https://github.com/tri-ml/vla_foundry/blob/main/LICENSE) [![Release](https://img.shields.io/github/v/release/tri-ml/vla_foundry?color=green)](https://github.com/tri-ml/vla_foundry/releases) [![GitHub Repo stars](https://img.shields.io/github/stars/tri-ml/vla_foundry?color=yellow)](https://github.com/tri-ml/vla_foundry/stargazers) [![GitHub contributors](https://img.shields.io/github/contributors/tri-ml/vla_foundry?color=orange)](https://github.com/tri-ml/vla_foundry/graphs/contributors)

# VLA Foundry

VLA Foundry is a framework for training Vision-Language-Action (VLA) models. We support the following:

- **Multiple modalities**: Train a model with text, image-caption, or robotics data. With VLA Foundry, you can train an LLM, use that checkpoint to train a VLM, and then use the VLM checkpoint to train a VLA, all in one place and without any external dependencies.
- **Multi-node training**: VLA Foundry supports [FSDP2](https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and streams datasets with [WebDataset](https://github.com/webdataset/webdataset). Multi-GPU training works well locally with `torchrun` and on large clusters with AWS SageMaker.
- **Dataset mixing**: Dataset sources and ratios can be specified at dataloading time, allowing for easy dataset mixing and batch balancing.
- **Modular and maintainable design**: VLA Foundry is built for flexibility and ease of development. Most modules are implemented in pure PyTorch, without any external libraries. This makes it easier to modify the training pipeline and add new features.
- **Hugging Face support**: Modules can be loaded either from the native PyTorch implementation or from pre-trained weights on Hugging Face. This allows users to build on top of state-of-the-art model releases for LLMs, VLMs, CLIP models, etc.
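To make the dataset-mixing idea concrete, here is a minimal, framework-independent sketch of ratio-based sampling across several sources. It is not VLA Foundry's actual dataloading API: the function name `mix_sources`, the source names, and the ratios are all hypothetical, chosen only to illustrate how per-source ratios can drive which stream each sample is drawn from.

```python
import random
from itertools import count, islice
from typing import Dict, Iterable, Iterator, Tuple


def mix_sources(
    sources: Dict[str, Iterable],
    ratios: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, object]]:
    """Yield (source_name, sample) pairs, picking the source by ratio.

    Hypothetical helper for illustration; stops as soon as the chosen
    source runs out of samples.
    """
    rng = random.Random(seed)
    names = list(sources)
    weights = [ratios[name] for name in names]
    iterators = {name: iter(sources[name]) for name in names}
    while True:
        # Draw the next source with probability proportional to its ratio.
        name = rng.choices(names, weights=weights, k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            return


# Two stand-in infinite streams; real sources would be dataset shards.
streams = {"text": count(), "robotics": count()}
mixed = mix_sources(streams, {"text": 0.25, "robotics": 0.75})
batch = list(islice(mixed, 8))
```

Over many draws, roughly three quarters of the samples in this sketch come from the `robotics` stream, which is the kind of batch balancing the ratios are meant to control.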
## Contents

- [Installation](#installation)
- [Contributing Guidelines](#contributing-guidelines)
- [Quickstart](#quickstart)
- [Running on SageMaker](#running-on-sagemaker)
- [Troubleshooting FAQ](#troubleshooting-faq)
- [Deployment Examples](#deployment-examples)
- [Repo Structure and Implementation](#repo-structure-and-implementation)
  - Param/Argument Structure
  - Data
  - Dataloading Pipeline
  - Model Saving / Loading
  - Training
  - Logging
  - Linting
  - Tests
- [Citation](#citation)
- [Acknowledgements](#acknowledgements)

## Installation

We recommend using [uv](https://docs.astral.sh/uv/getting-started/installation/) for environment management. Please follow the uv documentation for installation.

Once uv is installed, create a Python 3.12 virtual environment using uv and install the project dependencies with the command below:

```bash
uv sync
uv pip install -e .
```

The recommended workflow is to run scripts directly with `uv` via `uv run