PARA Experiment Dashboard

Progress: 4/11 tasks completed

Experiments: active tracks

Agents: backbones / vid_model / droid

Current Status & Todos

✓ PARA baseline training on LIBERO spatial (all tasks)
✓ ACT / DINO-VLA / InternVL baseline comparisons
✓ UVA video backbone training on LIBERO
✓ DROID dataset download pipeline
☐ OOD object position eval sweep
☐ OOD viewpoint eval sweep
☐ UVA + PARA wrapper end-to-end training
☐ DROID pretraining run
☐ LIBERO fine-tuning from DROID pretrained checkpoint
☐ Real robot deployment and eval
☐ Final comparison tables and paper figures

Experiment Tracks

OOD Generalization

Generalization

Testing how PARA's pixel-aligned formulation generalizes to out-of-distribution (OOD) object positions and camera viewpoints, and comparing its robustness and data efficiency against global-regression baselines (ACT, DINO-VLA, InternVL).

Does PARA generalize better to unseen object positions?
How does performance degrade under viewpoint shift?
Is PARA more data-efficient than global-regression baselines?
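
One way the object-position sweep could be organized is sketched below. This is an illustration only: the grid spacing, the helper names `make_position_grid` and `success_by_shift`, and the choice to bucket results by the largest per-axis offset are all assumptions, not part of the existing eval code.

```python
from collections import defaultdict
from itertools import product


def make_position_grid(max_shift_cm, step_cm):
    """Return every (dx, dy) offset in cm on a square grid centred on zero."""
    n = int(max_shift_cm // step_cm)
    ticks = [i * step_cm for i in range(-n, n + 1)]
    return list(product(ticks, ticks))


def success_by_shift(results):
    """Average success per shift magnitude.

    results: iterable of ((dx, dy), success) pairs, success being True/False.
    Returns {max(|dx|, |dy|): success rate}, ordered by increasing shift.
    """
    buckets = defaultdict(list)
    for (dx, dy), ok in results:
        # Hypothetical bucketing rule: group rollouts by the larger axis offset.
        buckets[max(abs(dx), abs(dy))].append(1.0 if ok else 0.0)
    return {shift: sum(v) / len(v) for shift, v in sorted(buckets.items())}
```

Running each policy over the same grid and comparing the resulting shift-to-success curves would give a like-for-like view of how quickly each one degrades away from the training positions.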

Video as Policy with PARA

Video Model

Comparing PARA's pixel-aligned regression head vs. global regression on top of a video generation backbone (UVA). Testing the hypothesis that PARA is more data-efficient for learning joint video-action policies.

Does the PARA head outperform global regression on a video backbone?
How does video conditioning improve action prediction?
What is the data-efficiency gain from pixel-aligned prediction?
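
The dashboard does not spell out how the PARA head is built, so the sketch below only illustrates the general contrast between the two families. It assumes a pixel-aligned head scores every pixel and reads out a 2D target with a soft-argmax, while a global head maps a pooled feature vector directly to coordinates. Both functions and their names are hypothetical and are not taken from the PARA codebase.

```python
import math


def soft_argmax_2d(logits):
    """Expected (row, col) under a softmax over all pixels of a 2D logit map."""
    flat = [v for row in logits for v in row]
    m = max(flat)  # subtract the max for numerical stability
    weights = [[math.exp(v - m) for v in row] for row in logits]
    total = sum(sum(row) for row in weights)
    r = sum(i * w for i, row in enumerate(weights) for w in row) / total
    c = sum(j * w for row in weights for j, w in enumerate(row)) / total
    return r, c


def global_regression(pooled, weight, bias):
    """Linear map from a pooled feature vector straight to (row, col)."""
    return tuple(
        sum(w * x for w, x in zip(w_row, pooled)) + b
        for w_row, b in zip(weight, bias)
    )
```

In the soft-argmax readout, moving the peak of the logit map by one pixel moves the prediction by roughly one pixel with no change to the weights, whereas the global head has to encode that relationship in `weight` and `bias`. That structural difference is one possible source of the data-efficiency gap this track is probing.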

Large-Scale Pretraining

Pretraining

Pretraining PARA on the large-scale DROID dataset (100K+ trajectories) to test whether pixel-aligned prediction benefits from diverse cross-embodiment data.

Does DROID pretraining improve downstream LIBERO performance?
How does pretraining scale with dataset size?
Can PARA transfer across embodiments via pixel alignment?

Real Robot Experiments

Real Robot

Deploying PARA on a real Franka Panda arm to validate sim-to-real transfer and real-world pixel-aligned action prediction.

Does PARA's pixel-aligned prediction transfer to real hardware?
How sensitive is performance to camera calibration?
Can real-world PARA match or exceed sim baselines?
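
For the calibration question, a rough feel for the stakes can be had from the standard pinhole camera model: project a 3D point with the nominal intrinsics, project it again with perturbed intrinsics, and measure the pixel offset. The sketch below assumes an ideal pinhole camera with no lens distortion; the helper names are hypothetical and any intrinsics passed in are placeholders, not the calibration of the actual rig.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (metres) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def pixel_error(point_cam, intrinsics, perturbed):
    """Pixel distance between projections under true and perturbed intrinsics.

    intrinsics and perturbed are (fx, fy, cx, cy) tuples.
    """
    u0, v0 = project(point_cam, *intrinsics)
    u1, v1 = project(point_cam, *perturbed)
    return ((u1 - u0) ** 2 + (v1 - v0) ** 2) ** 0.5
```

For example, `pixel_error((0.5, 0.25, 1.0), (600, 600, 320, 240), (606, 600, 320, 240))` measures the shift caused by a 1% error in `fx` for a point half a metre off-axis at one metre depth. Sweeping such perturbations and re-running the policy would show how much pixel-level misalignment it tolerates.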