Pixel Aligned Robot Actions (PARA) Are More Data-Efficient and Robust to New Viewpoints and Environments

Project Overview

PARA project overview

Method Overview

PARA method overview
TL;DR

We reformulate robot action prediction as a per-pixel regression problem. Compared with regressing actions in global coordinates, this yields substantially better data efficiency and generalization, thanks to dense supervision and shift equivariance.
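The TL;DR does not spell out PARA's architecture, output heads, or decoding step, so the snippet below is only a minimal sketch of the general idea under our own assumptions: a calibrated pinhole camera with intrinsics `K`, a Gaussian heatmap as the per-pixel target, and a separate per-pixel depth prediction used to lift the chosen pixel back to 3D. The function names (`project_to_pixel`, `gaussian_heatmap`, `decode`, `conv2d_circular`) and the calibration numbers are hypothetical and are not taken from the authors' code. It shows (1) how one 3D gripper position becomes a dense label over every pixel, (2) how such a map can be decoded back into a 3D action, and (3) why a convolutional per-pixel head is shift-equivariant: shifting the input image shifts its output map by the same amount.

```python
import numpy as np


def project_to_pixel(p_cam, K):
    """Project a 3D point in the camera frame to (u, v) pixels with a pinhole model."""
    uvw = K @ np.asarray(p_cam, dtype=float)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


def gaussian_heatmap(u, v, height, width, sigma=2.0):
    """Hypothetical dense target: every pixel gets a label, peaked at the action's pixel."""
    vs, us = np.mgrid[0:height, 0:width]
    return np.exp(-((us - u) ** 2 + (vs - v) ** 2) / (2.0 * sigma**2))


def decode(heatmap, depth_map, K):
    """Take the highest-scoring pixel and lift it back to 3D using an assumed per-pixel depth."""
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected ray with z = 1
    return ray * depth_map[v, u]


def conv2d_circular(image, kernel):
    """Same-size 2D cross-correlation with circular padding (exactly shift-equivariant)."""
    kh, kw = kernel.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # out[r, c] += kernel[i, j] * image[r + i - kh//2, c + j - kw//2], indices wrap around
            out += kernel[i, j] * np.roll(image, (kh // 2 - i, kw // 2 - j), axis=(0, 1))
    return out


# Hypothetical calibration: 64x48 image, focal length 100 px, principal point (32, 24).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
p = np.array([0.1, -0.05, 0.5])          # gripper position in the camera frame (meters)
u, v = project_to_pixel(p, K)             # u = 52.0, v = 14.0
target = gaussian_heatmap(u, v, 48, 64)   # per-pixel supervision instead of a single 3D vector
depth = np.full((48, 64), 0.5)            # stand-in for a predicted per-pixel depth map
recovered = decode(target, depth, K)      # approximately [0.1, -0.05, 0.5]

# Shift equivariance: shifting the input shifts the output map by the same amount.
rng = np.random.default_rng(0)
img, ker = rng.standard_normal((48, 64)), rng.standard_normal((3, 3))
shift = (3, 5)
lhs = conv2d_circular(np.roll(img, shift, axis=(0, 1)), ker)
rhs = np.roll(conv2d_circular(img, ker), shift, axis=(0, 1))
print(np.allclose(lhs, rhs))              # True
```

By contrast, a global head that flattens the image and regresses (x, y, z) directly has no such structural guarantee: its weights are tied to absolute pixel positions, so nothing forces it to map a shifted scene to a correspondingly shifted action. Circular padding is used here only so that equivariance holds exactly at the image border; zero-padded convolutions are equivariant only away from the edges.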

Data Efficiency

Same viewpoint and environment: PARA (Ours) vs. ACT.

Pick and Place

PARA (Ours): 97% Task Completion Rate
ACT: 9% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

Fold Towel

PARA (Ours): 97% Task Completion Rate
ACT: 11% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

Wipe Table

PARA (Ours): 95% Task Completion Rate
ACT: 0% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

Robustness to Viewpoint and Environment

New Viewpoint

PARA (Ours): 52% Task Completion Rate
ACT: 0% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

New Viewpoint + 5ep F.T.

PARA (Ours): 87% Task Completion Rate
ACT: 4% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

New Environment

PARA (Ours): 94% Task Completion Rate
ACT: 0% Task Completion Rate

PARA + Droid Pretrain: —% Task Completion Rate (video coming soon)
ACT + Droid Pretrain: —% Task Completion Rate (video coming soon)

Deployed on Other Embodiments

SO300

SO300 experiment

Piper

Piper experiment

Panda

Panda experiment

G1

G1 experiment

Scaling Up to Calibrated Droid

Put Toy into Box

PARA (Ours): —% Task Completion Rate (video coming soon)
ACT: —% Task Completion Rate (video coming soon)

Press Button

PARA (Ours): —% Task Completion Rate (video coming soon)
ACT: —% Task Completion Rate (video coming soon)

Pick Up Sponge and Place in Bowl

PARA (Ours): —% Task Completion Rate (video coming soon)
ACT: —% Task Completion Rate (video coming soon)

More Data-Efficient Hand Co-Training

Co-training with hand demonstrations: PARA (Ours) vs. ACT.

PARA (Ours)

10 demos without co-training: 50% Task Completion Rate
50 demos without co-training: 80% Task Completion Rate
10 demos with co-training: 80% Task Completion Rate
50 demos with co-training: 100% Task Completion Rate

ACT

10 demos without co-training: 5% Task Completion Rate
50 demos without co-training: 10% Task Completion Rate
10 demos with co-training: 15% Task Completion Rate
50 demos with co-training: 20% Task Completion Rate

Backbones

DINO: 95% Task Completion Rate

SmolVLA: 87% Task Completion Rate

MoGe: 92% Task Completion Rate

COSMOS Video Model: 94% Task Completion Rate