# PARA Project TODOs

## In Progress

- [ ] **Panda deployment (HIGHEST PRIORITY)** — second embodiment. Master's student (friend's contact, experienced with calibrated Panda setup) supposedly running this. **FOLLOW UP (as of 2026-04-29):** confirm they're actually progressing; Cameron flagged uncertainty about whether they've started. Ping them or check via friend.
- [ ] **"In the wild" campus deployment (HIGHEST PRIORITY)** — train robot in ~2 locations, then wheel it around campus (quad, hallways, outdoor) on the mobile desk and test. Real-world spatial/environment generalization: the same train-left/test-right split, but in the REAL WORLD.
   - Custom arm: **BUILT + ArUco-based joint calibration done as of 2026-04-29.** Self-calibration (commanding/conditioning) verified.
   - Remaining custom-arm finalization: iPhone streaming with consistent intrinsics, joint limits, misc tedious config.
- [ ] **UMI gripper build + data collection (START NOW)** — build the UMI mini gripper, start collecting demos. Test the video model (SVD + PARA) fine-tuning on real-world data. Key questions: how does video model fine-tuning transfer to real world? How does it generalize to new viewpoints on real hardware? Also serves as a third embodiment for cross-embodiment story. Can do this while waiting for Panda and new arm.
   - **NEXT PHYSICAL STEP (2026-04-29): design + print ArUco-tagged handle/box for UMI gripper.** ~15–20 min CAD, overnight print. ArUco box at the handle lets external camera recover gripper pose. Why UMI over hand teleop: removes scale ambiguity and finger-closing ambiguity that plague bare-hand demos. Robot gripper handles already added for kinesthetic mode — UMI is the symmetric demonstrator-side tool.
- [ ] **Cross-embodiment TASK transfer (HIGH PRIORITY)** — record ~3 tasks with human hands (50 demos each, same camera as robot), ~3 different tasks with robot (20 demos each). Train PARA on all data (hand tasks = heatmap-only supervision via point tracks, robot tasks = full supervision). Test: robot executes the hand-only tasks it's never done itself. If it works, this is the headline: "human demos teach a robot new tasks through pixel-aligned supervision." Needs: same camera setup for hand+robot, point track extraction (CoTracker), can start with UMI or small 6DOF arm.
- [ ] **Paper rewrite** — draft being rewritten by paper_writing agent. KeyGrip → PARA rename, real robot results, LIBERO OOD, video backbone. Check paper at https://omidlab.net/paper
- [ ] **Arm-deletion point track pretraining experiment** — backbones agent is working on this. Delete robot arm visually, render EEF dot, pretrain PARA (1 height bucket heatmap) and global regression (CLS→MLP→uv) on 256 arm-deleted demos, fine-tune with 10 full demos. Tests whether 2D pixel track pretraining transfers to full robot control. Derisk for hand demo / cross-embodiment pretraining.
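
Sketch of what a "heatmap-only" supervision target could look like for the hand-demo and arm-deletion items above: one tracked pixel (EEF dot or point-track location) is turned into a normalized 2D target. This is only an illustration under assumptions. The Gaussian shape, `sigma`, the 64x96 resolution, and the function names `gaussian_heatmap` / `heatmap_argmax` are hypothetical and not taken from the PARA code.

```python
import numpy as np


def gaussian_heatmap(uv, height, width, sigma=2.0):
    """Return an (H, W) target that sums to 1 and peaks at pixel uv = (u, v).

    Assumption: a Gaussian blob is used as the target; the real PARA
    target may be shaped differently.
    """
    u, v = uv
    ys, xs = np.mgrid[0:height, 0:width]  # ys indexes rows (v), xs indexes columns (u)
    sq_dist = (xs - u) ** 2 + (ys - v) ** 2
    heat = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return heat / heat.sum()


def heatmap_argmax(heat):
    """Decode the peak pixel (u, v) from an (H, W) heatmap."""
    v, u = np.unravel_index(np.argmax(heat), heat.shape)
    return int(u), int(v)


# Hypothetical example: tracked point at column 40, row 25 in a 64x96 image.
target = gaussian_heatmap((40, 25), height=64, width=96)
decoded = heatmap_argmax(target)
```

The same target can be built from a robot EEF projection or from a hand point track, which is why the pretraining item only needs 2D tracks.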

## Experiments To Run

- [ ] **Translation-based viewpoint shifts** — current viewpoint experiments only test rotation (spherical cap sampling). Add horizontal and vertical camera translation tests (slide camera left/right, up/down without re-orienting). This directly tests PARA's translational equivariance — should be the strongest advantage. Keep camera orientation fixed, only change position.
- [ ] **Diffusion Policy baseline in LIBERO** — popular baseline, reviewers will ask. Should fail OOD for same coordinate-regression reasons as ACT. Validates that the failure is about parameterization, not about ACT specifically.
- [ ] **Longer-horizon LIBERO tasks (LIBERO-Long)** — multi-step tasks (open drawer, pick object, place inside, close). Tests whether PARA's advantage compounds over more replan cycles. Lower priority since real robot already covers 3 tasks.
- [ ] **Video backbone on real robot** — currently SVD+PARA is LIBERO-only (90% vs 0%). Even one real-robot task with video backbone would strengthen Contribution 2.
- [ ] **Hand demo co-training (after arm-deletion results)** — if arm-deletion experiment works, collect ~50-100 hand demos from robot camera viewpoint on SO-100, co-train with robot demos. Compare data efficiency: 10 robot + 50 hand demos vs 20 robot demos.
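
Sketch for the translation-based viewpoint item above: generate a grid of camera positions that slide left/right and up/down from the training pose while the orientation is left untouched. All of this is an assumption for planning purposes. `translation_sweep`, the 0.2 m range, the 5x5 grid, and the example axes are hypothetical, and the `right` / `up` unit vectors must come from the actual camera pose in the simulator.

```python
from itertools import product


def translation_sweep(base_pos, right, up, max_offset=0.2, steps=5):
    """Return camera positions on a steps x steps grid around base_pos.

    base_pos:   (x, y, z) of the training camera.
    right, up:  unit vectors of the camera's right and up axes in world frame.
    max_offset: largest shift along each axis (same units as base_pos).
    Only positions are returned; orientation is meant to stay fixed.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    span = 2.0 * max_offset
    offsets = [-max_offset + span * i / (steps - 1) for i in range(steps)]
    positions = []
    for d_right, d_up in product(offsets, offsets):
        positions.append(
            tuple(p + d_right * r + d_up * u for p, r, u in zip(base_pos, right, up))
        )
    return positions


# Hypothetical camera: looking along +x, so "right" is +y and "up" is +z here.
sweep = translation_sweep(
    base_pos=(1.0, 0.0, 0.5), right=(0.0, 1.0, 0.0), up=(0.0, 0.0, 1.0)
)
```

With an odd `steps`, the grid includes the unshifted training position, which gives an in-distribution reference point for each sweep.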

## Visualizations / Media

- [ ] **Combined project video** (~60-90s) — stitch DINO PCA consistency, ACT vs PARA comparison, feature PCA grid, real robot clips. See video_storyboard.md.
- [ ] **PARA heatmap overlay on rollout** — show heatmap prediction following the object during a PARA rollout. Standalone visualization.
- [ ] **Per-theta viewpoint chart as PNG** — exists on the website as a Chart.js chart, but a standalone image is needed for paper/slides.
- [ ] **Keynote slides** — generate .pptx from keynote_notes.md using slides.py helper.

## Paper Writing

- [ ] **Method diagram / Figure 1** — Figma figures exist but need final layout for paper.
- [ ] **Results figures** — publication-quality versions of distribution plots, per-theta chart, real robot qualitative grid.
- [ ] **Related work** — address RVT (multi-view heatmaps, different input modality), Lift-Splat-Shoot (2D→3D for perception not action), affordance methods.
- [ ] **Cross-embodiment discussion paragraph** — pixel-aligned supervision is embodiment-agnostic (supervises interaction point, not robot). Decouples "what to interact with" from "how to move." Tie to arm-deletion experiment results.

## Long-Term / Future Work

- [ ] **DROID pretraining** — download ~55% complete. Train PARA on 95K diverse robot episodes. Cross-lab, cross-camera pretraining story.
- [ ] **Egocentric video pretraining** — point tracks from Ego4D/Epic-Kitchens. Viewpoint alignment problem needs solving first. Park until DROID pretraining works.
- [ ] **Multi-embodiment real robot** — new arm + SO-100 + Panda, show same PARA model transfers or benefits from cross-embodiment pretraining.
