# TRI Internship — 12-Week Plan

**Window:** 2026-05-26 → 2026-08-17 · **12 weeks**
**Host:** Sergey Zakharov · **Lab:** Toyota Research Institute (Mountain View)
**Operational owner:** `yams` agent · **Strategy owner:** `project_highlevel` (this doc lives here)

---

## North star

Build a YC-pitch-grade demo of PARA on YAMs that demonstrates **data efficiency** as the headline thesis (per 2026-05-25 conversation with life_manager): *"Big robotics companies throw 3,000+ demos at the problem. We compete with them using ~80."*

End-state by **Aug 14, 2026**:
- 2-minute video showing lead task generalizing across held-out positions, viewpoints, environments
- Crisp sticky number on slide 1 (target: "80 demos vs 3,000+")
- Same model in multiple environments, including outdoor / sunlight

**Identity-level frame** (life_manager 2026-05-25): the goal isn't best-results-in-lab, it's *"the version of yourself at 27 that you'd hire — competent, consistent, on the path to economic viability."* Boring consistent routine is the thesis on your life.

---

## The 12-week arc (one-line per phase)

| Phase | Window | Goal | Owner |
|---|---|---|---|
| 1 — Reproduce on YAMs | **Wk 1** · May 26 – Jun 1 | PARA cup task working on YAM scene cam, OOD shows expected gap vs ACT | yams |
| 2 — Wrist cameras | **Wk 2** · Jun 2 – Jun 8 | Wrist cam integrated, extrinsics from FK + hand-eye correct, OOD eval | yams |
| 3 — Backbone experiments | **Wk 3** · Jun 9 – Jun 15 (parallel on lab GPUs) | DINO size / family ablation table; "head > backbone" claim strengthened | backbones |
| 4 — Lead dexterous task | **Wks 4–7** · Jun 16 – Jul 13 | Headline demo (desk organize? crumbled-shirt fold? cup remains?) data + train + iterate | yams + project_highlevel |
| 5 — Teaser tasks + video | **Wk 8** · Jul 14 – Jul 20 | "Same model also does X" teasers, single-take production footage | yams + figure_maker |
| 6 — Paper additions + polish | **Wks 9–11** · Jul 21 – Aug 10 | Camera-ready paper version with YAM results, video edited | paper_writer + figure_maker |
| 7 — Final presentation | **Wk 12** · Aug 11 – Aug 17 | TRI internal presentation, final video, handoff doc | all |

---

## Phase 1 — Reproduce on YAMs (Week 1, May 26 – Jun 1)

**Soft gate to exit:** PARA on YAM cup task with scene cam beats ACT by ≥30 points on held-out viewpoint.

| Day | Date | Focus |
|---|---|---|
| 1 | May 26 (Tue) | TRI onboarding, SSH access, data exploration. ✓ Done. |
| 2 | May 27 (Wed) | Record first data via Raiden's pipeline; print fiducial exoskeleton; quantify Raiden's calibration residual |
| 3 | May 28 (Thu) | Collect 80 demos (3 envs × 4 viewpoints) cup task on YAM |
| 4 | May 29 (Fri) | Train PARA + ACT on the 80 demos (DGX or personal box) |
| 5 | May 30 (Sat) | Evaluate at OOD held-out positions / viewpoints |
| 6 | May 31 (Sun) | Iteration / hardware recovery slack |
| 7 | Jun 1 (Mon) | Phase 1 retro: did we hit the gate? Decide on wrist cam start |

**Parallel through the week:** print + assemble fiducial exoskeleton, measure custom calibration residual vs Raiden's.

**Phase 1 decisions to lock by end of Week 1:**
1. **Calibration source:** use Raiden's, swap to custom fiducial, or hybrid?
2. **Train surface default:** personal box vs DGX for the 80-demo runs?
3. **Lead task confirmation:** cup task is fine for reproduction; the *headline dexterous task* (Phase 4) is a separate decision below.

---

## Phase 2 — Wrist Cameras (Week 2, Jun 2 – 8)

**Soft gate to enter:** Phase 1 gate met (PARA reproducing on YAM scene cam).
**Soft gate to exit:** Adding wrist cam doesn't *break* OOD performance; ideally improves it.

Key technical risk: wrist cam extrinsics aren't stored per-frame in Raiden's lowdim (identity placeholders) — must be computed as `FK(joints) @ hand_eye`. yams already knows about this.

Things to do this week:
- Add wrist cam to PARA's data loader + training inputs
- Verify wrist cam pose stream is correct in deploy too
- Retrain PARA with wrist cam included
- OOD eval: does wrist cam help, hurt, or wash?

**Phase 2 decisions to lock:**
1. **Include wrist cam in Phase 4 demo?** Only if it improves OOD or simplifies grasping.

---

## Phase 3 — Backbone Experiments (Week 3, Jun 9 – 15)

**Runs entirely on lab GPUs via `backbones` agent.** Zero attention cost at TRI.

Ablation table for the paper:
| Backbone | Params | Train pix err | Val OOD |
|---|---|---|---|
| DINOv2 ViT-S/16 | 22M | ? | ? |
| DINOv3 ViT-S/16+ | 30M | 9.32 (existing) | ? |
| DINOv3 ViT-B/16 | 86M | ? | ? |
| DINOv3 ViT-L/16 | 305M | 9.88 (existing) | ? |
| SAM / other | ? | ? | ? |

Strengthens "head > backbone" claim. Drops directly into paper Section 5 as an ablation. No TRI-side blockers.

---

## Phase 4 — Lead Dexterous Task (Weeks 4–7, Jun 16 – Jul 13)

**The biggest open decision in the whole plan.** Three candidates:

1. **Cup pick-and-place (continued from Phase 1)** — safest, already working, can scale to multi-position / multi-environment for a clean OOD story. Risk: not novel enough for YC pitch.
2. **Desk organize (life_manager rec, 2026-05-25)** — novel, universally relatable, hits data-efficient story, rich sub-tasks (pen rotate, folder place, mug aside, paper stack). Risk: long-horizon multi-step, harder to nail.
3. **Crumbled-shirt fold (Cameron interest, 2026-05-11)** — closer to towel-fold than messy-pile-laundry. Risk: cloth dynamics are notoriously hard.

**Recommendation (project_highlevel, 2026-05-25):** lead with **desk organize**. Teaser the others.

**Decision deadline:** End of Phase 2 (Jun 8). Don't enter Phase 4 with the lead task undecided.

Phase 4 weekly cadence:
- Wk 4 (Jun 16-22): collect ~30 demos lead task, train v0, iterate
- Wk 5 (Jun 23-29): identify failure modes, add 10-20 demos targeting failures
- Wk 6 (Jun 30 – Jul 6): generalization tests (new desks / new positions)
- Wk 7 (Jul 7-13): teaser tasks on the same model (cup, towel, wipe)

---

## Phase 5 — Teaser Tasks + Video Production (Week 8, Jul 14 – 20)

Money-shot video structure (locked 2026-05-12):
```
0:00–0:15  Hook: "Today's policies need 1000s of demos. Ours needs 80."
0:15–1:00  Lead-task demo, real-time, full sequence
1:00–1:30  Lead task, new env/position, no retrain → still works
1:30–1:50  Teaser tasks (same model, different tasks)
1:50–2:00  Outro: "Trained in <8 weeks at TRI."
```

Production must-haves:
- Tripod, manual exposure lock
- Live heatmap overlay during deploy (depends on yams shipping that infra by Phase 4)
- Side-by-side PARA vs baseline at each OOD axis
- Single continuous take for at least one "camera moves during execution" shot

---

## Phase 6 — Paper Additions + Polish (Weeks 9–11, Jul 21 – Aug 10)

Owned by `paper_writer` + `figure_maker`. yams provides:
- Final YAM training-run numbers
- Final OOD eval numbers
- Source assets for fig1 right-panel update (real cup-task rollout stills with heatmap overlay, replacing the SO-100 placeholders)
- Per-frame video clips for any figure that needs them

Camera-ready additions to the paper:
- "Replicated on TRI's YAM bimanual arms" section
- Updated fig1 with real cup-task evidence
- Wrist cam ablation if it helped
- "Head > backbone" backbone-ablation table

---

## Phase 7 — Final Presentation (Week 12, Aug 11 – 17)

- Internal TRI presentation
- 2-min video final cut
- Wrap-up doc + handoff to TRI side
- Personal retro: Aug 14 video sent to one friend / advisor / potential investor — the "would I hire myself?" test

---

## Open decisions floating (need lock before Phase 4)

| # | Decision | Default lean | Lock deadline |
|---|---|---|---|
| 1 | Lead dexterous task | desk organize | end of Phase 2 (Jun 8) |
| 2 | Sticky number for the pitch | "80 demos vs 3,000+" | Phase 1 retro (Jun 1) |
| 3 | Calibration source (Raiden vs custom fiducial) | Raiden for week 1, evaluate week 2 | end of Phase 1 (Jun 1) |
| 4 | Wrist cam in headline demo | only if it helps OOD | end of Phase 2 (Jun 8) |
| 5 | Diffusion Policy as second baseline | nice-to-have, not blocking | TBD |

---

## Routine + life side (per life_manager, 2026-05-25)

- Bedtime ramp: 10:30pm by end of Wk 1, target 10pm + 6am wake by end of Wk 2
- Lift: 25–30 min heavy/intense, 3-4x/week
- Walk: 1 hr midday + reach 15k steps daily
- Diet: ~1500 cal floor (the "15 rule")
- Meditation: 20 min daily, especially first 2 weeks (high context-load)
- One LA trip (Jun 20) baked into Wk 4 — take it as a real break, not a working trip

---

## How to use this doc

- **Monday mornings**: re-read this week's phase + any decisions you owe
- **Weekly retro (Fridays or Sundays)**: did this week's gate hit? If yes, advance. If no, why not, what's the slip?
- **Mid-phase**: update the decisions table as you lock things
- **End of internship**: this doc becomes the retro source-of-truth — "we planned X, we delivered Y, here's why"

Owner: project_highlevel. Edit freely (Cameron + project_highlevel + yams); other agents read-only.