# Data pipeline

## Raiden visualizer (YAM workstation)

Canonical TRI-built browser viewer over the YAM data (rerun-based). Logs camera frustums, RGB + depth, point clouds, and the per-frame robot state.

### From the Mac, when on TRI Wi-Fi

```bash
# Terminal #1: open ssh tunnel + interactive shell on robot-lab
ssh -L 9090:localhost:9090 -L 9091:localhost:9091 robot-lab

# Inside that shell:
cd ~
source raiden/.venv/bin/activate
rd visualize --web --stride 5
#   fzf picker: pick task, then episode
#   prints: "Open in browser: http://localhost:9090?url=..."
#   copy that URL into the Mac browser

# Teardown: Ctrl+C the rd process, then `exit` the ssh session
```

### Off TRI Wi-Fi (chain through mac via Tailscale)

```bash
ssh -J mac -L 9090:localhost:9090 -L 9091:localhost:9091 robot-lab
```

### Scripted / non-interactive (bypass fzf picker)

The fzf-based picker requires a TTY. For automation, use the bypass wrapper
at `/home/robot-lab/cameron/rd_visualize_direct.py` (uploaded 2026-05-26):

```bash
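# Terminal #1: open ssh tunnel + interactive shell on robot-lab
ssh -L 9090:localhost:9090 -L 9091:localhost:9091 robot-lab

# Inside that shell (no fzf picker; task/episode are passed as arguments):
cd ~ && source raiden/.venv/bin/activate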
python /home/robot-lab/cameron/rd_visualize_direct.py <task> [episode] \
    --web-port 9090 --stride 5 --image-scale 0.25
```

### CLI knobs (verified 2026-05-26)

- `--stride N` — log every Nth frame (default 1 = every frame; 5–10 is snappier).
- `--image-scale F` — uniform downsample for images + point clouds (default 0.25 = 4× downsample).
- `--web` / `--no-web` — HTTP viewer vs native desktop. Always `--web` over SSH.
- `--web-port N` — HTTP port (default 9090). gRPC port = web_port + 1 (so default 9091 — tunnel both).
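
When `--web-port` is changed, both forwarded ports have to move with it. The helper below is a minimal sketch of that rule (gRPC port = web port + 1); the function name `ssh_tunnel_command` and its arguments are hypothetical, and it only builds the same `ssh -L ... -L ...` command shown in the sections above.

```python
from typing import List, Optional


def ssh_tunnel_command(web_port: int = 9090,
                       host: str = "robot-lab",
                       jump_host: Optional[str] = None) -> List[str]:
    """Build an ssh argv that forwards the web port and the gRPC port (web_port + 1)."""
    grpc_port = web_port + 1
    cmd = ["ssh"]
    if jump_host is not None:
        cmd += ["-J", jump_host]  # e.g. "mac" when off TRI Wi-Fi
    for port in (web_port, grpc_port):
        cmd += ["-L", f"{port}:localhost:{port}"]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    # Example: viewer on a non-default port, chained through the jump host.
    print(" ".join(ssh_tunnel_command(9190, jump_host="mac")))
```

Pass the same port to `--web-port` on the robot-lab side so the viewer and the tunnel agree.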

## Sample data location

`/home/robot-lab/data/processed` on the YAM workstation has sample Raiden output with cameras + depth, usable while live recording is being set up.

## Processed-data schema (verified 2026-05-26 on `pickup_apple`)

```
<task_name>/                       e.g. pickup_apple, PlaceGearsLeaderArms, Sort_objects_lf, …
├── <episode_id>/                  e.g. 0000, 0001, … (~4 episodes in pickup_apple)
│   ├── rgb/<frame>.jpg            720p RGB, per-frame
│   ├── depth/<frame>.npz          dense metric depth
│   ├── lowdim/<frame>.pkl         per-frame state + action + cam params (see below)
│   └── metadata.json              same shape as metadata_shared
├── calibration_results.json       ChArUco calibration (intrinsics + wrist hand-eye)
├── metadata_shared.json           task-level spec
└── split_all.json                 train/val/test split
```
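
A minimal sketch for walking this layout, assuming the tree above is accurate and that frame files in `rgb/`, `depth/`, and `lowdim/` share the same stem. The helper names `list_episodes` and `list_frames` are hypothetical, not part of Raiden. Sorting is lexicographic, which matches numeric order only if frame names are zero-padded (not verified here).

```python
from pathlib import Path
from typing import List


def list_episodes(task_dir) -> List[str]:
    """Episode ids are the subdirectories of a task directory (JSON files are skipped)."""
    return sorted(p.name for p in Path(task_dir).iterdir() if p.is_dir())


def list_frames(episode_dir) -> List[str]:
    """Frame stems that exist in all of rgb/*.jpg, depth/*.npz and lowdim/*.pkl."""
    episode_dir = Path(episode_dir)
    stems_per_modality = [
        {p.stem for p in (episode_dir / sub).glob(f"*{ext}")}
        for sub, ext in (("rgb", ".jpg"), ("depth", ".npz"), ("lowdim", ".pkl"))
    ]
    # Keep only frames that are complete across the three modalities.
    return sorted(set.intersection(*stems_per_modality))
```

For example, `list_frames(Path("/home/robot-lab/data/processed/pickup_apple") / "0000")` would return the frames of the first episode that have RGB, depth, and lowdim data.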

### `metadata_shared.json` key fields

- `cameras`: typically `["left_wrist", "right_wrist", "scene_1"]` (bimanual, 3 cams)
- `resolution`: `[720, 1280]` (H, W)
- `framerate`: 30 fps
- `action.format`: `joint_cmd`, `action.dims`: 14 (7 left + 7 right)
- `extrinsics.transform`: `cam2world` — world = `left_arm_base` (set in calibration_results)
- `intrinsics.model`: `pinhole`
- `depth`: dense + metric
- `control`: `spacemouse` (teleop)
- `language.prompt` / `language.task`: free-text labels

### `lowdim/<frame>.pkl` schema (Python pickle, dict)

| Field | Shape | Meaning |
|---|---|---|
| `joints` | (14,) | Current joint angles, both arms (7 per arm) |
| `action_joints` | (14,) | Joint commands |
| `action` | (26,) | **EE-pose action vector** (see layout below) — despite `metadata.action.format = "joint_cmd"`, the `action` field is EE poses, not joint commands |
| `actual_poses` | (26,) | Realized EE poses, same layout as `action` |
| `extrinsics` | dict[cam_name → (4,4)] | Per-frame cam-to-world transforms (only the scene camera has real per-frame values; wrist cams are identity placeholders — use `FK(joints) @ hand_eye_calibration` from `calibration_results.json` for wrist cams) |
| `intrinsics` | dict[cam_name → (3,3)] | Intrinsics |
| `T_left_from_right` | (4, 4) | right_arm_base → left_arm_base (world). **Honestly named** — apply *forward* (no inversion) to bring right-arm-base coords into world. See "Action layout & frame conventions" below. |
| `language_prompt`, `language_task` | scalar strings | Task labels |
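
To sanity-check a frame against the table above, something like the following sketch could be used. It assumes each array-like field exposes a `.shape` attribute (NumPy arrays do; whether the pickles hold NumPy arrays was not confirmed here). `load_lowdim`, `check_lowdim`, and `EXPECTED_SHAPES` are hypothetical names.

```python
import pickle
from typing import Any, Dict, Optional, Tuple

# Shapes copied from the lowdim table; dict-valued fields are not checked.
EXPECTED_SHAPES = {
    "joints": (14,),
    "action_joints": (14,),
    "action": (26,),
    "actual_poses": (26,),
    "T_left_from_right": (4, 4),
}


def load_lowdim(path) -> Dict[str, Any]:
    """Unpickle one lowdim/<frame>.pkl. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_lowdim(record: Dict[str, Any]) -> Dict[str, Optional[Tuple[int, ...]]]:
    """Return {field: actual_shape} for fields that deviate from the table.

    A missing field, or one without a .shape attribute, is reported as None.
    """
    mismatches = {}
    for key, expected in EXPECTED_SHAPES.items():
        value = record.get(key)
        actual = tuple(value.shape) if hasattr(value, "shape") else None
        if actual != expected:
            mismatches[key] = actual
    return mismatches
```

An empty dict from `check_lowdim(load_lowdim(path))` means the frame matches the documented shapes.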

### Action layout & frame conventions (verified 2026-05-26)

`action[26]` and `actual_poses[26]` share this layout:

```
[ l_pos(3) | l_rot9(9) | l_grip(1) | r_pos(3) | r_rot9(9) | r_grip(1) ]
  0:3        3:12        12          13:16      16:25       25
```

- `l_pos` / `l_rot9` — **left EE pose in world** (= left_arm_base). Use directly.
- `r_pos` / `r_rot9` — **right EE pose in right_arm_base frame**. Convert to world:
  - `r_pos_world = (T_left_from_right @ [r_pos; 1])[:3]` (homogeneous multiply, then drop the trailing 1)
  - `r_rot_world = T_left_from_right[:3,:3] @ r_rot9.reshape(3,3)`
  - **NO inversion.** The field name is honest.
- `l_grip` / `r_grip` — gripper opening (1.0 = open, ~0.03 = clamped on object).
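
The layout and the right-arm conversion above can be sketched in NumPy as follows. The function names `split_pose26` and `right_ee_to_world` are hypothetical; the slicing and the forward application of `T_left_from_right` follow the conventions listed above, with `reshape(3, 3)` in NumPy's default row-major order.

```python
import numpy as np


def split_pose26(vec):
    """Split a 26-vector into per-arm position, 3x3 rotation and gripper value."""
    vec = np.asarray(vec, dtype=float)
    if vec.shape != (26,):
        raise ValueError(f"expected shape (26,), got {vec.shape}")
    return {
        "l_pos": vec[0:3],
        "l_rot": vec[3:12].reshape(3, 3),
        "l_grip": float(vec[12]),
        "r_pos": vec[13:16],
        "r_rot": vec[16:25].reshape(3, 3),
        "r_grip": float(vec[25]),
    }


def right_ee_to_world(r_pos, r_rot, T_left_from_right):
    """Apply T_left_from_right forward (no inversion) to a right-arm-base pose."""
    T = np.asarray(T_left_from_right, dtype=float)
    r_pos_world = (T @ np.append(r_pos, 1.0))[:3]  # homogeneous multiply, drop the 1
    r_rot_world = T[:3, :3] @ np.asarray(r_rot, dtype=float)
    return r_pos_world, r_rot_world
```

Typical use: `parts = split_pose26(record["actual_poses"])`, then `right_ee_to_world(parts["r_pos"], parts["r_rot"], record["T_left_from_right"])`. The left-arm entries are already in world coordinates and need no conversion.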

Canonical reference: `~/raiden/raiden/visualizer.py` lines 194, 257, 394 — raiden's own visualizer applies `T_left_from_right` forward.

⚠️ Convention trap: `~/raiden/raiden/server.py:303,421-422` mentions a *separate* variable named `T_right_base_to_left_base` whose name lies (actually maps left→right, needs inverting). **That is not the lowdim field.** The lowdim `T_left_from_right` is the already-inverted, honest version.

### Camera naming differs across tasks (verified 2026-05-26)

| Task | Scene cam | Left wrist | Right wrist |
|---|---|---|---|
| `pickup_apple` | `scene_1` | `left_wrist` | `right_wrist` |
| `BlockOnBlockRightArmYAM` | `scene_camera` | `left_wrist_camera` | `right_wrist_camera` |

Always read `metadata_shared.json` → `cameras` for the actual list per task; do not hard-code.
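
One way to avoid hard-coding camera names is sketched below. It assumes `cameras` is a top-level list in `metadata_shared.json`, as the key-fields list suggests. The substring matching in `guess_camera_roles` is a heuristic based only on the two naming schemes in the table above, not a Raiden API, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_camera_names(task_dir) -> List[str]:
    """Read the per-task camera list from metadata_shared.json."""
    with open(Path(task_dir) / "metadata_shared.json") as f:
        return list(json.load(f)["cameras"])


def guess_camera_roles(names: List[str]) -> Dict[str, str]:
    """Heuristically map a role (scene / left_wrist / right_wrist) to a camera name."""
    roles: Dict[str, str] = {}
    for name in names:
        lowered = name.lower()
        if "wrist" in lowered and "left" in lowered:
            roles["left_wrist"] = name
        elif "wrist" in lowered and "right" in lowered:
            roles["right_wrist"] = name
        elif "scene" in lowered:
            roles["scene"] = name
    return roles
```

If a task uses a naming scheme the heuristic does not recognize, the corresponding role is simply absent from the result, so callers should check for missing keys rather than assume all three exist.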

### Visual verification of the convention (2026-05-26)

Projection scripts: `/home/robot-lab/cameron/yam_overlay/v8_multiframe.py` (multi-frame overlay) and `v9_crops.py` (300×300 ROI crops around each projected marker).

Verified on `BlockOnBlockRightArmYAM/0000/scene_camera`, frames 0 and 600 (frame 600: `r_grip=0.029`, right arm actively clamping the blue block):
- `LEFT_EE` (green) — on the left arm's gripper fingers ✓
- `RIGHT_EE` (red) — dead-center on the right arm's gripper grasping the block ✓
- `RIGHT_BASE` (magenta) = `T_left_from_right[:3,3]` — on the right arm's physical base mount ✓

⚠️ The earlier "RIGHT_EE wrong" appearance in `pickup_apple/0000/scene_1` was a **false negative**: pickup_apple is left-arm-dominant and the right arm is parked off-frame in scene_1, so the projected RIGHT_EE landed in empty workspace. The convention was already correct; it just had no visible reference. Use a bimanual / right-arm-active task to verify projections.

⚠️ The `T_left_from_right` translation is per-recording (re-calibrated each session). For `pickup_apple/0000` it was ~`[0.06, 0.05, -0.06]` (small; perhaps a single mount?), whereas for `BlockOnBlockRightArmYAM/0000` it was `[-0.003, -0.597, 0.035]` (the expected ~60 cm side-to-side arm separation). Do not assume a fixed inter-arm transform across recordings.
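
The overlay check in this section comes down to projecting world points into the scene camera. The sketch below is not the contents of `v8_multiframe.py`; it is a minimal illustration assuming a `cam2world` 4×4 extrinsic, a 3×3 pinhole intrinsic matrix at the image's resolution, a camera frame with +z pointing forward, and no lens distortion. The function name `project_world_point` is hypothetical.

```python
import numpy as np


def project_world_point(p_world, cam2world, K):
    """Project a 3D world point to pixel (u, v) with an undistorted pinhole model.

    Returns None when the point is at or behind the camera plane.
    """
    # Extrinsics are stored as cam2world, so invert to map world -> camera.
    world2cam = np.linalg.inv(np.asarray(cam2world, dtype=float))
    p_cam = (world2cam @ np.append(np.asarray(p_world, dtype=float), 1.0))[:3]
    if p_cam[2] <= 0:
        return None
    uvw = np.asarray(K, dtype=float) @ p_cam
    return uvw[:2] / uvw[2]
```

For the markers listed above, `p_world` would be the left EE position, the right EE position after conversion to world, or `T_left_from_right[:3, 3]` for the right base. A `None` result or pixel coordinates outside the image bounds indicate that the point is not visible in that camera, as happened with the parked right arm in `pickup_apple`.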

### `calibration_results.json` key fields

- `coordinate_frame`: `left_arm_base` (world frame for `cam2world` extrinsics)
- `charuco_config`: 9×9 squares, 0.03 m square, 0.023 m marker, `DICT_6X6_250`
- `cameras.<cam_name>.intrinsics`: full pinhole + distortion
- `cameras.<cam_name>.hand_eye_calibration`: rotation + translation (camera ↔ EEF), only set for wrist cams

## Task inventory at `/home/robot-lab/data/processed` (2.3 TB total, 2026-05-26)

```
BlockOnBlockRightArmYAM
CMU_DryRunAd
flip_soup_can_sm
pickup_apple                      ← closest analog to cup task; start here
PlaceBikeRotorToolOnRotor
PlaceGearsLeaderArms
PlaceGearsSpaceMouseControls
Sort_objects_lf
TRIAdversarial1
```

## TODO — fill in as you learn

- Confirm whether `extrinsics.left_wrist` really is an identity placeholder for moving wrist cams, as the lowdim table states (output got truncated when first inspected)
- Where new recordings get written (`raw` vs `processed`?)
- How to move data from YAM to DGX for training
- Whether there's a shared dataset blob across DGX nodes
- Backup / retention policy