# para/ — PARA model

Just one file: `model.py`. It defines `TrajectoryHeatmapPredictor`, the
DINOv3-backed pixel-aligned model that the rest of the repo trains and
deploys.

## What the model produces

Given a `(B, 3, 448, 448)` RGB input tensor and a *start keypoint* (the
end-effector, or EEF, pixel at t=0), the model returns:

| Output | Shape | What it is |
|---|---|---|
| `volume_logits` | `(B, N_WINDOW, N_HEIGHT_BINS, 64, 64)` | per-pixel × per-height-bin logits, softmaxed jointly. Argmax gives `(u, v, h_bin)` per timestep. |
| `gripper_logits` | `(B, N_WINDOW, N_GRIPPER_BINS)` | indexed at `query_pixels` during training (teacher forcing) or at the argmax pixel of `volume_logits` at inference. |
| `rotation_logits` | `(B, N_WINDOW, 3, N_ROT_BINS)` | per-axis euler logits, indexed the same way. |
| `feats` | `(B, D=384, 64, 64)` | the upsampled+refined patch feature map. Useful for visualization or reuse. |
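
As a rough illustration of how the `volume_logits` argmax turns into `(u, v, h_bin)`, the sketch below unravels a flat index over one timestep's `(N_HEIGHT_BINS, 64, 64)` volume. Because softmax is monotonic, the argmax of the raw logits equals the argmax of the joint softmax. The helper names, the row-major `(h_bin, v, u)` layout, and the cell-centre convention for mapping back to the 448×448 image are assumptions made for this example, not taken from `model.py`.

```python
N_HEIGHT_BINS = 32
PRED_SIZE = 64
IMAGE_SIZE = 448


def unravel_volume_index(flat_idx, n_height_bins=N_HEIGHT_BINS, pred_size=PRED_SIZE):
    """Split a flat argmax index into (u, v, h_bin).

    Assumes the volume is laid out row-major as (h_bin, v, u),
    where v is the row and u is the column of the 64x64 grid.
    """
    if not 0 <= flat_idx < n_height_bins * pred_size * pred_size:
        raise ValueError(f"flat_idx {flat_idx} is out of range")
    h_bin, cell = divmod(flat_idx, pred_size * pred_size)
    v, u = divmod(cell, pred_size)
    return u, v, h_bin


def cell_to_image_pixel(u, v, pred_size=PRED_SIZE, image_size=IMAGE_SIZE):
    """Map a 64x64 grid cell to the centre of its block in the 448x448 image."""
    scale = image_size / pred_size  # 448 / 64 = 7 image pixels per cell
    return (u + 0.5) * scale, (v + 0.5) * scale


# Hypothetical flat index for h_bin=5, v=10, u=20.
flat_idx = 5 * PRED_SIZE * PRED_SIZE + 10 * PRED_SIZE + 20
u, v, h_bin = unravel_volume_index(flat_idx)
print(u, v, h_bin)
print(cell_to_image_pixel(u, v))
```

With PyTorch tensors, a flat index per timestep could be obtained with something like `volume_logits.flatten(2).argmax(-1)`, which yields a `(B, N_WINDOW)` tensor of indices under the same layout assumption.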

## Constants you'll touch

| Name | Default | Notes |
|---|---|---|
| `N_WINDOW` | 6 | future timesteps predicted jointly |
| `N_HEIGHT_BINS` | 32 | over `[MIN_HEIGHT, MAX_HEIGHT]` |
| `N_GRIPPER_BINS` | 32 | over `[-1, 1]` |
| `N_ROT_BINS` | 32 | per euler axis |
| `PRED_SIZE` | 64 | supervision resolution; bilinearly upsampled to 448 for vis |
| `IMAGE_SIZE` | 448 | model input |
| `DINO_PATCH_SIZE` | 16 | ViT-S/16 |

`MIN_HEIGHT`/`MAX_HEIGHT`/`MIN_ROT`/`MAX_ROT`/`MIN_GRIPPER`/`MAX_GRIPPER`
are *placeholders*. The training script (`train_panda_para.py`) overrides
them at startup using stats computed from the dataset, then saves the
final values to `checkpoints/<run>/dataset_stats.json` so eval can
reload them.
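
To make the role of these ranges concrete, here is a minimal sketch of converting between continuous values and bin indices, with the range reloaded from a stats file. It assumes uniform bins decoded at their centres; the JSON keys (`min_height`, `max_height`), the example range, and the helper names are hypothetical, and the real `dataset_stats.json` layout may differ.

```python
import json
import tempfile
from pathlib import Path


def value_to_bin(value, lo, hi, n_bins):
    """Clamp value to [lo, hi] and return its bin index in [0, n_bins - 1]."""
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    return min(int(frac * n_bins), n_bins - 1)


def bin_to_value(idx, lo, hi, n_bins):
    """Return the centre of bin idx (assumes uniform bins over [lo, hi])."""
    return lo + (idx + 0.5) * (hi - lo) / n_bins


# Write and reload a stats file; the keys and values are made up for the example.
stats = {"min_height": 0.0, "max_height": 0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset_stats.json"
    path.write_text(json.dumps(stats))
    loaded = json.loads(path.read_text())

lo, hi = loaded["min_height"], loaded["max_height"]
idx = value_to_bin(0.26, lo, hi, n_bins=32)
height = bin_to_value(idx, lo, hi, n_bins=32)
print(idx, height)
```

Decoding with the wrong range silently shifts every predicted height, which is why eval needs the saved values rather than the placeholders.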

## DINO weights

```python
import os

DINO_REPO_DIR = os.environ.get("DINO_REPO_DIR", "<mac default>")
DINO_WEIGHTS_PATH = os.environ.get("DINO_WEIGHTS_PATH", "<mac default>")
```

On the lab box, set these env vars before importing `model.py`, since
the lookups run once at import time; see
[`../docs/server_setup.md`](../docs/server_setup.md).
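
The lookup pattern can be sketched in isolation as follows. `read_dino_paths` is a hypothetical helper that mimics the two `os.environ.get` calls, and the lab-box paths are placeholders rather than the real locations.

```python
import os


def read_dino_paths(env=None):
    """Mimic the module-level lookup: use the env var if set, else the default."""
    env = os.environ if env is None else env
    repo = env.get("DINO_REPO_DIR", "<mac default>")
    weights = env.get("DINO_WEIGHTS_PATH", "<mac default>")
    return repo, weights


# Placeholder lab-box values; the real ones are described in server_setup.md.
lab_env = {
    "DINO_REPO_DIR": "/path/to/dinov3",
    "DINO_WEIGHTS_PATH": "/path/to/weights.pth",
}

print(read_dino_paths({}))       # neither var set: both fall back to the default
print(read_dino_paths(lab_env))  # both vars set: the env values win
```

In a script, this means any `os.environ[...] = ...` assignment has to appear above the line that imports the model module.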

## How `start_keypoint` works

The model is told the EEF pixel at t=0 explicitly. It adds a learnable
`start_keypoint_embedding` to the patch token whose receptive field
covers that pixel. This anchors the heatmap to the correct starting
location at negligible cost: it resolves the ambiguity of "which arm in
the scene is yours" without needing a second branch.
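
A toy sketch of that lookup, assuming the standard row-major ViT patch ordering: with `IMAGE_SIZE = 448` and `DINO_PATCH_SIZE = 16` there are 28×28 patch tokens, and the start pixel selects one of them. The function names, the `n_prefix_tokens` offset (for CLS or register tokens that may precede the patch tokens), and the plain-list token representation are illustrative assumptions, not the code in `model.py`.

```python
IMAGE_SIZE = 448
DINO_PATCH_SIZE = 16
GRID = IMAGE_SIZE // DINO_PATCH_SIZE  # 28 patches per side


def pixel_to_patch_index(u, v, n_prefix_tokens=0):
    """Return the token index of the patch containing pixel (u, v).

    u is the column and v is the row in the 448x448 input.
    n_prefix_tokens offsets past any non-patch tokens (hypothetical).
    """
    if not (0 <= u < IMAGE_SIZE and 0 <= v < IMAGE_SIZE):
        raise ValueError(f"pixel ({u}, {v}) is outside the image")
    row, col = v // DINO_PATCH_SIZE, u // DINO_PATCH_SIZE
    return n_prefix_tokens + row * GRID + col


def add_start_embedding(tokens, embedding, u, v, n_prefix_tokens=0):
    """Return a copy of tokens with embedding added to the start-keypoint patch."""
    idx = pixel_to_patch_index(u, v, n_prefix_tokens)
    out = [list(t) for t in tokens]
    out[idx] = [t + e for t, e in zip(out[idx], embedding)]
    return out


# Toy feature width of 4 instead of 384 to keep the example small.
tokens = [[0.0] * 4 for _ in range(GRID * GRID)]
embedding = [0.5, -0.5, 1.0, 0.0]
out = add_start_embedding(tokens, embedding, u=100, v=200)
print(pixel_to_patch_index(100, 200))
```

Only one token is modified; every other patch token passes through unchanged.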

## What this file is not

- It's **not** an `nn.Module` for joint commands. Action decoding (3D
  unprojection + IK) lives in `panda_streaming/deploy_ik_sequence.py` and
  `panda_streaming/test_ik_recovery.py`.
- It's **not** a tokenizer/normalizer for actions. The model learns
  discretized bins directly; conversion back to continuous actions
  happens at deploy time.

## Sync with LIBERO

This file was copied verbatim from
`/data/cameron/para/libero/model.py`. If you change something here that
affects the architecture, decide whether LIBERO eval should track or
diverge, and document it inline.
