# Report Format Guide

Every experiment report should follow this structure. Use the
`shared/report.py` helper (currently at
`/data/cameron/para/.agents/shared/report.py`) to generate HTML.

---

## 1. Summary

A 2-3 sentence overview of what was run, the key finding, and whether results
matched expectations.

```python
from shared.report import Report  # assumed import path; adjust to where report.py lives

r = Report("Viewpoint Shift Eval — PARA vs ACT", agent="backbones")
r.text("Evaluated PARA and ACT baselines on 15° and 30° camera viewpoint shifts across "
       "all 10 LIBERO spatial tasks. PARA retains 78% success rate at 30° shift vs ACT's 41%.")
```

---

## 2. Training Data

Show what the model was trained on. Include:
- Dataset name, size, and split
- 2-4 sample frames or a short video clip showing representative training episodes
- Any data augmentation or preprocessing applied

```python
r.heading("Training Data")
r.text("Trained on LIBERO spatial, 50 demos per task (500 total). "
       "448x448 RGB, frame stride 3, 12-step prediction window.")
r.image(r.add_media_file("/path/to/train_sample_grid.png"),
        caption="Sample training frames across tasks 0-9")
r.video(r.add_media_file("/path/to/train_demo_example.mp4"),
        caption="Example training episode (task 3)")
```

---

## 3. Test Setup

Show the evaluation conditions. Include:
- What changed vs training (OOD shift, new viewpoint, new objects, etc.)
- 2-4 sample frames or video from the test distribution
- Number of eval episodes, seeds, any randomization

```python
r.heading("Test Setup")
r.text("Evaluated with camera shifted 30° from training viewpoint. "
       "20 episodes per task, 10 tasks, deterministic initial states.")
r.image(r.add_media_file("/path/to/test_viewpoint_comparison.png"),
        caption="Left: training viewpoint. Right: 30° shifted test viewpoint.")
```

---

## 4. Results

The core of the report. Include:
- **Summary table** with success rates / metrics per model per condition
- **Eval videos** — at least 1 success and 1 failure per model/condition of interest
- **Plots** if relevant (learning curves, per-task breakdown, etc.)
- Brief interpretation of what the numbers mean

```python
r.heading("Results")
r.table(
    headers=["Model", "Train SR", "OOD 15° SR", "OOD 30° SR"],
    rows=[
        ["PARA (ours)", "92%", "85%", "78%"],
        ["ACT", "89%", "64%", "41%"],
        ["DINO-VLA", "87%", "71%", "52%"],
    ],
    caption="Success rates across viewpoint shifts (avg over 10 tasks)"
)

r.heading("Per-Task Breakdown", level=3)
r.image(r.add_media_file("/path/to/per_task_bar_chart.png"),
        caption="Per-task success rates at 30° viewpoint shift")

r.heading("Example Rollouts", level=3)
r.text("Task 7 — PARA succeeds where ACT fails under 30° shift:")
r.video(r.add_media_file("/path/to/para_task7_success.mp4"),
        caption="PARA — task 7, 30° shift (success)")
r.video(r.add_media_file("/path/to/act_task7_fail.mp4"),
        caption="ACT — task 7, 30° shift (failure)")
```
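
The rows passed to `r.table` can be computed from raw episode outcomes rather than typed by hand. The following is a minimal sketch, assuming outcomes are stored as a nested dict `{model: {condition: [bool, ...]}}`. That layout and the helper name `success_rate_rows` are illustrative and not part of `report.py`.

```python
def success_rate_rows(outcomes, conditions):
    """Build table rows of success rates from per-episode outcomes.

    `outcomes` uses a hypothetical layout: {model: {condition: [True, False, ...]}}.
    Returns one row per model, e.g. [model, "75%", ...], in `conditions` order.
    """
    rows = []
    for model, by_condition in outcomes.items():
        row = [model]
        for condition in conditions:
            episodes = by_condition.get(condition, [])
            if not episodes:
                row.append("n/a")  # no episodes recorded for this condition
                continue
            rate = 100.0 * sum(episodes) / len(episodes)
            row.append(f"{rate:.0f}%")
        rows.append(row)
    return rows


# Toy data for illustration only; not real results.
example = {
    "model_a": {"train": [True, True, True, False], "ood_30": [True, False]},
    "model_b": {"train": [True, False], "ood_30": []},
}
rows = success_rate_rows(example, ["train", "ood_30"])
```

The resulting `rows` list has the same shape as the `rows=` argument in the table example above, so it can be passed through unchanged.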

---

## 5. Analysis

Dig into why things worked or didn't:
- Failure modes — what goes wrong in the failure videos?
- Which tasks are hardest and why?
- Any surprising results?
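
To answer "which tasks are hardest" from data rather than impression, it may help to rank tasks by success rate before writing this section. The following is a small sketch, assuming per-task counts are available as `{task_id: (successes, episodes)}`. The helper name `hardest_tasks` and the input layout are hypothetical.

```python
def hardest_tasks(per_task_success, k=3):
    """Return the k tasks with the lowest success rate, hardest first.

    `per_task_success` maps a task id to (successes, episodes).
    Tasks with zero episodes are skipped.
    """
    rates = {
        task: successes / episodes
        for task, (successes, episodes) in per_task_success.items()
        if episodes > 0
    }
    # Sort by rate, then by task id so that ties are ordered deterministically.
    return sorted(rates.items(), key=lambda item: (item[1], item[0]))[:k]


# Toy counts for illustration only; not real results.
counts = {0: (18, 20), 3: (9, 20), 7: (4, 20), 9: (9, 20)}
worst = hardest_tasks(counts, k=3)
summary = ", ".join(f"task {task} ({rate:.0%})" for task, rate in worst)
```

The `summary` string can then be dropped into an `r.text(...)` call alongside the failure-mode discussion.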

---

## 6. Next Steps & Concerns

Be specific and actionable:
- What to run next based on these results
- Any concerns about methodology, data, or validity
- Blockers or dependencies

---

## 7. Reproducibility

Always include the exact command used to generate the results.

```python
r.heading("Reproducibility")
r.code("CUDA_VISIBLE_DEVICES=5 python libero/eval.py \\\n"
       "  --checkpoint checkpoints/para_spatial_all/best.pth \\\n"
       "  --benchmark libero_spatial --task_ids all \\\n"
       "  --viewpoint_shift 30 --episodes 20 \\\n"
       "  --save_videos --output_dir out/para_30deg_eval", language="bash")
```
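
Retyping the command by hand invites drift between what ran and what the report says. One option is to build the string from the argument list the script actually received. This is a sketch using only the standard library. The helper name `format_command` is hypothetical, and which environment variables to record is an assumption.

```python
import shlex


def format_command(argv, env=None):
    """Render an argument list as a copy-pasteable shell command.

    `argv` is the full argument list, e.g. ["python", "libero/eval.py", ...].
    `env` holds environment variables to show as a prefix (e.g. CUDA_VISIBLE_DEVICES).
    """
    env_prefix = [f"{key}={shlex.quote(str(value))}" for key, value in (env or {}).items()]
    return " ".join(env_prefix + [shlex.join(argv)])


cmd = format_command(
    ["python", "libero/eval.py", "--viewpoint_shift", "30", "--episodes", "20"],
    env={"CUDA_VISIBLE_DEVICES": "5"},
)
```

Inside an eval script, `argv` could be `["python", *sys.argv]`, and the result can be passed to `r.code(cmd, language="bash")`.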

---

## Tips

- **Keep it scannable.** Someone should get the main takeaway from the summary + results table without reading anything else.
- **Always show failure cases.** Successes are boring — failures tell you what to fix.
- **Compare to baseline.** Never show results for just one model. Always include at least one comparison.
- **Use `add_media_file()`** to copy videos/images into the report's media directory. Don't use absolute paths in the report HTML.
- **Name your report clearly.** The filename becomes the listing title: `"Viewpoint Shift Eval — 30°"` not `"Results 3"`.

## Video Encoding

Videos MUST be H.264 encoded to play in the browser. Re-encode if needed:

```bash
ffmpeg -i input.mp4 -c:v libx264 -preset ultrafast -crf 23 -movflags +faststart output_h264.mp4
```

Or, when saving eval videos in Python, request H.264 explicitly. Either of the following works:

```python
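import cv2
import imageio

imageio.mimwrite("out.mp4", frames, codec="libx264", quality=8)  # option 1: imageio via its ffmpeg plugin

fourcc = cv2.VideoWriter_fourcc(*"avc1")  # option 2: OpenCV; 'avc1' availability depends on the build
writer = cv2.VideoWriter("out.mp4", fourcc, fps, (w, h))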
```

The dashboard server auto-transcodes `mpeg4`-encoded videos, but encoding to
H.264 from the start is faster and more reliable.
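
To avoid re-encoding files that are already H.264, a script can check the codec first. This is a sketch that assumes `ffmpeg` and `ffprobe` are on `PATH`. The helper names are hypothetical, and the `-y` flag (overwrite the output without prompting) is an addition to the re-encode command shown earlier.

```python
import subprocess


def probe_codec_cmd(path):
    """ffprobe command that prints the codec name of the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def reencode_cmd(src, dst):
    """ffmpeg command mirroring the re-encode line in this section, plus -y."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-preset", "ultrafast", "-crf", "23",
        "-movflags", "+faststart", dst,
    ]


def ensure_h264(src, dst):
    """Re-encode `src` to `dst` unless it is already H.264; return the path to use."""
    codec = subprocess.run(
        probe_codec_cmd(src), capture_output=True, text=True, check=True
    ).stdout.strip()
    if codec == "h264":
        return src
    subprocess.run(reencode_cmd(src, dst), check=True)
    return dst
```

Calling `ensure_h264("rollout.mp4", "rollout_h264.mp4")` before `add_media_file()` would hand the report a browser-playable file in either case.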