# Internship Discussion Notes — April 14, 2026

## Topics to Think About

### Multiview Aggregation
- How to combine predictions from multiple cameras in the PARA framework?
- Options: aggregate heatmap volumes from each view (intersection/product), or predict in one view and validate with others
- Connection to PARA: each camera produces its own pixel-aligned heatmap volume → fuse in 3D space via known camera intrinsics and extrinsics
- Advantage over single-view: a second view can resolve depth ambiguity along the first camera's ray and cover regions occluded in one view
- Contrast with RVT: RVT renders virtual viewpoints then predicts heatmaps. PARA could instead predict in each real viewpoint and fuse — no rendering step needed
- Key question: does multiview PARA outperform single-view enough to justify the extra cameras?
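
The product-style fusion option above can be sketched roughly as follows. This is a minimal illustration and not PARA's actual implementation: it assumes each view has already been reduced to a 2D heatmap (the real heatmap volumes may carry an extra dimension), a simple pinhole model, and a hand-picked list of 3D candidates. The names `project`, `sample`, `fuse_product`, and `heatmap_5x5` are hypothetical.

```python
import math

def project(point_w, K, R, t):
    """Pinhole projection of a world point; R and t map world -> camera."""
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None  # point is behind the camera
    fx, fy, cx, cy = K
    return fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy

def sample(heatmap, u, v):
    """Nearest-pixel lookup; returns 0.0 outside the image."""
    row, col = round(v), round(u)
    if 0 <= row < len(heatmap) and 0 <= col < len(heatmap[0]):
        return heatmap[row][col]
    return 0.0

def fuse_product(candidates, views, eps=1e-6):
    """Score each 3D candidate by the product of per-view heatmap values.

    The product is accumulated in log space; eps keeps log() finite
    when a view assigns zero probability.
    """
    scores = []
    for p in candidates:
        log_score = 0.0
        for view in views:
            uv = project(p, view["K"], view["R"], view["t"])
            prob = sample(view["heatmap"], *uv) if uv is not None else 0.0
            log_score += math.log(prob + eps)
        scores.append(log_score)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores

def heatmap_5x5(peaks):
    """Toy 5x5 heatmap with the given {(row, col): value} entries."""
    hm = [[0.0] * 5 for _ in range(5)]
    for (row, col), value in peaks.items():
        hm[row][col] = value
    return hm

K = (2.0, 2.0, 2.0, 2.0)  # fx, fy, cx, cy for a toy 5x5 image (assumed values)
# View A sits at the origin looking along +z; all candidates lie on its central ray.
view_a = {"K": K, "R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": [0, 0, 0],
          "heatmap": heatmap_5x5({(2, 2): 0.9})}
# View B sits at (2, 0, 2) looking along -x, so it sees the candidates side-on.
view_b = {"K": K, "R": [[0, 0, 1], [0, 1, 0], [-1, 0, 0]], "t": [-2, 0, 2],
          "heatmap": heatmap_5x5({(2, 1): 0.05, (2, 2): 0.9, (2, 3): 0.05})}

candidates = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]
best, scores = fuse_product(candidates, [view_a, view_b])
print(best)
```

In this toy setup view A alone scores all three candidates identically, since they share one pixel; view B breaks the tie. A real system would likely replace the candidate list with a voxel grid and the nearest-pixel lookup with bilinear sampling.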

### Mobile Manipulators
- PARA assumes a fixed camera with known intrinsics — how does this extend to a moving base?
- Camera pose changes continuously as the robot moves → PARA's height-based lifting should handle this as long as the camera pose is known at each frame (height is world-frame, not camera-frame)
- Navigation + manipulation: PARA for the manipulation part, separate policy for navigation?
- Or: PARA as a unified representation — predict pixel targets for both "where to drive" and "where to grasp"?
- Data efficiency story transfers: mobile manipulators typically have even less training data per environment than fixed-base setups
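
To make the world-frame height argument concrete, here is a small sketch of lifting a pixel to 3D by intersecting its viewing ray with a horizontal plane. It assumes "height-based lifting" means a ray-plane intersection with the plane z = height in a z-up world frame, which may differ from PARA's exact formulation. The function `lift_pixel_to_height` and all numeric values are hypothetical.

```python
def lift_pixel_to_height(u, v, K, R_wc, cam_pos, height):
    """Intersect the viewing ray through pixel (u, v) with the world plane z = height.

    R_wc rotates camera-frame vectors into the world frame;
    cam_pos is the camera centre expressed in the world frame.
    """
    fx, fy, cx, cy = K
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # ray direction, camera frame
    d_w = [sum(R_wc[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d_w[2]) < 1e-9:
        return None  # ray is parallel to the plane
    s = (height - cam_pos[2]) / d_w[2]
    if s <= 0:
        return None  # plane lies behind the camera
    return tuple(cam_pos[i] + s * d_w[i] for i in range(3))

K = (100.0, 100.0, 50.0, 50.0)  # fx, fy, cx, cy (assumed values)
# Downward-looking camera: camera z axis points along world -z.
R_down = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]

# The same world target at height 1.0, seen from two base positions.
# The pixel changes as the base moves, but the lifted point does not.
p_before = lift_pixel_to_height(100.0, 50.0, K, R_down, (0.0, 0.0, 2.0), height=1.0)
p_after = lift_pixel_to_height(0.0, 50.0, K, R_down, (1.0, 0.0, 2.0), height=1.0)
print(p_before, p_after)
```

Both calls recover the same world point because the camera pose enters the lifting explicitly. The practical caveat is that pose error (for example from odometry drift) maps directly into error in the lifted point.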

### How PARA Connects
- Multiview aggregation is a natural extension of PARA's pixel-aligned formulation — each view contributes independently
- Mobile manipulation tests PARA's viewpoint robustness claim in the extreme (continuous viewpoint change)
- Both are practical deployment scenarios where PARA's inductive biases should help