[Figure: (a) Inference pipeline visualization: camera frustum (observer view) → heatmap volume (per-pixel height) → argmax → 3D (lift with geometry) → robot at target (execute action). (b) Height vs. depth.]