# PARA vs ACT OOD Object Position — Active Checklist

## Goal
Fair comparison of PARA vs ACT on OOD object position generalization.
Train on 8×8 grid (64 positions), test on held-out positions from 16×16 grid.
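
The split sizes can be sanity-checked with a short sketch. It assumes the 8×8 train grid is the stride-2 subset of the 16×16 grid. That selection rule is an assumption and is not stated in these notes; `split_grid` is a hypothetical helper, not a function from the codebase.

```python
from itertools import product


def split_grid(full=16, stride=2):
    """Split a full x full index grid into a strided train subset and a held-out test set.

    Assumption: train cells are every `stride`-th row and column of the full grid.
    """
    all_cells = list(product(range(full), repeat=2))
    train = [(r, c) for r, c in all_cells if r % stride == 0 and c % stride == 0]
    train_set = set(train)
    # Everything not used for training is held out for OOD testing.
    test = [cell for cell in all_cells if cell not in train_set]
    return train, test


train_cells, test_cells = split_grid()
print(len(train_cells), len(test_cells))  # 64 192
```

With this layout every test cell is adjacent to at least one train cell, so the test measures interpolation between train positions rather than extrapolation beyond the workspace.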

## Current Status
- [x] OOD objpos dataset generated (256 trajectories, natural-start servo)
- [x] Train/test splits created (64 train / 192 test)
- [x] PARA trained and evaluated — ~50% on both train and test
- [x] **BUG (fixed): ACT teleport eval didn't activate.** `pred_3d_targets` was `None` for ACT, so teleport silently fell through to open-loop `env.step()`
- [x] Fix ACT teleport: denormalize predicted positions → set as `pred_3d_targets`
- [x] Re-evaluate ACT with working teleport (68% train / 70% test)
- [ ] If ACT still fails, investigate further (gripper timing, prediction accuracy at grasp point, etc.)
- [x] Run fair comparison with both models using teleport
- [ ] Vary training set size to find ACT's working boundary
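
A silent fall-through like the one in the ACT teleport item is easier to catch if the eval fails loudly when no targets are available. A minimal sketch follows; `require_teleport_targets` is a hypothetical helper and the real eval loop may be structured differently.

```python
def require_teleport_targets(pred_3d_targets, model_name):
    """Return the targets unchanged, or raise if the model produced none.

    Hypothetical guard: call this before the teleport branch so that a
    missing `pred_3d_targets` cannot degrade into an open-loop rollout.
    """
    if pred_3d_targets is None:
        raise ValueError(
            f"{model_name}: pred_3d_targets is None; teleport eval cannot run"
        )
    return pred_3d_targets


# Example: a model that does provide targets passes through untouched.
targets = require_teleport_targets([(0.1, 0.0, 0.3)], "PARA")
```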

## Results So Far
| Experiment | PARA | ACT | Notes |
|-----------|------|-----|-------|
| Single position, 50 demos (N_WINDOW=6) | 95% | 85% | Both work |
| Single position, 50 demos (N_WINDOW=4) | 95% | 20% | ACT regressed — N_WINDOW issue |
| OOD 64 positions, train (pre-fix) | 20-53% | 0% | **ACT teleport bug** |
| OOD 64 positions, test (pre-fix) | 42-54% | 0% | Same bug |
| **OOD 64 positions, train (fixed)** | **44%** | **68%** | ACT ahead by 24 points with teleport fixed |
| **OOD 64 positions, test (fixed)** | **55%** | **70%** | Both generalize; ACT ahead by 15 points |

## Key Finding
ACT outperforms PARA on held-out OOD object positions (70% vs 55%) when eval is fair (both using teleport).
Neither model loses accuracy on held-out positions (ACT 68% train vs 70% test; PARA 44% train vs 55% test), which suggests the 8×8 grid provides sufficient coverage.

## Bug Fixed
ACT teleport: `pred_3d_targets` was `None` for ACT, causing teleport to silently fall through to open-loop `env.step()`. Fixed by denormalizing ACT's sigmoid position predictions into absolute 3D targets.
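
The denormalization step might look roughly like the sketch below. It assumes min-max normalization with per-axis workspace bounds, so a sigmoid output of 0 maps to the lower bound and 1 to the upper bound. The function name, the bounds, and the normalization scheme are all assumptions for illustration; the actual statistics used in training are not recorded here.

```python
def denormalize_positions(pred_norm, pos_min, pos_max):
    """Map per-step predictions in [0, 1] to absolute 3D coordinates.

    Assumption: training targets were min-max normalized per axis with
    bounds `pos_min` and `pos_max` (each a length-3 sequence).
    """
    return [
        tuple(lo + p * (hi - lo) for p, lo, hi in zip(step, pos_min, pos_max))
        for step in pred_norm
    ]


# Hypothetical workspace bounds in metres (x, y, z).
POS_MIN = (-0.2, -0.2, 0.0)
POS_MAX = (0.2, 0.2, 0.4)

# One predicted step: lower bound in x, midpoint in y, upper bound in z.
pred_3d_targets = denormalize_positions([(0.0, 0.5, 1.0)], POS_MIN, POS_MAX)
```

The result can then be passed to the teleport eval as `pred_3d_targets`, the same field the bug left unset for ACT.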

## Next Steps
- [ ] Investigate why PARA is lower than ACT on this dataset (PARA was higher on single-position: 95% vs 85% at N_WINDOW=6)
- [ ] Run viewpoint generalization experiments
- [ ] Try varying train set size (fewer positions) to test data efficiency