Loading dataset...
CachedTrajectoryDataset: 64 demos, 2048 samples
Source: {'cache_root': '/data/libero/ood_objpos_v3_splits/exp4_n64_train', 'benchmark': 'libero_spatial', 'task_ids': [0], 'frame_stride': 3}
Total: 2048 samples
✓ Train: 1946 samples
✓ Val: 102 samples
Initializing model (type=act)...
Loading DINOv2 model...
✓ DINOv2 backbone is trainable
✓ CLIP projection: 512 → 384
✓ ACT model: 30,394,048 / 30,394,048 trainable params
Input: CLS(384) + start_kp(2) + eef_pos(3) + gripper(1) = 774
pos_mlp: → (B, 4, 3) [sigmoid, normalized]
rot_mlp: → (B, 4, 3) [sigmoid, normalized]
gripper_mlp: → (B, 4) [sigmoid, normalized]
Trainable parameters: 30,394,048 / 30,394,048 (100.00%)
Computing dataset stats from random subset: 500/2048 samples (seed=42)
/data2/cameron/miniconda3/envs/uva/lib/python3.10/site-packages/wandb/sdk/data_types/image.py:324: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
✓ Saved stats cache: /data/cameron/567_augmentation_viewpoint_project/checkpoints/act_defvp_crop_nokp/dataset_stats.json
✓ Height range from dataset: [0.917341, 1.177162] m
✓ Gripper range from dataset: [-1.000000, 1.000000]
✓ Rotation range (delta rotvec): ['-0.107', '-0.076', '-0.020'] .. ['0.040', '0.249', '0.020']
✓ Reference rotation: ['0.9994', '-0.0011', '-0.0338', '-0.0038']
✓ Position range from dataset: ['-0.408', '-0.366', '0.917'] .. ['0.132', '0.273', '1.177']
Starting training for 1000 epochs...
✓ Rotation loss SKIPPED
✓ EMA loss weights initialized (vol=11.8, rot=3.5, grip=0.69)
self._image = pil_image.fromarray(
Epochs: 0%| | 2/1000 [01:41<14:07:45, 50.97s/it]
============================================================
Epoch 0/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.mp4
Train Loss: 0.2806 (Volume: 0.0294, Gripper: 0.2512, Rotation: 0.0000)
Val - Loss: 0.2722, Volume: 0.0057, Pixel Error: 13.39px, Height Error: 0.000mm, Gripper: 0.1716
✓ Saved best model (val_loss=0.2722)
============================================================
Epoch 1/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0144 (Volume: 0.0070, Gripper: 0.0074, Rotation: 0.0000)
Val - Loss: 0.2433, Volume: 0.0039, Pixel Error: 10.10px, Height Error: 0.000mm, Gripper: 0.1471
✓ Saved best model (val_loss=0.2433)
============================================================
Epoch 2/1000
============================================================
[async eval] launched at step 500
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.h264.h264.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.mp4
Train Loss: 0.0090 (Volume: 0.0045, Gripper: 0.0045, Rotation: 0.0000)
Val - Loss: 0.2388, Volume: 0.0024, Pixel Error: 8.26px, Height Error: 0.000mm, Gripper: 0.1324
✓ Saved best model (val_loss=0.2388)
============================================================
Epoch 3/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
Train Loss: 0.0083 (Volume: 0.0041, Gripper: 0.0042, Rotation: 0.0000)
Val - Loss: 0.1853, Volume: 0.0025, Pixel Error: 8.84px, Height Error: 0.000mm, Gripper: 0.0784
✓ Saved best model (val_loss=0.1853)
============================================================
Epoch 4/1000
============================================================
[async eval] launched at step 1000
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.mp4
Saved step checkpoint: step_1000.pth
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0066 (Volume: 0.0033, Gripper: 0.0032, Rotation: 0.0000)
Val - Loss: 0.2024, Volume: 0.0017, Pixel Error: 6.31px, Height Error: 0.000mm, Gripper: 0.1176
============================================================
Epoch 5/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.mp4
Train Loss: 0.0062 (Volume: 0.0031, Gripper: 0.0031, Rotation: 0.0000)
Val - Loss: 0.1718, Volume: 0.0015, Pixel Error: 6.55px, Height Error: 0.000mm, Gripper: 0.0588
✓ Saved best model (val_loss=0.1718)
============================================================
Epoch 6/1000
============================================================
[async eval] launched at step 1500
[async eval] logged video to wandb: ep000_success.mp4
[async eval] logged video to wandb: ep000_success.h264.h264.mp4
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.mp4
Train Loss: 0.0052 (Volume: 0.0026, Gripper: 0.0026, Rotation: 0.0000)
Val - Loss: 0.1544, Volume: 0.0012, Pixel Error: 5.45px, Height Error: 0.000mm, Gripper: 0.0392
✓ Saved best model (val_loss=0.1544)
============================================================
Epoch 7/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
Train Loss: 0.0052 (Volume: 0.0027, Gripper: 0.0026, Rotation: 0.0000)
Val - Loss: 0.1461, Volume: 0.0013, Pixel Error: 5.73px, Height Error: 0.000mm, Gripper: 0.0392
✓ Saved best model (val_loss=0.1461)
============================================================
Epoch 8/1000
============================================================
[async eval] launched at step 2000
[async eval] logged video to wandb: ep000_success.h264.mp4
Saved step checkpoint: step_2000.pth
[async eval] logged video to wandb: ep000_success.h264.h264.h264.mp4
Train Loss: 0.0045 (Volume: 0.0023, Gripper: 0.0022, Rotation: 0.0000)
Val - Loss: 0.1882, Volume: 0.0012, Pixel Error: 5.32px, Height Error: 0.000mm, Gripper: 0.0686
============================================================
Epoch 9/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_success.mp4
Train Loss: 0.0048 (Volume: 0.0024, Gripper: 0.0024, Rotation: 0.0000)
Val - Loss: 0.1418, Volume: 0.0012, Pixel Error: 5.74px, Height Error: 0.000mm, Gripper: 0.0441
✓ Saved best model (val_loss=0.1418)
============================================================
Epoch 10/1000
============================================================
[async eval] launched at step 2500
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.mp4
Train Loss: 0.0061 (Volume: 0.0031, Gripper: 0.0030, Rotation: 0.0000)
Val - Loss: 0.1573, Volume: 0.0015, Pixel Error: 6.21px, Height Error: 0.000mm, Gripper: 0.0686
============================================================
Epoch 11/1000
============================================================
[async eval] logged video to wandb: ep000_fail.mp4
[async eval] logged video to wandb: ep000_success.mp4
[async eval] logged video to wandb: ep000_success.h264.h264.mp4
Train Loss: 0.0047 (Volume: 0.0024, Gripper: 0.0023, Rotation: 0.0000)
Val - Loss: 0.1288, Volume: 0.0009, Pixel Error: 4.55px, Height Error: 0.000mm, Gripper: 0.0245
✓ Saved best model (val_loss=0.1288)
============================================================
Epoch 12/1000
============================================================
[async eval] launched at step 3000
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.h264.mp4
Saved step checkpoint: step_3000.pth
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0038 (Volume: 0.0020, Gripper: 0.0018, Rotation: 0.0000)
Val - Loss: 0.1442, Volume: 0.0016, Pixel Error: 6.62px, Height Error: 0.000mm, Gripper: 0.0490
============================================================
Epoch 13/1000
============================================================
[async eval] logged video to wandb: ep000_fail.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.mp4
Train Loss: 0.0043 (Volume: 0.0022, Gripper: 0.0021, Rotation: 0.0000)
Val - Loss: 0.1445, Volume: 0.0014, Pixel Error: 6.49px, Height Error: 0.000mm, Gripper: 0.0490
============================================================
Epoch 14/1000
============================================================
[async eval] launched at step 3500
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0040 (Volume: 0.0020, Gripper: 0.0020, Rotation: 0.0000)
Val - Loss: 0.1514, Volume: 0.0010, Pixel Error: 4.73px, Height Error: 0.000mm, Gripper: 0.0588
============================================================
Epoch 15/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.mp4
Train Loss: 0.0044 (Volume: 0.0022, Gripper: 0.0022, Rotation: 0.0000)
Val - Loss: 0.1325, Volume: 0.0010, Pixel Error: 5.25px, Height Error: 0.000mm, Gripper: 0.0441
============================================================
Epoch 16/1000
============================================================
[async eval] launched at step 4000
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.mp4
Saved step checkpoint: step_4000.pth
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0092 (Volume: 0.0045, Gripper: 0.0046, Rotation: 0.0000)
Val - Loss: 0.1439, Volume: 0.0015, Pixel Error: 5.85px, Height Error: 0.000mm, Gripper: 0.0490
============================================================
Epoch 17/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0039 (Volume: 0.0019, Gripper: 0.0019, Rotation: 0.0000)
Val - Loss: 0.1204, Volume: 0.0011, Pixel Error: 4.88px, Height Error: 0.000mm, Gripper: 0.0098
✓ Saved best model (val_loss=0.1204)
============================================================
Epoch 18/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.mp4
[async eval] launched at step 4500
[async eval] logged video to wandb: ep000_fail.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0029 (Volume: 0.0015, Gripper: 0.0014, Rotation: 0.0000)
Val - Loss: 0.1169, Volume: 0.0010, Pixel Error: 4.79px, Height Error: 0.000mm, Gripper: 0.0196
✓ Saved best model (val_loss=0.1169)
============================================================
Epoch 19/1000
============================================================
[async eval] logged video to wandb: ep000_fail.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0036 (Volume: 0.0018, Gripper: 0.0018, Rotation: 0.0000)
Val - Loss: 0.1119, Volume: 0.0010, Pixel Error: 4.65px, Height Error: 0.000mm, Gripper: 0.0196
✓ Saved best model (val_loss=0.1119)
============================================================
Epoch 20/1000
============================================================
[async eval] logged video to wandb: ep000_success.mp4
[async eval] launched at step 5000
[async eval] logged video to wandb: ep000_success.h264.h264.mp4
Saved step checkpoint: step_5000.pth
[async eval] logged video to wandb: ep000_success.h264.h264.h264.h264.mp4
Train Loss: 0.0031 (Volume: 0.0016, Gripper: 0.0015, Rotation: 0.0000)
Val - Loss: 0.1168, Volume: 0.0008, Pixel Error: 4.44px, Height Error: 0.000mm, Gripper: 0.0245
============================================================
Epoch 21/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
Train Loss: 0.0049 (Volume: 0.0025, Gripper: 0.0023, Rotation: 0.0000)
Val - Loss: 0.1118, Volume: 0.0010, Pixel Error: 5.13px, Height Error: 0.000mm, Gripper: 0.0147
✓ Saved best model (val_loss=0.1118)
============================================================
Epoch 22/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] launched at step 5500
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0037 (Volume: 0.0018, Gripper: 0.0019, Rotation: 0.0000)
Val - Loss: 0.1451, Volume: 0.0015, Pixel Error: 5.64px, Height Error: 0.000mm, Gripper: 0.0441
============================================================
Epoch 23/1000
============================================================
[async eval] logged video to wandb: ep000_fail.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0042 (Volume: 0.0022, Gripper: 0.0020, Rotation: 0.0000)
Val - Loss: 0.1145, Volume: 0.0009, Pixel Error: 4.42px, Height Error: 0.000mm, Gripper: 0.0098
============================================================
Epoch 24/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
[async eval] launched at step 6000
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.mp4
Saved step checkpoint: step_6000.pth
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0027 (Volume: 0.0014, Gripper: 0.0013, Rotation: 0.0000)
Val - Loss: 0.1210, Volume: 0.0009, Pixel Error: 4.54px, Height Error: 0.000mm, Gripper: 0.0245
============================================================
Epoch 25/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.mp4
[async eval] logged video to wandb: ep000_fail.h264.h264.mp4
Train Loss: 0.0031 (Volume: 0.0016, Gripper: 0.0015, Rotation: 0.0000)
Val - Loss: 0.1114, Volume: 0.0009, Pixel Error: 4.91px, Height Error: 0.000mm, Gripper: 0.0245
✓ Saved best model (val_loss=0.1114)
============================================================
Epoch 26/1000
============================================================
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.mp4
[async eval] launched at step 6500
[async eval] logged video to wandb: ep000_fail.h264.h264.h264.h264.h264.h264.h264.mp4
Train Loss: 0.0032 (Volume: 0.0016, Gripper: 0.0015, Rotation: 0.0000)
Val - Loss: 0.1294, Volume: 0.0011, Pixel Error: 5.39px, Height Error: 0.000mm, Gripper: 0.0196
⏱ Time limit reached (20.2 / 20 min). Stopping.