# Notes


A good v2 paper would be scaling up data like droid -> hands calib cotrain -> embodiment train -> per-episode fine-tune
I separate signal from noise
Signal is just getting the pipers set up and running, noise is the power and cable and ssh difficulties

Finish fidex, tomorrow study for 567 in car on iPad good 1 hr studying, piper arms setup sunday, vla setup on arm and droid pretraining in parallel


Found smog check paper great go Sunday

Nice compact 6dof robot arm design  https://www.youtube.com/shorts/0acrw7P-_c0
Can also just copy sweep robot 
Might need to double up robot arm design but I think the design is basically made for one servo, and then in the controller just set the value of one to the other
Vr controller is bullshit for teleop, not precise enough

Fidex is done just need to take another pass over it
Study tomorrow during car so charge iPad while at volleyball 
Sunday is piper robot setup
Any other time tomorrow just do droid setup or smovla setup or study
Crap I think I left necklace in locker room, go check on sunday 

Done eating for day, maybe some
Maybe don’t go to David’s see him next week
Can get smog check Sunday


Tomorrow 567 is do and review hw1 + code and hw2 + code (1.2hr), weds is do practice exam and review it, thurs is make cheat sheet and review everything else
Tomorrow research is :
-	1) get camera exo working and streaming to Mac so we have teleop + camera re-render done, then add IK point and test IK in sim and deploy it to real to make sure we have the move-then-wait-until-done-moving loop done, and from there it’s the same, just modified simple dataset record etc
-	2) deploy trained vla and assuming no issues, done
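
Rough sketch of the move-then-wait-until-done-moving loop from 1), assuming the arm driver exposes some way to send a joint target and read back joint positions. `send_target`, `read_joints` and `FakeArm` are made-up stand-ins, not a real Piper API, and the tolerance and timeout values are placeholders.

```python
import time


def move_and_wait(send_target, read_joints, target, tol=0.01, timeout=5.0, poll=0.01):
    """Send a joint target, then block until the arm reports it has arrived.

    send_target / read_joints are hypothetical callables standing in for
    whatever the real arm driver provides.
    """
    send_target(target)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = read_joints()
        # Done when every joint is within tol (radians) of its target.
        if max(abs(c - t) for c, t in zip(current, target)) < tol:
            return True
        time.sleep(poll)
    return False  # timed out; caller decides whether to retry or abort


class FakeArm:
    """Toy stand-in for the arm: each read moves halfway toward the goal."""

    def __init__(self, n_joints):
        self.q = [0.0] * n_joints
        self.goal = list(self.q)

    def send(self, target):
        self.goal = list(target)

    def read(self):
        self.q = [q + 0.5 * (g - q) for q, g in zip(self.q, self.goal)]
        return list(self.q)


arm = FakeArm(6)
target = [0.3, -0.2, 0.5, 0.0, 0.1, 0.0]
reached = move_and_wait(arm.send, arm.read, target)
```

On the real arm the same loop would wrap the driver calls, and the dataset record step would only fire after `move_and_wait` returns True.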
Tomorrow logistics is:
-	1) doctor appt at 11:30
-	2) yue meeting 130
-	3) smog check at 5pm to go to Izzy’s at 6pm

Need to live my life more simple and portable
I really want my own place, even a studio, I just want to have my simple wardrobe that is easily packable, packable nutrition stack of electrolytes and protein milks, 
I really want to be lean and jacked and rich
The vehicle there for us is definitely THROUGH robotics and just being more disciplined/having fun with diet
How do we get there today? Another step through finishing PARA 
Today is 1.2 hr studying 567 (two 35min sessions), 30min (ideally) checking vla is working well, 30 min (ideally) building camera rendering teleop streaming, 2 hr logistics (meeting and doctor)
Extra time can be spent verifying the robot virtual point projection, doing IK for control check, collect dataset

I would hold back on doing calibrated droid until either can’t access robot like during weekend or done with para experiments because it’s not the highest priority, it’s an extra result. In fact the paper technically stands just with VLA + piper arms PARA, we’re close.
Diet for today is some breakfast snacks, lunch with junjie, caffeinate and light dinner to be done eating by 6 < 1700 calories and 1.2 burned with volleyball and weights workout, nice

I might want to go naked from wearables for a bit, I don’t know if monitoring all activity is great, just go based on feeling 

Meditate on beach and get steps in on walk

Lunch with junjie, finish vla deploy, then study, gym, meditate. Extra time is collecting data etc 

Doing vla deploy now let’s see and then will go back to getting simple dataset record done and 3d projection validated then yue meeting and workout and study and then go back to doing more para
Like control
I feel like we should be barefoot more often it’s so weird that have these large blocks in between our foot and ground 
To lose 10lb in 30days is roughly doable: just burn excess 1k calories per day which is doable (e.g. eat 1800 calories and burn an extra 600 calories via walk or volleyball) to yield about a pound every 3.5 days (at the usual ~3500 kcal per pound rule of thumb), so closer to 8.5lb in 30 days, easy we got this!

Ok so for some reason, the VLA didn’t work well, the predictions weren’t very good, like 0% success. Why is another question. Some potential reasons: a) the 8x8 patch features are too low resolution and the simple interpolate into 3 conv layers isn’t enough of a decoder. b) vla features weren’t that good spatially, lacking the spatial inductive bias like shift equivariance that dino learns better, or general pixel-aligned semantics. c) smolvla training is worse than general vlm training. d) vlm is 10x as big at 500m so could be an insufficient data size.

Streaming setup almost done, streaming gello debugging weird joint state, then will switch it from gello to real arm joint states, then stream joints and teleop at same time, then do in parallel picture to render (trivial just copy iPhone streaming setup for camera pose), should be done in 30 minutes

Always ask what is the 20% input for the 80% output (Pareto). For us the 20% today is just getting the simple dataset record done today which isn’t that much but enables us to collect data tomorrow, debug control policies quickly (20min) while training, and deploy! Then just a few more days until all tasks done!

Honestly we did the physical today let’s do the bloodwork Monday and do smog check whichever day tomorrow or Thursday that we are on campus

In parallel any non-robot time can be on getting calibrated droid up and running 

Ugh need to start bringing Mac charger to school now since no usbc at robot station

So maybe cmd isn’t the right joint state to send over — fine, that’s okay we’ll figure out the correct joints to send over tomorrow, we did streaming and camera calibration setup today that’s fine tomorrow we’ll iron out the 

Don’t be in such a frantic manic rush. Getting the robot up and calibrated is constant time engineering progress outside the scope of the project. Just do one task a day for it and you’ll be there soon enough, today was streaming and camera rendering and fidex installation, great that’s enough

I think our life would be improved if we go for a 30 minute walk every morning

Beauty and aesthetic is universal . Let’s lose 15lb to lean out and be shredded while increasing weights. Let’s start doing skincare and try the hair product 


Today:
- Jordana meeting at 11
- study for exam 1.5hr (75% of exam 2 and review it)
- correct teleop camera render for piper arm (find correct joint state to transmit, record simple dataset, verify projection — if something is wrong with the calibration, can just switch to franka arm)
- iros submit video and supplement (need to trim control parts out of old video or just redo website videos) 
- calibrated droid 30min and robot arm 20 min
- diet is fueling breakfast of bagel and muscle milk 600cal, have one robowok meal at 330pm of steak and rice and done easy
- workout is gym at 5 and volleyball at 7, have to do smog check today or tomorrow at 5pm 

Want to front load exam work today might need 2 hrs, iros needs 1 hr. Could go to usc after starting at 1 or just study here from 1230-230 work from 3-4 and calibrated droid and robot from 5-6 that might be better

Fidex video is actually due Saturday, so let’s do it friday night after exam and Saturday

While rendering mujoco check would be good sanity check might not actually be necessary? Just need to get robot point projected onto the image 
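
Minimal sketch of just the point projection, assuming a plain pinhole camera with no lens distortion and a known base-to-camera transform from calibration. The intrinsics and the extrinsics below are made-up numbers, not the real calibration.

```python
def project_point(p_base, R_cam_base, t_cam_base, fx, fy, cx, cy):
    """Project a 3D point in the robot base frame to pixel coordinates.

    Pinhole model, no distortion. R_cam_base (3x3) and t_cam_base (3,)
    map base-frame points into the camera frame: p_cam = R @ p_base + t.
    """
    p_cam = [
        sum(R_cam_base[i][j] * p_base[j] for j in range(3)) + t_cam_base[i]
        for i in range(3)
    ]
    x, y, z = p_cam
    if z <= 0:
        return None  # point is behind the camera, nothing to draw
    return (fx * x / z + cx, fy * y / z + cy)


# Placeholder calibration: camera axes aligned with the base, 1 m offset along z.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1.0]
uv = project_point([0.1, -0.05, 0.0], R, t, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

If the projected pixel for a known point (e.g. the gripper tip from forward kinematics) lands on the gripper in the image, the calibration is probably fine without the full mujoco render.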

Idk im starting to think that we just need to collect our own data and everything else is bullshit with spending more time on alignment than would take to just collect our own data

Okay I think today was a productive day for droid calibration — which is that using droid calibration is not a good idea (definitely possible but engineering overhead for unknown gain) and we should collect our own data instead, especially with ours and others intuition that smaller focused data is all you need not huge pretraining

Ok great so done with droid calibration stack and learned from it, moving on
Next up is finish piper basic tasks, then go to scaling up training with custom umi hands thing

Feeling a bit overwhelmed with exam and etc and so many social bookings. Go back to the North Star — building a home robot (broader) and building a more data-efficient learning algorithm (more local). Then just do the simplest things to get there. Droid was a potentially nice thing if it was simple, but it’s not. Using off the shelf robot arms is not the simplest thing long term (since we don’t have control over each piece of the process, which makes it complex as it cannot be simplified), but it is okay for now because that it was the culture 
So next up get the piper calibrated rendering and control and the extra time spent is just for fun on long term with building arms. Literally everything else is extra

Tomorrow is 
- 2hr take practice exam for 567
- 1.5hr piper debug for simple dataset record (either fixed it or deciding fundamental issue like incorrect calibration - which is unlikely - moving to panda) and we decide that then junjie can help us with franka but I really don’t want to move to panda because base might not be visible
- 30min build robot arm 
- 1hr persian lesson, 1hr tri meeting 
- 30min gym and then go to Brentwood for evening

Should go to campus early as possible to reduce friction 

Tomorrow definitely need to hit just 20min of weights before going to pilates, easy.

Okay let’s wake up now and go super hard. Good job with real protein breakfast (800 cal), protein bar to hold us off until dinner (200cal) and small amount of soomsoom (500cal). Unfortunately not too much cardio today with pilates so let’s walk enough and try to get another 30min stair stepper session before showering at home
Pick up two packages (tripod holder and moms clothes at usc)
Skipping persian lesson for studying (fine but ideally I’d like to just be so on top of everything and we’re being a bit too booked e.g. pilates tonight commute could have covered the persian lesson but ugh idk)
Huge lean jacked is the physique
Elon musk intensity towards the project goal but also Alysa Liu having fun with it
Today I only really have two goals: get the piper calibration rendering sorted out, and study enough for 567 (take and review exam). Do review exam now and leave at 11:15 for campus to do piper arms. Everything else (e.g. 20min robot building) is just for fun
I really want to get our smog check done because it’s been too long ugh. Can we do it at like 4 and just drive to Brentwood coffeeshop earlier.
Also need 20min in the gym (shrugs)

I think a lot of anxiety comes from ‘am I doing enough’ which is also amplified by ‘I should be in the lab by x time in the day’

Ok we’re done with pipers, not a fundamental issue but just engineering wise it’s too finicky,  first joint state in mujoco compared to render is just slightly off and not sure why and then cables just stopped connecting. 
Going to move on to using franka panda, either rong/junjie will show me tomorrow, but ultimately I 100% believe that if you’re serious about robot learning and not tiny research primitive demos, you’ll build your own arms and hardware in general. 

Tonight need to add example questions to back side of cheat sheet

Okay on campus tomorrow just pickup tripod and mom clothes

Ok a few things, one we’ll be more composed and less rushed during the exam so don’t stress too much
If lifting is such a stressful time thing today consider skipping it and same with smog check. Can do it tomorrow after exam, fine. 
Don’t feel bad about skipping persian this week, but we have to be better 
Tonight after dinner just print cheat sheet, stack front and back on to one side and then stack both homework solutions onto another side (30min), make sure to review and include a dual/primal formulation and solution

Just raw intuition, immediate logistics are I need to meditate 10min, lift something heavy, and get smog check done
Raw intuition on work is that need to assess vlm quality via patch pca visualization, 
 
For the vlm, I think we should actually visualize the per-patch features upsampled to the image to see intuitively if it’s a problem with the features being bad
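
Sketch of only the upsampling half of that check, assuming one scalar per patch has already been computed (e.g. a single PCA component). The PCA step itself is not shown, and the 8x8 grid and 256x256 image size are placeholders.

```python
def upsample_patch_map(patch_map, image_h, image_w):
    """Nearest-neighbor upsample of a (rows x cols) patch grid to image size."""
    rows, cols = len(patch_map), len(patch_map[0])
    out = []
    for y in range(image_h):
        r = min(y * rows // image_h, rows - 1)
        out.append(
            [patch_map[r][min(x * cols // image_w, cols - 1)] for x in range(image_w)]
        )
    return out


# Toy 8x8 grid standing in for one PCA component per patch.
patch_map = [[r * 8 + c for c in range(8)] for r in range(8)]
heat = upsample_patch_map(patch_map, 256, 256)
```

Overlaying `heat` on the input frame should show whether the gripper and objects get their own blobs or whether everything smears together at 8x8.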

I just feel like everybody is wrong and are idiots doing such clunky stuff in robotics and sure anyone can hide it and hire teams of hardware and firmware engineers but fundamentally it just needs to be simpler the learning process needs to be backpropagatable into the hardware and if the experience of even just modifying the hardware or using it is bad/clunky/slow/expensive, we are doomed to lose

I think video models are somewhat involved in robotics potentially and we need to have some experience on using them and fine-tuning them — so first task is just finetuning a video model, that’s it, what’s the simplest solution there? Only constraint is has to be runnable on Mac (not trained on it and can be quantized or whatever)  and easily modifiable, just find the simplest thing that does that with e.g. the simplest fine-tuning interface and modifiable interface 

But honestly I don’t think this is the highest priority, highest priority is we don’t have something running frequently or the hardware to support that. 

Tonight is make cheat sheet, now is meditate, lift, smog check, drive to Brentwood, finish some work, do pilates and dinner, come back and finish cheat sheet. 

Tomorrow is shrugs, exam, and smog check, then Spencers 

Okay so we’re doing a few tracks in parallel with the goal of full home robot and more immediately PARA
Big Robot arm goal is work on franka panda with rong and junjie over weekend 
Para video encoder goal is a) get pca of smolvla features before and after fine-tuning (10min) and b) (longer term) fine-tune stable video diffusion model
Better long term goal is build our own robot arm
Do each for about 45 min each day until converged 

For video model let’s use video crafter — first just get the base model running for text to video, then image to video, and then fine-tune it
Let’s actually use wan 1.3B and fine-tune it with diffsynth-studio. So next steps for video model part are running wan 1.3b in half precision for image to video and then fine-tune it and go from there

Find out where Jeremy got his haircut and go there I don’t care the cost

Most people are tired unhappy and ungrateful, don’t be most people, build your future now

To print double sided tonight just print once and flip paper around print again

I AM FULL STACK DIFFERENTIABLE BECAUSE CAN YOU REALLY BUILD SOMETHING OF VALUE AND PROFITABLE IF YOU ARENT? IT HAS TO BE DIFFERENTIABLE FROM THE USER VALUE CREATED TO THE PRODUCT 

Okay so underwear, pickleball shoes, socks are here. How do we organize portable wardrobe: need to bring up gym bag, collect all 8 Calvin Klein underwears and put in right side of bag and collect all white socks and put with new white socks in left pocket of bag. All Banana Republic (and the two Michael Kors t-shirts) are hung on the car holder (or could be folded but better to hang)

Need to print cheat sheet first thing tomorrow, should be one on 4th floor back room
Study for 1 more hour reviewing cheat sheet and perusing anything that we didn’t understand before (e.g. check primal and dual solving, etc), buy a pencil from bookstore 

Tomorrow workout is heavy shrugs and machine rows or lateral raises
Leave lab meeting at 12:15 to eat robowok at 12:20 eggs and rice, make coffee
Go to campus early as soon as wake up, before 9

I actually think we might want to build some servo holders per-link, e.g. for full 360 rotation one, and technically we only need one side to screw on?
Maybe we should just copy sweep robotics?

Should probably start just designing the 4dof arm or even 3dof, obviously starting with 1 dof and going up

Next up for the next 20min of robot design, build the connector from base to link 1 (may need a urdf-to-blender import script)

I wouldn’t worry about sharing the franka panda because most people don’t even come into lab before 12 so if you can just get here early and work on robot stuff from 8-12 you’re golden

Research social work fitness/diet buckets today
Primary is work of exam at 1, 
Research is just download and run the video model. That’s it. Run it from single image to video.
Fitness is heavy machine rows and barbell shrugs and delt raises, and 35min cardio (20min stairmaster 15min racquetball), can take break in between doing work at gym lounge, from 5-6
Social is go to Izzy’s at 630 to leave or Isaiahs at 8

IM THE FUCKIN MAN BECAUSE IM NOT A SHIRKER I LOVE EACH MOMENT AND PIECE OF LIFE AT EVERY CROSSROAD

Smog check after gym at 5 actually…? Or go Saturday, it’s open Saturday just go to Izzys after
Pick up moms clothes from amazon village today after eat

Need to make month by month plan to get there by 28

I think we need to build it in blender and export it to urdf, definitely tractable 

Clean car today, meditate, car wash on way to Izzy’s 

Tomorrow gym and long walk 

Plan for finishing para and making the robot by the end 
- franka panda working on monday
- video models debugging 
- link done

Let’s go to campus tomorrow and workout there
One meal today at ~700cal, muscle milk + snacks -> 1500 with ~1500 calories burned, tomorrow is gym cardio and lift on campus so not bad

Not sure why rotation is being bad but could 	

Ok so still debugging link rotations, almost there, from there just build robot design fully with link connectors for first 2 links, verify clamp tolerance is good (should be but holder might not be) and go from there

Tomorrow 1 hour hopefully just finish blender to urdf, then build mock robot arm, then build link by link 

I think just make a child link for position and then another for rotation and that might be good 
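
Sketch of that idea in URDF terms, assuming it means an intermediate frame link: a fixed joint that carries only the translation, then a second joint that carries only the rotation. Link names, offsets and the z rotation axis are placeholders, not the actual arm.

```python
import xml.etree.ElementTree as ET


def add_child(robot, parent, name, xyz, rpy):
    """Attach `name` to `parent` through an intermediate frame link.

    The fixed joint holds only the translation and the continuous joint
    holds only the rotation, so each can be debugged on its own.
    """
    frame = f"{name}_frame"
    ET.SubElement(robot, "link", name=frame)
    ET.SubElement(robot, "link", name=name)

    fixed = ET.SubElement(robot, "joint", name=f"{parent}_to_{frame}", type="fixed")
    ET.SubElement(fixed, "parent", link=parent)
    ET.SubElement(fixed, "child", link=frame)
    ET.SubElement(fixed, "origin", xyz=" ".join(map(str, xyz)), rpy="0 0 0")

    rot = ET.SubElement(robot, "joint", name=f"{name}_joint", type="continuous")
    ET.SubElement(rot, "parent", link=frame)
    ET.SubElement(rot, "child", link=name)
    ET.SubElement(rot, "origin", xyz="0 0 0", rpy=" ".join(map(str, rpy)))
    ET.SubElement(rot, "axis", xyz="0 0 1")


robot = ET.Element("robot", name="arm_sketch")
ET.SubElement(robot, "link", name="base")
add_child(robot, "base", "link1", (0, 0, 0.05), (0, 0, 0))
add_child(robot, "link1", "link2", (0, 0, 0.1), (1.5708, 0, 0))
urdf_text = ET.tostring(robot, encoding="unicode")
```

The links here have no visuals or inertials, so this only checks the frame layout. Whether it fixes the rotation issue still has to be verified in the viewer.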

How can we start to sexify further: 
- biggest lever is getting leaner so keep cutting, today 200cal breakfast of muscle milk, walk 1hr, walk to gym, 
New clothes coming 13th for simplified elegance outfit, still need to go to banana republic for new jacket

Meditate 20min today

Smog check on way home today and read book / walk in meantime 

I wonder if we just train a dumb video model even not SOTA techniques as long as it’s data efficient, will it work better than vanilla dino model and be a good path forward, even if it’s two step diffusion model on top of dinov3
NEED to stop eating so late, I suspect sleep is ruined because of it

Pretty sure we should just train our own small video model like other papers do like unified world action model and interactive model too

Ok so let’s download droid and so100 datasets and vibe code a small model which I think looks like fine-tune dinov2 for the encoder/decoder structure (pick some intermediate ~32x32 patch resolution and train a decoder to go back to image space), then the autoregressive prediction 	

IT HAS NEVER BEEN EASY TO BE A CASANOVA. WAKE UP AND LETS MOVE. 

What would a more elegant outfit look like? Something that radiates a greek beach type of low cortisol and beauty with the world? Let’s go! Is it an open button up linen type of shirt? Just a plain white t?

Is there a tv show or something I can watch while on the treadmill to do more zone 2 cardio 

Should use pretrained tokenizer and do prediction with two step diffusion like that robot prediction all about ado denoising paper thing in the cross entropy probability space 

I’m pretty sure that waking up early for a 30minute walk and 20 minute meditation would increase our confidence and life satisfaction by a significant amount. I think waking up even earlier and doing work before 8am would increase our confidence further, but the balance is getting enough sleep or going to bed early enough. How can we do this. Can we sleep at 11 tonight and wake up at 6 riding this daylight savings shift. 

I don’t think we have our water and electrolyte

Ate too much especially with last night Taco Bell shoot, eat very light at cheesecake tonight
Let’s go to smog check at 445 today the one in palms open until 6, go straight to gym lift shower and meet at cheesecake then Izzys
Let’s do pretrained tokenizer verified and dino model at 32x32 res predicting the tokenizer at 32x32 res and do all patch self attention at ~6x6 patch resolution, do the prediction in the cross entropy space not use

I think we’re actually quite close, we just have to get the servo-unit rotation bit down. I think this might come down to just figuring out a transform 

Just import generate twolink.py into blender I think and connect them

When can I move out 

Sleep 1130 wake up 630
Ate too much tonight, okay fine let’s workout and not eat tomorrow
I really wonder if we should just go into onshape and do full cad

Should get protein powder bucket tomorrow instead of muscle milk I think
Always carry gum around to prevent snacking 

Don’t have to model the whole arm together, can just do it link by link in blender and visualize full assembly in mujoco

HOLY SHIT PLAY MY WAY WHENEVER SELF DOUBTING IT, ITS A TOP 10 LIFE SONG

Watch Jalaseh during cardio stairmaster please 

Wait with the empties I think we have it working alhamdullilah 

Get haircut tomorrow sides are too bushy 

How can we have more fun and be more sexy

I am so cool!!! I love myself!!! i will have robots done my way like Elon! 
How can I look more attractive like a pretty girl in terms of attracting attention — lean out to start! The clothing and hair is a good second that we’re working on!

Buckets for today
Diet/fitness: 1 meal of the day for ~1300cal. Forgot shorts ugh so just do lift in jeans or find shorts here if we have them (oh I think we might) actually. Stairmaster Jalaseh cardio and back rows.
Work/research: get video model running / working on droid and sample it on our own data, get panda api working with rong, submit 599 writeup

Today main goal is get the video model done
Note by tonight we’ll be feeling calm stomach again since we won’t eat a meal for the rest of the day
Meditate 10min before working at desk again
Today we’ll have a video model and franka panda running, how cool!!!
Steps for video model: tokenizer interface easy, dino video architecture done

Maybe just train a dino tokenizer / reconstructor? Simpler  
 
Every day we don’t finish this project and make a robot arm is a day longer we’re not rich and successful 

Do 599 submission while training
Sleep early tonight 

New video to watch every morning before complaining: coal mining https://www.youtube.com/shorts/flS_YK92AgM

Need to try the few step denoising as well for this rgb prediction 

Really don’t eat tomorrow, < 1.5k calories, ate too much today but working a lot so wtv 

I’m thinking let’s actually start with the unified video action model for starting 

Need to 3d print new panda base marker and also print another marker or can steal it from piper 

Ok so next up for training tomorrow morning a) verify tokenizer works with MAR, b) implement the spatial token prediction with per patch diffusion head 

Clothes should be arriving today

Today goals are get video model fine-tuning on droid and get panda fidex printed for data collection server script and meeting with yue at 130
Barely eat today, mostly fasting 
New fashion coming Wednesday and leather jacket soon, 

While droid training 340-420 5min grass gym pulling and shower 445-530 rong then TJ and cook dinner


Setup server

Fk I keep eating too much ugh
Done eating for day with pancakes and eggs, 
Main goal for today is finish video diffusion model, that would be big
Tomorrow on campus we finish panda data collection 
Friday and sunday we collect and train data
Saturday we prototype the robot arm in sim (building links)	
Volleyball tonight for 1k calories. 700cal breakfast, 200cal muscle milk, 400cal light dinner snack for <1500cal

Honestly, the uva wasn’t so bad, maybe we should just go back to that and scale it

Note the video diffusion part is Kind of disentangled from the data efficiency part so we could collect say like 200 demos for the video diffusion to fine-tune, it wouldn’t be so bad

Eating too much we need to be fasting I feel way too full which is horrible for trying to lose weight wtf
Video models seem to be hard to fine-tune / not data-efficient, maybe we’re not respecting them in training them correctly (examples used 16 80gb for fine-tuning for  2 days with only just 200 demos)

Done eating mostly for the day no more big meal, just a light dinner snack, I feel bloated and my stomach is distended I don’t like this feeling, 

Be patient. Think about what you’re saying: video models can’t be fine-tuned. That’s obviously not true. I think the current true reality is a) we aren’t great at fine-tuning and b) they also may not be data efficient or worth the headache of large-finetuning requirements compared to just collecting more data and using a good visual encoder 
Let’s recognize the frustration and this isn’t our most comfortable place to be in large model land but let’s be more persistent, and try other models
Be thankful for coding agents, we’re currently getting scripts for running 3 different video models including cosmos which is very mature of us, we’ll get each finetuning all of them before going out to valley 
We tried our hand at making a simpler robot video prediction model. Doing so is harder than we thought, not much success, we don’t have great intuition for training them


ALEXANDER THE GREAT CONQUERED THE WORLD AT 25. WHERE ARE YOU NOW. MOVE WITH MORE BOLDNESS AND AGGRESSIVENESS.

Hf token [redacted]

We have to pick it the f up, like now. On all dimensions. I can’t even arouse my gf and I don’t blame her, we’re being lame asf. Diet wise: need to effectively fast as much as possible for the next month to lose 10 lb — if you want to snack, you’re not caffeinated or wired enough, have a caffeine pill. We need to build the robot and finish the project by the end of this month — we want a week buffer for debugs to finish data streaming and collection (ideally I think the gello setup is simpler and can stream that while controlling more easily probably) by monday (we work on campus sunday for sure and maybe sat) and for robot arm building have a prototype of the arm links by friday and print the mounts for debug on friday Saturday and sunday to have a prototype before the end of next week

Persian lesson at 11 can do it at coffee shop or after 
Steps to improve life dramatically and become the person we need: finish this months big todos, lean out 10lb, finish wardrobe (consider the Levi button ups with wife beaters under too, done), sleep and wake early  
The first robot iteration should be ugly with big clunky links, that’s still a robot.

Have chatgpt make a script using the coordinate empty transforms that produces the exact mujoco file we see here, automatically doing the inverse transformations in code instead of the mujoco inverse files because they’re ugly and cumbersome 
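
Sketch of the inverse-transform part, assuming each empty gives a 4x4 world transform and the mujoco body needs its pose relative to the parent body, i.e. inverse(parent_world) times child_world. The example poses are made up, and reading the empties out of blender and writing the mujoco file are not shown.

```python
import math


def matmul(A, B):
    """4x4 matrix product on nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]


def rot_z(theta, tx, ty, tz):
    """World transform: rotation about z by theta, then translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def local_pose(parent_world, child_world):
    """Pose of the child expressed in the parent's frame."""
    return matmul(invert_rigid(parent_world), child_world)


# Made-up world poses for two empties.
parent = rot_z(math.pi / 2, 1.0, 0.0, 0.0)
child = rot_z(math.pi / 2, 1.0, 0.5, 0.0)
local = local_pose(parent, child)
```

Quick sanity check: `matmul(parent, local)` should give back `child`, which is the same check to run on every link in the chain.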

This robot arm is good enough to start with, we validated it (though might want to just copy and paste the so100 gripper urdf into ours; can give the gripper an empty just like we did here to validate but should be easy)
Then basically just copy the so100 setup in terms of link lengths to reproduce it, then we can change it from there

I think the uva video model might be good enough such that we can finetune it with 200 demos and have it compress them and generalize well

I am awesome I feel great about myself I am sexy health radiant and amazing
Sooshbrah energy let’s have fun and be sexy 
Elon musk intensity with Steve Jobs excitement 
I think the current setup is actually just not eating / fasting for the next month. Starting with today especially because I feel mostly full. Today diet is coffee, muscle milk, a few bites of sushi at night, really make it a game to eat as little and more importantly as infrequently as possible
I am the best in the world. Why would I not be, I’m here for limited time. Let’s move
Can’t even arouse my girlfriend which is not her fault, don’t be resentful at her, that’s our bad, let’s tighten up with everything. 
BIG MOVES for the day:
- video models mostly done (UVA seems to be the best thing we have, make a stab at finetuning it for the key grip prediction and see how it generalizes and test it on new model)
- robot urdf fixing from script
- panda streaming setup and data recording (camera rendering and joint state streaming, can finish that before meeting even)
- print first robot link while waiting for things

See about finetuning keygrip video model on dataset and seeing how both generalized a new scene we haven’t uploaded to the server
Whenever you want to eat, you don’t need to eat: ask “am I thirsty, not caffeinated enough, or do I need to take a walk”, you have enough fat on your body to fast for a month, so you seriously do not need to eat. 
Anytime you are ‘hungry’ and have checked off these boxes, just recall the frustration of not being able to arouse your woman (not her fault at all, it’s all me).

Okay was really on edge, needed food. But now very confident we can go rest of day without food, just a few bites at night, nice. 
How can I be the happiest most energized person today. Lat pulldowns and 45min cardio today (1 hour, 30min stairmaster with Jalaseh 30 min racquetball).

3pm lift cardio  and meditate, 
The only fashion flaw today is that it doesn’t match the weather: a white shirt or wife beater would’ve been much better at 89 degrees; tomorrow is 90 so dress accordingly too

Need to look at the video diffusion loss setup, either we’re loading the wrong checkpoint or whatever it’s not generating the same level of videos, confirm with uva finetuning it’s good from the start and then go from there

Fck still eating too much. Honestly just take a step back and realize what works — if I have a robowok good meal around 12, I don’t feel like I need to eat for rest of day. So just tomorrow have robowok at 12 and then at Mexican dinner BARELY EAT I FKN HATE BEING SO FULL ITS LIKE BEING TOO DRUNK UGH

Tomorrow panda debug, finish printing and servo holder test, para video model debug loss
We have been feeling so full and that feeling makes me self conscious and anxious 
Today actually we had 500cal breakfast and will have 600cal dinner at Mexican (chips+guac+2 tacos), have a caffeine drink or green smoothie if you’re really hungry but I don’t suspect you will be, 1 hour workout of shrugs and then 45min cardio (20min stairmaster 20min racquet)
Should we stop at banana republic and get a leather jacket  

Let’s just print the board at home and finish data collection and visualization with mujoco side by side today, that’s it for now, then tonight we print the board and start printing the servo tests, and for video model side let’s finish debugging para and a) train for longer on droid and b) train on smolvla for the so100 experiments

When home print panda camera base holder, servo holder 
Go to cvs for hand brace and desensitizing wipes
Can finish servo print debug and new robot prototype over the weekend while running video models
Need to scale up training and do something useful asap, start with laundry, need good teleop device 
Bring moms clothes home today

So today we continue working on video modeling, get panda data collection working and visualized (rgb and joint state with corresponding mujoco render alongside it), and extra time design new robot (lowest priority)
This weekend we reprint the panda base (either on campus or at home) and print the servo holder and clamp clearance test (both can be done Saturday) and revisit on sunday if extra time 

Just have chips and guacamole tonight
MORE ICED COFFEE.
Grass walk/meditation 10min MANDATORY
20min lift then check computer work 20min then 40min workout, need to leave gym at 6

Okay so I think we should move to streaming joint states into and out of Mac — here’s why:
- ros workstation is annoying (on server so can’t visualize videos e.g. natively)
- realsense cameras are low-quality and finicky to set in data recording script (need to specify e.g. serial number and sometimes just goes out)
- recording script has hyperparameters in config like fps and camera but still annoying to do
Overall the workstation is nice and clean and the pandas are responsive with the gello and everything connects which is great, but we’re going to just abstract them onto the Mac so all our experiments are unified anyway

Binge ate when I got home, fck. Barely eat tomorrow: eggs for brunch and just a 0.7 meal at anajack
Need to order new license ugh
So new spool coming tomorrow, print both when they come 

What’s next for video model part of paper and vlm? For building robot?
For building robot is just print and test servo holder and in the meantime building a virtual arm for it

Go to drink tomorrow night?
Shave clean tomorrow night
Sunday going to campus to record panda render streaming dataset to Mac

Pretraining on point tracks and perhaps cross-entropy point tracks on just gripper prediction, might do much better than video pretraining. We could automatically annotate rtx dataset as well as egocentric hand datasets 
This might be cool as a ‘large scale pretraining dataset’ type thing 

Token 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

For vlm try paligemma; it makes sense why smolvla didn’t work well (probably garbage features and 8x8 res)
Need to do the training engineering overhead, need to make it faster and multi-GPU

You’ve captured the key points well! In summary: First, on the video model side, you’ll keep fine-tuning and deploying the wrist-focused approach. Second, for the VLA, you’ll annotate RT-X with point tracks, train a vision-language model (with DistilBERT as the text model), and also train a purely vision (point-track) model. You’ll compare their generalization. Lastly, yes—your lightweight language model will be DistilBERT. With these steps, you’ve got a clear path forward!


So basically for vla instead of using pretrained vlm style we’ll train one ourselves like openvla combining dino features with tiny text encoder like distilbert (just provides a clip-style embedding)
We’ll annotate the rtx dataset with sam3d using the text prompt “robot gripper”, then prompt the model with the start token to predict the next N~30 frames or something

Just use computer in car with hotspot no iPad
Get hotspot device 
What’s the beautiful male aesthetic, like a beautiful fit woman? It has to be a nice physique and glowing skin with being well groomed. I think we should focus on getting lean while building big muscle
Video model part seems fine
Wow omg the liber train model with even just 2500 steps on rtx dataset is actually quite good motion

Maybe should just run point tracks on big grid or segment by optical flow and 

Another time could’ve used hotspot was today and also tomorrow

Gym tomorrow (shrugs or dumbbell raises) after volleyball on campus

Order it on phone with dad monday

Can print paper aruco fidex board and stick with hot glue or tape 

Basically done eating for the day :) properly nutritioned and worked-out
That man at the mall with stripe shirt slick back hair and slacks made such an impression on me as a strong and confident man, I’d like to emulate that air of composure and collected confidence as well

We have 15 days and I think this next project and robot building is basically a layup, which is great because we will have a bit of buffer for the inevitable hiccups that come up:
- for robot learning, I think policy learning should come in the next few days on panda which is great (robot-camera aligned data collection streaming today and maybe control too), and robot building is first just building the holder and clamp then a fixed body with aruco modeling and proper servo min/max calibration and 

Should change umi grippers back to normal for better modeling 
Servo holder should have marker so no design changes 

Ok so overeating again, tomorrow really don’t eat, fast all day
Print clearance servo while finishing panda camera setup
Train shoulders tomorrow lateral raises for 15min and cardio

Today done eating had my big meal at robowok, jalaseh cardio 30min and lifting 
Need to stop hypersexualizing, the Corren’s marriage is a great dynamic to appreciate 

Cvs hand case and iwipes 

Need bigger markers (make it 3x3 board with 2x larger marker length)
Also need better alignment to panda. Not sure why, but exact alignment is more finicky than I thought: pressing down on the board gives a better alignment fit, and we shouldn’t be able to press down on the marker at all

Fk ate again at night but did good job otherwise. Tomorrow same biggish lunch tiny dinner and dessert after 2 hr volleyball

Okay great so now we have a good method for joining two links that is elegant and easy. Next need to print to reverify the clamp, then print our new link connector (do this while working on panda), then need to calibrate servo correctly (set min/max, and I think we can even just do it in the servo board gui), add aruco positions (ideally in same consistent position per link instead of defining unique design per link), and move from there. On pace to finish by end of month

Might want an even bigger cam board tbh like 6x6 but double resolution as before, print it I think why not 

Okay great panda camera rendered and dataset recording, nice 
Next highest priority is control and dataset processing/training 

Highest priority for sexifying is still getting lean while lifting. Today and tomorrow we have volleyball with izzy so can get away with barely eating, 600 calorie main meals. Today is 1k calories on campus and done eating before volleyball already, so 1k eaten <1k burned, nice
Same tomorrow 

Wait so were the pandas actually different or can we use the one with the gripper
Let’s use the one with the gripper, also let’s keep the rotation upright/constant always still so no rotation prediction necessary

Cool, it’s not even due until May 26th, this was a self-imposed deadline. Not bad but don’t be in such a frenzy. Take the extra time to build a great paper beyond just basic motion primitives and ideally including your own robot as well

Today mostly done just finish keypoint reproduction on collected dataset
Tomorrow will work on control of robot: first on Mac, click on a point and, with a height, do IK to get joints and reproject it (at first with GT state to compare against), and keep rotation of link 7 and 8 fixed
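
Rough sketch of the click-to-3D step, assuming a pinhole camera with known intrinsics and a known camera-to-world pose (e.g. from the aruco board). It intersects the clicked pixel's ray with a horizontal plane at the chosen height; the resulting point is what would be handed to the IK solver. The function name and argument layout are made up for illustration, not taken from our codebase.

```python
def pixel_to_world_at_height(u, v, height, fx, fy, cx, cy, R_wc, t_wc):
    """Intersect the ray through pixel (u, v) with the world plane z = height.

    R_wc is a 3x3 camera-to-world rotation (nested lists) and t_wc is the
    camera position in the world frame. Both are assumed known.
    """
    # Ray direction in the camera frame (pinhole model, z forward).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray into the world frame.
    d_world = [sum(R_wc[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d_world[2]) < 1e-9:
        raise ValueError("ray is parallel to the height plane")
    # Solve t_wc.z + s * d_world.z = height for the ray scale s.
    s = (height - t_wc[2]) / d_world[2]
    if s <= 0:
        raise ValueError("height plane is behind the camera")
    return tuple(t_wc[i] + s * d_world[i] for i in range(3))
```

With GT state available, the same function can be checked by projecting the GT end-effector position into the image and confirming the unprojection at the GT height lands back on it.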

Also keeping that in mind, first arm should just be 4dof or even 3dof


Coffee pods are already ordered

Definitely build bigger board, could appreciate better alignment 

Toyota get airbnb

Watch invincible 

Diet/fitness bucket is robowok Jeremy lunch and izzy dinner, deadlift shrugs at 4 and volleyball at 9
Work is get control working on panda: need to first command panda to arbitrary joints, then set up streaming from Mac to panda (can just set up a dummy script that replays a trajectory, waits for each move to finish before sending the next, sends them in a sparse sequence to simulate longer gaps/pauses, and uses pdb to step through sending the next action sequence)
So main work goals today are get panda control working (streaming control from Mac to panda) and collect a tiny dataset for it
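
Minimal sketch of the dummy replay script described above: send one joint target, block until the robot reports done, optionally pause, then send the next. `send_joints` and `is_done` are hypothetical callables standing in for whatever the real Mac-to-panda streaming interface exposes.

```python
import time


def replay_trajectory(waypoints, send_joints, is_done,
                      pause_s=0.0, timeout_s=10.0, poll_s=0.01):
    """Send each joint target, wait until the move finishes, then continue.

    send_joints(q) and is_done() are placeholders for the real interface.
    Returns the number of waypoints executed.
    """
    for i, q in enumerate(waypoints):
        send_joints(q)
        start = time.monotonic()
        # Block until the robot reports the move is finished.
        while not is_done():
            if time.monotonic() - start > timeout_s:
                raise TimeoutError(f"waypoint {i} did not finish in {timeout_s}s")
            time.sleep(poll_s)
        # Optional gap between moves to simulate sparse/slow action sequences.
        time.sleep(pause_s)
    return len(waypoints)
```

Dropping a `breakpoint()` before `send_joints(q)` gives the pdb-style manual stepping.
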
Robot arm work (30min) is literally assemble the full arm (not that much work) also should we add so100 gripper now or build our own? I guess we can easily build our own in blender btw, it’s literally just a modified geometry for the so100 style thumb 

For the robot arm assume right now we just want to keep orientation fixed like we are for experiments (it’s an interesting question how much you can get done with this 4dof arm, but more importantly for now I want the simplest first prototype thing). the link structure is prototyped, just design first link for printing now and then do rest while printing/waiting 

Okay great so panda streaming control is done / derisked, next up is collect even small dataset, train on it and deploy, 
Next up make IK reach analysis with urdf before making holders, once happy with robot, make connectors and be done. Should also add it side by side the so100 and pipers for size comparison 

Pickup pods and moisturizer right after dulce here

Note also should render arm with aruco onto table before building it for size understanding and animate it through the reach sequences

Leaving for volleyball at 7, do deadlifts at 6
Don’t think we’ll need another meal, feeling full still (note body was too full from past days, not just one-day eating)

Collecting quick data tonight and running it (30min), deploying model tomorrow, main point is just doing sample inference infrastructure setup on it not anything crazy

NEED TO WATCH MORE SHONEN ANIME WTF WHERE IS ZORO WHERE ARE THE SUPER MASCULINE SUPERHEROES

Fit stylish people hang out with fit stylish people btw
Start working on math homework tomorrow and also find airbnb/apartment for the 3 months this weekend 

You need to have more shark eyes and be more mean like those people antagonize as being mean

Ok there are reachability tests but for now I want you to just build the simplest thing, then can just copy the link lengths of the arx arm 

Today the main goal is just deploy a trained panda policy on the panda
Then just recollect new dataset, train and deploy on simple task. That’s it, done. 

Amazing that just 2 training samples has low validation on third sample, should somehow include that in the report, I think basically that’s showing dataset size wrt performance in [5,10,50,100] in [ours, act, ours pretrain, act pretrain]
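
The data-efficiency sweep above can be written down as an explicit run grid. Sketch only: the method labels are just the four named in the note, and the dict layout is an assumption about how runs would be tracked.

```python
from itertools import product

DATASET_SIZES = [5, 10, 50, 100]
METHODS = ["ours", "act", "ours_pretrain", "act_pretrain"]


def build_runs(sizes=DATASET_SIZES, methods=METHODS):
    """Enumerate every (method, dataset size) pair for the data-efficiency plot."""
    return [{"method": m, "n_demos": n} for m, n in product(methods, sizes)]
```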

Ok so we’re a bit burnt out by using the panda. That’s okay. We actually want to make this the main testbed for our experiments, so we have to get a comfortable setup with it first. Currently we do not have that with two computers, so let’s switch to using just the panda computer. Ssh the codebase over or even just pull it from our lab server, that’s fine. I think we should switch to using the realsense camera then. We definitely need to find the correct panda model, not this one with the missing end-effector. We need to be predicting full rotation, so let’s get that IK solve, it shouldn’t be very much, can just . In summary we need to build up the current trust stack on the panda: we can render the robot onto the panda, we can do IK confidently, camera aruco detection (is the realsense really that bad? run the corner detector on it, might need the larger board size for it and to calibrate it once easily), and we need to be doing rotation prediction now, just discretize and bin it (draw the axes projected onto it for gt and prediction). This is definitely doable
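
Sketch of the "discretize and bin it" idea for one rotation angle, assuming the angle is in radians and wrapped to [-pi, pi). The bin count and function names are placeholders; a full rotation would need one such head per axis (or some other parameterization).

```python
import math


def angle_to_bin(theta, n_bins):
    """Map an angle in radians to a class index in [0, n_bins)."""
    wrapped = (theta + math.pi) % (2.0 * math.pi)  # now in [0, 2*pi)
    idx = int(wrapped / (2.0 * math.pi) * n_bins)
    return min(idx, n_bins - 1)  # guard against float round-up at the edge


def bin_to_angle(idx, n_bins):
    """Return the center angle of a bin, used to decode a predicted class."""
    width = 2.0 * math.pi / n_bins
    return -math.pi + (idx + 0.5) * width
```

The decode error is at most half a bin width, so e.g. 32 bins gives about 5.6 degrees worst case.
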
Stop forcing the timeline, right now just get very comfortable in working with the franka panda, take all of tomorrow to do it with no timeline on getting research out anyway. Can even go back sunday if we want (Saturday might be hard, just do math homework then)
Barely eat sushi tonight btw 


Robot design: good to see it against the piper, should pretty much copy the piper link for link wrt size and rotation for rotation. Right now e.g. should be moving link 2 right on top of link 1, e.g. 

Tomorrow is heavy deadlifts and shrugs for 40min + 30min cardio (separate sessions) since no volleyball, Ethiopian at 6
Have libero agent draw the line from the base to the end effector at each timestep to verify height coordinate projection, then add rotation of end effector drawn as axes, then debug ik from those three properties with mink solver, proceed train video model and para etc 
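
Sketch of the base-to-end-effector line check, assuming a pinhole camera and a known world-to-camera pose. It only computes the two pixel endpoints; drawing the segment (and later the rotation axes) on the frame is left to whatever image library is in use. Names are illustrative.

```python
def project_point(p_world, fx, fy, cx, cy, R_cw, t_cw):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    R_cw (3x3 nested lists) and t_cw map world coordinates into the camera frame.
    """
    p_cam = [sum(R_cw[i][j] * p_world[j] for j in range(3)) + t_cw[i]
             for i in range(3)]
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * p_cam[0] / p_cam[2] + cx, fy * p_cam[1] / p_cam[2] + cy)


def base_to_eef_segment(base_xyz, eef_xyz, fx, fy, cx, cy, R_cw, t_cw):
    """Pixel endpoints of the line from the robot base to the end effector."""
    return (project_point(base_xyz, fx, fy, cx, cy, R_cw, t_cw),
            project_point(eef_xyz, fx, fy, cx, cy, R_cw, t_cw))
```

If the projected end-effector endpoint does not sit on the gripper in the rendered frame at every timestep, the height coordinate or the camera pose is off.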

Today:
- libero model should be training (and test we can deploy model): PARA version, video model version, and point track pretraining, vlm training on multiple tasks
- panda computer PARA infrastructure — get comfortable getting and writing 3d robot state using just that computer, visualizing keypoints, transferring codebase
- build arx arm 
- 30min of math homework

Also wonder if we can add distractor objects/lighting/viewpoints in libero as well

For the gripper thumb we don’t need an extra link, just modify the holder to include the gripper thumb as well for the last link
Okay robot looks good and is about the same size as the arx, nice. Next robot building day todo: let’s build out the link connectors (will add a rectangle as base, then build edge loop connectors to them)

Read class lecture notes today and understand them
Unfortunately holder wasn’t the one with clearance, think I found the right file but need to redesign and reprint with swapped in correct holder

For PARA maybe the move is like make a git repo with it that has sub-folders or branches for general training, libero, panda streaming, video training, vlm, etc. And we can always do it in the same place where the only thing that changes is the system cuda vs mps device and the data paths

Next up debug libero output, push to git, run it large scale on server, deploy it in the sim, meanwhile get video prediction fine-tuned for our libero episodes running with uva libero model, train our vla on the libero dataset’s language task conditioning, then always same constant progress on getting running on panda and robot building as well 

Done debugging but output still looks coarse. Can we use multiple levels of features for finer-grained prediction (I think we’re predicting then upsampling, which is wrong), then scale up

Should make first version of arm same shape but smaller to more quickly iterate before scaling up 

Mix fail/success. Ask claude to make new branch and try rotation/gripper regression instead of cross entropy binning (sigmoid min/max) and explain huge 0-1 step loss function decreasing 
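
Sketch of the sigmoid min/max regression head for the gripper (or a rotation angle): the raw network output is squashed into a known [lo, hi] range instead of being classified into bins. The range values used in any call are placeholders, and the sigmoid is written in a numerically stable form.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bounded_output(logit, lo, hi):
    """Map an unbounded logit to a regression value inside [lo, hi]."""
    return lo + (hi - lo) * sigmoid(logit)


def l1_loss(pred, target):
    """Plain absolute error, as a stand-in for the regression loss."""
    return abs(pred - target)
```
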

NEED to at LEAST read lecture slides today. Fun volleyball, meditate 10min needed, gym and shower there at 530pm to be at maizes at 630, 

Wait Toyota start date is 5/26

T

I want claude code to open a new branch, make a subfolder, or (maybe even easier) just write a new model script adapting depth anything v3 or moge for our heatmap prediction, since they have already learned upsamplers, are built on dino, and understand geometry

Should 

Paper title should be something like PARA: Pixel-Aligned Robot Actions are more data-efficient, robust to OOD viewpoints and object positions, and more natural robot-policy video-model adapters 

Do viewpoint and other augmentations in the libero dataset?
Not a huge fan of today’s shirt choice but was in a rush, which is also not good, but just focus on sleeping more consistently

Ok a bit overwhelmed and started off on the wrong foot, it’s okay, you worked so much and late yesterday, count that too. Any given day’s to-do list is arbitrary, just track signal over noise. Tomorrow morning finish collecting and processing dataset for training before meeting

new servo board too, dmv and bloodwork this week, zoom with si letter message him tonight
Consider mullet fohawk like haircut with nanobanana
order sticky paper and shower sandals to usc amazon, wake up at 6am if you’re feeling low on time i’d rather be a tiny tired than not doing enough

I think waking up at 6am will solve all our problems; even if a little tired, won’t be so anxious about running out of time. Doing it starting tomorrow

6am work enables time for workout, getting panda robot work before 12pm before people even get into the lab
Goal for tomorrow is first thing panda record dataset and parse it and train it with claude then deploy it, then yue meeting, and lunch with junjie, work on 567, add first servo and calibrate and control it, model it in 3d with marker, make meeting notes, deadlift, volleyball 

Should include libero results in the experiments because also it’s standardized 

Okay 6am was too early of a jump, tomorrow do 7am
I’ve been feeling too full. I can tell when I have a calm moment that I feel the heavy weight
I want to stay light and bouncy in my mind and body too. It doesn’t mean starve, it means eat less frequently and keep foods nutritionally/calorically dense. Today is robowok then brownie and done since we ate too much last night 
I don’t like starting the day late and feeling behind. I want to feel on top of things. 7am wakeup tomorrow. Part of the hard part I think now is that we have mostly robot work to do left, but can also still do our math homework, viewpoint experiments in sim, 
Bloodwork this week and dmv when? Do DMV tomorrow 

Need to dump out everything we’ve been thinking about 

I feel like im already anxious and at capacity with phd work idk if I have time for all this gf and help stuff 

Logistics:

Dmv tomorrow, bloodwork Thursday or friday morning, wake up at 7am for the next week. Those are the wins. 

Today is: advisor meeting, panda dataset collect and train and deploy, 

Cool v2 of paper might use multiview images with DAv3 with para output for enhanced 3d-ness of output with using explicit pointclouds  

Spend 30min on finding tri apartment and email them about start date
30min on 567 hw, 30min with lab group

Okay we’re feeling anxious. It’s okay that means overwhelm. I’m going to simplify for us here. We’re going to do panda data collection and train now, a logistics block with an iced coffee at 3, workout at 4-5, meeting at 5:15, then more 567 homework and done. 

Should we make daily fresh juice part of our daily diet/budget?

Still should be using larger aruco board btw but not highest bottleneck

Do phd class registration now since we don’t want to miss priority classes again

Need to wake up earlier I hate having deep work sessions conflict with random bs when I’m already anxious, later in the day should just be logistics 

Ok so no conflict with Zoe wedding need to just fly out the day after 

I think the video generation part might be an issue with the video model generalizability and we might just solve it with it being a better video model and even perhaps offload that part of the project to e.g. junjie or some intern who specializes in video generation 

What would it take to be more confident in my body? Better haircut (see Jeremy salon)? thicker neck (add neck curls)? Leaner (mind your food volume better to not feel as full)?


Add logistics as plan classes 

Want to ask in Thursday chat — is there anyone interested in doing the video generation part, perhaps junjie, or a masters student 

Ask claude to check the weights on the para loss and the video diffusion loss, maybe cap para loss aggregate to be the same as diffusion loss. Might also want to scale min/max of all of our outputs to be [0,1] in redo
Assume this is a good start but there are some implementation bugs to investigate and like the [0,1] scaling and proprioception 
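
Sketch of the [0,1] output scaling plus one possible way to cap the PARA loss relative to the diffusion loss. The capping rule here (a scalar weight so the weighted PARA term never exceeds the diffusion term) is just one reading of "cap para loss aggregate", not something already in the codebase; in a real training loop the weight would be treated as a constant, not backpropagated through.

```python
def fit_min_max(values):
    """Per-dimension statistics from the training set (one dimension shown)."""
    return min(values), max(values)


def to_unit(x, lo, hi):
    """Scale a raw target into [0, 1] using the fitted range."""
    if hi == lo:
        return 0.0  # degenerate range: every sample is identical
    return (x - lo) / (hi - lo)


def from_unit(u, lo, hi):
    """Invert to_unit at deployment time."""
    return lo + u * (hi - lo)


def para_weight(para_loss, diffusion_loss, eps=1e-8):
    """Weight in (0, 1] so that weight * para_loss <= diffusion_loss."""
    return min(1.0, diffusion_loss / (para_loss + eps))
```
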
Next simulation setup: need to verify act can overfit on single and few traj and single task e.g., then see if you can augment libero to generate new viewpoints. Also will be useful for 567, but will be able to study what distribution it generalizes to and how well vs how much finetuning is needed, e.g. over full distribution coverage using n centroid viewpoints from a sphere. Then same for new object positions, but it might not be trivial to shift training traj for finetuning unless we don’t rotate and just shift by a constant amount. Want to see the viewpoints as a pointcloud over it
Tomorrow for sim we redo all baselines with proper taking care including video gen (starting from stride3 checkpoint is fine) redoing uva+para (e.g. should evaluate video model in autoregressive mode without using para and seeing if generation is less OOD/blurry), do novel viewpoint generation and fine-tuning (will be generating extra parsed data)
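
Sketch for the "n centroid viewpoints from a sphere" idea, using a Fibonacci lattice to get roughly even directions and keeping only cameras at or above the scene center. The lattice is a swapped-in choice for spreading points evenly, and the center, radius and `min_z` filter are assumptions to tune in the sim.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors on the sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the circle at height z
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def camera_positions(center, radius, n, min_z=0.0):
    """Camera centers on a sphere around `center`, dropping low directions."""
    return [tuple(c + radius * d for c, d in zip(center, direction))
            for direction in fibonacci_sphere(n) if direction[2] >= min_z]
```

The returned positions can be dumped directly as a pointcloud to visualize coverage before picking train and test subsets.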

Also with spare time should design other connectors of robot arm and add arucos, but that might be better for the weekend task
Panda work (1.5hr) is collect data now that streaming and parsed data collection with proper panda setup is derisked. Start by collecting ~20episodes on very uniform task e.g. cup always in same place, train model, while training get actions deployment by taking GT rotation to verify joint action recovery and action deployment 
Logistics is book the airbnb with the right dates before it gets taken and book flight as well, ask chatgpt to discuss papers and what we need to do for dmv to not waste our time, could also go Thursday if easier, and have gpt plan next classes to register for 
Homework is 40min homework and 15min add pics to slides
I also want you to train a version of UVA from scratch that doesn’t use diffusion but rather just the direct regression of latents. I suspect the per-patch mlp diffusion isn’t really doing anything and the vanilla regression will yield the same and with better training dynamics
Above sounds like a lot but it’s mostly just typing it out for claude to work on and making a list to check back on, perhaps with gpt, or maybe the separate claude manager who better understands context
I want to be the most myself version of myself in the loudest fashion
Logistics for tomorrow are go to dmv or get bloodwork, I think do dmv unless you want to skip breakfast/fast or can fast friday morning, volleyball is at 8 so we can 
Do Jeremy’s presentation 15min too
Alarm set for 7am tomorrow get up and get to sim work 
Get haircut tomorrow mullet style 

Haircut today, talk to chatgpt about what need to do 

Today is 
Diet/fitness: (burrito, muscle milk, espresso), (volleyball, lifting)
Work: plan optimal experiments, have claude execute it, panda 1hr collect/train/deploy, extra collect more and deploy
Logistics is haircut, book airbnb, ask chatgpt about dmv papers todo

Next steps after: add bigger aruco board for better calibration, finish other links in robot arm to print

What is the optimal setup for the paper:

Need to have ACT vs ours comparison (ignore others for now)
Need video model and VLA comparison (vla might not work / show less improvement but will be good to show) with ours vs ACT integration
Viewpoint distribution and object distribution needs visualization of coverage (for object position can perhaps shift just one task vertically x and z and can shift eef trajectories accordingly like making a custom simulator)

Next robot step: last gripper is actually supposed to be the gripper hand, which is a more involved design. Shouldn’t be so tricky but it’s just not the standard link connector

No wonder you’re feeling overwhelmed: you’re trying to :
- learn panda ros interface and train on it
- train video model for para
- vla debugging for para 
- test viewpoint and good object position robustness
- learn skills for claude code to the best ability
- build your own robot from scratch
- test different backbones and etc
- + 567 Homework and logistics

Just slow down 

Charge phone at oakwood
Separate into buckets, currently am just not good at training video models yet, that’s ok, trying cosmos policy now

Should recheck move then act first as move 2 of same actions, should be exact same then replace with same first then gripper second, definitely a bug rn
Break down into buckets of todos

Ok don’t rush, is this the most you can handle?
Do you want to skip the persian lesson? It depends - is that important to you in the long term? You don’t even have a pressing deadline, just take your foot off of the gas and have more fun!

Can just fine-tune with 4 latent frames easily because they’re repeating same frame 4 times 

Persian lesson, group meeting at 12, tri at 230, make slides at 2:15, panda from 130-2:15 and 3-3:30, logistics from 3-4, homework from 4-6,  build gripper design at home tonight, bring servos to school friday to assemble
Logistics is plan classes, book airbnb, chat with gpt about dmv 

Ok fine messed up with keycard, lesson need to recollect, it’s ok and the 30$/week is for once a week plus an optional second buy in 

Can we get a haircut today?
Refresh
Logistics for day:
- Go walk back to get id card (extra steps, nice)
- meeting at 12-1 for 567 basically, fine 
- logistics are book airbnb, chatgpt schedule courses, chatgpt ask about dmv, haircut
- work is panda train/deploy at 1-2:15

I’m wondering why the performance is so low. Maybe we should be using the wrist camera? I feel like it would make the low-level grasp easier since we have good high-level understanding and it’s just the low-level grasp that’s the issue?

Haircut and shave tonight 
Deadlifts today, work in gym, then back to cardio 20min stairmaster 

Need to sleep enough, I don’t like the feeling of waking up early and stressed out 

See if can get second monitor from somewhere to panda station 


Need to send furnished finder requests TONIGHT

Okay so another day in the panda lab we discovered that streaming isn’t fast enough
Let’s just try using the realsense camera and being on the linux box: we’ll get rid of the streaming issue and we already have a git codebase anyway, can make a panda training folder
Tomorrow goal is just reproduce training setup with realsense camera, e.g. we need to calculate new intrinsics (just waving it around the board and taking the median should be sufficient). Definitely doable tomorrow, and that simplifies the codebase a lot to not need any streaming outside the network, more mature and better, and probably easier to get the wrist camera realsense as well.
I want to see if wrist camera makes the task ~5x more reliable/robust. That would be big and convince me to always use a wrist camera and that it’s worth the overhead.


Stop taking advice from izzy on eating. Bullshit. Stop eating fatass. Tomorrow is brunch and a protein dessert from TJs or muscle milk. That’s it.

For now just supervise wrist and third view cameras to get intuition on what heatmaps look like on wrist cameras 

Gotta clean it up and start living more sexy. Listen to more don toliver and increase sexiness 500%. 

Perhaps in training we should predict the NEXT gripper frame instead of the EEF + gripper location? Perhaps it should predict the pre-and-post move gripper values?

Keep debugging wrist view. May not be that important/useful and we have a different bug but worth it. I still feel like the move-then-grip is correct

I think we should cancel persian lessons, not that it’s bad, but we just don’t have the capacity for it now

Today is:
- logistics make furnished finder account (also ask if they have some good spots near Culver for us too), haircut, talk about dmv, plan classes (1 hr)
- homework is finish 1.3 and start 2
- diet/fitness is 1 meal for today, 160cal ice cream at night, lifting and cardio 1hr
- work is: panda realsense data collection finish, 
- schedule is: 11:20-11:40 warmup, lab meeting 

Maybe some wrist view conditioning is easier? Like maybe do the vertical projection we saw in that other paper for the vertical dropdown cue, maybe we just add the wrist view as second view with cross attention not using it in the actual prediction?

I wonder if we can formulate as for each height bucket project onto second view and sample feature there to regress height votes at like 16dim, and when no second view goes back to learned height bucket embeddings, volume would be 64x64x16hx16, doable 
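
Quick size check for that volume, reading "64x64x16hx16" as a 64x64 spatial grid, 16 height buckets and 16-dim features per cell (that reading is an assumption), stored as float32.

```python
def volume_numel(h=64, w=64, height_buckets=16, feat_dim=16):
    """Number of elements in the per-image height-vote volume."""
    return h * w * height_buckets * feat_dim


def volume_mib(numel, bytes_per_element=4):
    """Memory footprint in MiB, defaulting to float32."""
    return numel * bytes_per_element / 2 ** 20
```

Under those assumptions it is about 1M elements (4 MiB) per sample, so the memory cost looks manageable even with a batch dimension.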

Try move then gripper with teleport approach

I need to leverage claude more. I wish we could do tasks as efficiently as it. Let’s try.  Lean on it as much as possible .

Ugh aruco detection is still finicky even with the even_larger board. Maybe we should print even larger boards, like 6x6 with the same size of what we have for 4x4 now, but in any case print it out on paper and run detection before

Text Melika I can’t do lessons anymore 

Just look at muzi she is fashion MAXXING I want to be that level of clean and fashion maxing (I think the current shirts are okay but we could maybe incorporate better hair and some button up polos/linens, look back into the linens)
I don’t think apartment is highest priority but see about the furnished studios website

I think the 5x5 board is fine, and to get over the last bit of flickering, similar to intrinsics let’s record it for 50 warmup steps and take median to grab a camera pose. 
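
Sketch of the warmup-and-median idea for the camera pose. Taking a per-component median of rotations is not a valid rotation in general, so this version takes the per-axis median of the translations and then returns the whole detection closest to it, which keeps the rotation consistent with a real frame. That selection rule is a swapped-in simplification, and the `(translation, rotation)` tuple layout is an assumption.

```python
from statistics import median


def pick_median_pose(detections):
    """Pick the most central board detection from a warmup window.

    detections: list of (translation_xyz, rotation) pairs, one per frame.
    The rotation can be any representation; it is passed through untouched.
    """
    if not detections:
        raise ValueError("no detections collected during warmup")
    # Per-axis median translation is robust to flicker and outlier frames.
    med = [median(t[i] for t, _ in detections) for i in range(3)]

    def sq_dist(t):
        return sum((t[i] - med[i]) ** 2 for i in range(3))

    # Return the actual frame closest to the median translation.
    return min(detections, key=lambda d: sq_dist(d[0]))
```

An explicit multi-frame solve over all detected corners would be the more principled alternative if this is still too jittery.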

Fck we keep eating too much, need to give my body a break and lose weight. Tonight is 20min stairmaster 20min racket after 20min lifting and 20min meditation

Note I don’t think we need to wait for iPhone and might not need it, I think taking the median rotation or using multiple frames for a solve (they might actually be roughly the same thing but ask claude if we can do an explicit multi-frame solve ) might be good enough

I think we need to lift weights for longer/more often in general, not sure if diet/sleep/lifting-less but I think we might not be accelerating muscle growth, could also be that not doing push
Think we should use the so100 gripper 

Elon musk simplify, delete, prioritize — video models are not the priority, drop it for now, drop even rotation, 
Priority now is get camera done for training on desktop, train our dumbest policy, like pick from same location
Priority for robot arm is 

Gripper is already aligned and should be pretty easy, can finish it tomorrow 30min and print, finish arm on monday even

Today do hw prob2 and design gripper, tomorrow do panda cam and calibrate servos on campus

Can also get started running OOD viewpoint and object positions evals btw, luckily I think all the code we use for 567 project we can use doing the project too 

Wow actually don’t even need cost volume it seems, though it is a good idea to keep in mind. Seems like it may not be worth the complexity now

Robot is well-defined! Ugly, tacky, scrappy, which is so fine. Can easily improve the shapes, just get it done so that the robot exists and we can iterate on it! We can finish it even tomorrow just print while doing calibration  after we do multi-frame camera solve fix for data collection and deploy, easy!

Do those for robot lab work tomorrow from 4-7, and next at-home work is viewpoint and object robustness testing 

So for viewpoint testing can say that for this simple task, here’s a viewpoint distribution: if we train with this subset here’s the performance on this other subset

Should get iPad keyboard so can code at the gym with lower friction 

Act working on libero, no surprise, and good to know we don’t have a bug. Now need to test data efficiency and robustness to OOD viewpoint and good object position, so we create the new simulator modifying libero object positions

Can also just throw out libero if we want 

Viewpoint distribution and object distribution test should be 

Good reality check from x today how often do we need this
Remove x from all plans, we should be optimizing extremely aggressively for our own vision not for x, fuck, seriously like you should live ON CAMPUS. 

Do multi frame solve but should be using at least 4x4 board solve as well 

Should just redesign for 4x4 or 5x5 grid and more reliable mount 
Such bs not using arms that we build/modify ourselves, hopefully this is the last time

Design new mount for panda tonight, 

See if we have sticky pads at home tonight

Def don’t sleepover monday/tues. Need space. Need to recalibrate. Go back to campus after dinner from 9-10:30 to reprint etc? Meh just work on homework now and tonight and servo calibration script 

Try not even using glue, just place it carefully onto the robot and calibrate it once, or even because it will likely tip over with current design, try hot gluing just the outside/edges, to avoid any thickness in between the mount. See if that is stable enough.

10min meditation at 10:30 tonight

Ate too much today fk. Tomorrow diet is gibraltar, robowok eggs and rice, salmon rice dinner, 30min stairmaster 30 min lift (rows, hammer curls, neck)

Tomorrow we:
logistics:
- book airbnb at end of day if furnished finder doesn’t respond to us 
- get haircut
- make dmv appt
research:
- panda calibration (print more snug fitting calibration board with 5x5 and do multi-frame solve)
- viewpoint and ood viewpoint/position libero: first make an environment with just the pick and place objects, translate the objects and the EEF trajectories accordingly
Homework:
- just finish fashion mnist homework, it should take 1hr

Need to get linen shirt before Thursday, go Tuesday to grove in evening

Fix libero trajectories with new obj position trajectories, I think the way is basically: take grasp position, go linear path from start to grasp position over n steps 
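
Sketch of the straight-line approach for regenerating end-effector trajectories after moving an object: interpolate from the start position to the new grasp position over n steps. Positions only; rotation and gripper state are assumed fixed along the path.

```python
def linear_path(start, goal, n_steps):
    """Return n_steps evenly spaced waypoints from start to goal, inclusive."""
    if n_steps < 2:
        raise ValueError("need at least 2 steps to include start and goal")
    return [
        tuple(s + (g - s) * k / (n_steps - 1) for s, g in zip(start, goal))
        for k in range(n_steps)
    ]
```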

ROBOT IS GOING TO BE BUILT TODAY! How exciting we’re completing our goal for building the dumbest ugliest robot prototype, next month is about refining it to be a ‘good’ robot (more aesthetic, bigger, perhaps double motored, calibrated)

Lean on claude code to do more autonomous larger-scale work, it’s so fast

For libero I think taking any EEF trajectory e.g. demo0, will succeed on all episodes, that’s why it’s a dumb sanity/task

Videos look great, I think we can train para on it today and even deploy it, before our meeting tomorrow, just fine-tune it on the zero-rotation dataset, I think way we’ll do it is save intermediate timesteps and stack them, since small episodes (~50), can just stack them 
Prob should evaluate it on our own libero pick/place thing

So today is:
- homework 567 and submit 697 submission
- logistics is book airbnb if furnished finder didn’t respond and get haircut
- research is finish panda camera toolkit and train/deploy 

Ask yue for second x1 printer tomorrow 

Maybe we should just use franka panda default camera calibration script? At the point where I don’t care if it’s ugly/clunky
Maybe we should just use the other franka panda where we know the droid camera calibrated setup, though 
I think we can do this with claude. We actually already have a rigid board for the 5x5 that we can cut and glue onto the EEF for now to make the scripts, later on can make it work with just tape but should be fine for now 

Overwhelmed? Let’s do this, 30min knockout robot calibration, go meditate 10min on grass, 40min  get haircut at 4, workout 

Research fronts (should move this somewhere external with different buckets):
Video Models:
- finally a good working video model (we hope) on libero 
- need to run PARA on it (will wait for OOD object pos libero to fine-tune it from general libero to ours, then with real panda will fine-tune from libero checkpoint to DROID to ours)
Robot making:
- forgot servos at home (hopefully, otherwise need to find them because they’re not at school)
General libero testing:
- ood obj pos data generated (same for generated viewpoints), need to generate both and train on both
Panda calibration:
- going to just use off-the-shelf franka calibration, seems to be most mature thing and fidex seems hard to attach to big robot accurately (need to really build it into the robot)

Will we be in palms for haircut wednesday? Can get haircut weds morning if we want to then instead. Make dmv appt for tomorrow and do bloodwork first thing in morning tomorrow

Should make daily todolists/tracks with gpt or claude in a ‘todolist’/self folder

Next steps for para:
- generate OOD object position and viewpoint datasets, ask claude to evaluate ours+act on them with different amounts of training data
- fine-tune svd on it (on the all object position dataset where object positions = test positions not really focusing on diversity here)32


Ood obj posi is rendering for testing, for viewpoint need obj positions so regenerate with ~9 different object positions per viewpoint so 3x3, train video model on it and run ACT vs ours experiments while at dinner, add para video head at night (need to figure out best way to do unit stacking thing)

Should have keyboard on iPad for tmux coding, get one from bookstore whenever 

Shoot Kristy room booked up too, just book first one you find

Should spam 10 furnished finders tonight, seems we can find studio for same price,
Definitely should be using droid for point track pretraining too not just for videos, let’s have an agent figure it out 

Tonight need to submit 697 submission, start 567 homework, post to more furnished finders 

Svd rollouts look great, just need to find right way to add on top of diffusion features, see what depth estimators / keypoint estimators do, maybe just concat multiple levels 

See if sicheng’s calibration can just be used directly, (try rendering it in mujoco pasting it into claude etc)

Tomorrow priority is:
- making bucket meeting notes of yue
- NEED to (last day since 1st is just formatting) finish coding part of homework (1 hr with cursor code assist)
- Camera calibration (just going to do traditional hand-eye calibration with a marker on the EEF [just paste one onto the velcro board]), but importantly claude code should be able to vibe code one one-shot after we place an approximate exo marker for the ~5x5 tag in sim, and then we’ll just replace the images with real 
- research is doing viewpoint and OOD object position and distractor experiments, should fine-tune SVD on the OOD object position dataset tonight and work on PARA integration first thing in morning 
- also need to fine-tune SVD on droid dataset after derisking libero but include that in notes
- Reprint gripper with more robust connectors 
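
For the hand-eye item in the list above (exo camera fixed in the world, marker on the EEF), the standard formulation can be written down so the solver inputs are unambiguous. The frame names and symbols below are my own notation, not something from the calibration scripts:

```latex
% Assumed notation (not from the notes):
%   T_{b \leftarrow e}^{(i)}: EEF pose in the robot base frame at pose i (from joint states / FK)
%   T_{c \leftarrow m}^{(i)}: marker pose in the camera frame at pose i (from the aruco detector)
%   X = T_{b \leftarrow c}:   unknown, fixed exo camera pose in the base frame
%   Y = T_{e \leftarrow m}:   unknown, fixed marker pose on the EEF
\begin{align}
T_{b \leftarrow e}^{(i)} \, Y &= X \, T_{c \leftarrow m}^{(i)}
  && \text{for every pose } i \\
\underbrace{T_{b \leftarrow e}^{(j)} \left(T_{b \leftarrow e}^{(i)}\right)^{-1}}_{A_{ij}} \, X
  &= X \, \underbrace{T_{c \leftarrow m}^{(j)} \left(T_{c \leftarrow m}^{(i)}\right)^{-1}}_{B_{ij}}
  && \text{for every pair } (i, j)
\end{align}
```

The second line is the usual AX = XB form with Y eliminated. It needs at least two relative motions with non-parallel rotation axes, so at least three EEF poses with real rotation between them, not just translations.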

Diet fitness is ate too much tonight at family dinner so will just have one meal (robowok) and a brownie or something from TJ’s for dinner (ignore x’s advice x doesn’t know), lat pulldowns and volleyball

Make buckets for yue meeting notes because they will help us organizing it I think as well (maybe something in notion or even like prezi?)

With droid calibrated loaded we can pretrain para as well and compare data-efficiency gains with pretrain vs not on real-world

So priorities in morning (go to commissary) are 

Do bloodwork Wednesday morning, get haircut after lunch, make dmv appt for wednesday afternoon (don’t want to drive without it)

Weird part of new retargeted dataset is robot starts in shifted position, hm. I think we should just detect grasp position and move to there

10-11 commissary Caleb and make notes, 12-1 commute, 1-3 hand-eye calib

Call dmv to check what we need while on walk 

Thurs is dmv appt in morning
Tomorrow is blood work in morning
Today is haircut in afternoon

Should just design bigger robot with multi motors on base and print it, with aruco holders too 
Should probably print franka eef aruco mount, meh should just figure out how to command to states and read states accurately, that’s it
Maybe can just use iPhone camera?, get coord today  

Should just print ref aruco holder bc why not and might be easy

Note from first installation is that servos have to be closer together to make daisy chaining feasible. Hmm I think we need connectors or longer three pronged wires actually to make a bigger arm… alternatively could be ugly but could make all the servos next to each other and use long link connectors

Invest time into making this agent manager setup and manager dashboard setup:
So, for the agent orchestration side, we talked about having a manager tmux tab that sends commands to other agent tabs using the send-keys command. The manager also captures their output with capture-pane to monitor progress or results. Essentially, the manager is orchestrating tasks by pushing commands and reading responses directly inside tmux.
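
Minimal sketch of the manager side, assuming agents live in tmux panes addressed by a target like `agents:1` (that target name and the helper names are made up). It only wraps the two tmux subcommands mentioned above, `send-keys` and `capture-pane`:

```python
import subprocess
from typing import List


def send_keys_cmd(target: str, command: str) -> List[str]:
    # Types `command` into the target pane, then presses Enter.
    return ["tmux", "send-keys", "-t", target, command, "Enter"]


def capture_pane_cmd(target: str, last_lines: int = 50) -> List[str]:
    # -p prints the pane contents to stdout; -S -N starts N lines back in the history.
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", f"-{last_lines}"]


def send(target: str, command: str) -> None:
    """Push a command to an agent pane (requires a running tmux server)."""
    subprocess.run(send_keys_cmd(target, command), check=True)


def read(target: str, last_lines: int = 50) -> str:
    """Return the last lines of an agent pane as text."""
    result = subprocess.run(
        capture_pane_cmd(target, last_lines), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Usage would be something like `send("agents:1", "python eval.py")` followed later by `read("agents:1")`. Keeping the command builders separate from the `subprocess` calls makes them checkable without tmux running.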

On the dashboard side, we imagined running a lightweight Flask server that tracks the project’s to-dos and provides a chat-style interface summarizing agent statuses. Each agent’s findings, current experiment results, or artifacts (like images, videos, or W&B links) could be reflected in the dashboard, giving you a visual, unified oversight.

Also make good slides for Thursday Stanford meeting 

Maybe just go Friday morning and go to passover night

Should refactor claude code into some ecosystem with dedicated agents with markdowns like ‘scientist markdown’ inheriting it with principles on how to do good science and maybe some online guidelines for it and a ‘manager markdown’ and some personality traits for it like say impersonating mit professor etc, project goals markdown etc 

I think we should just go friday morning in Sara’s car if we can’t take our own car for tmrw night 

Today:
- para uva implement
- viewpoint exp debug
- finish panda camera calibration (maybe can just manually move in programming mode to ~N poses?)
- format and submit 567 code (also need to do writeup with viewpoint comparison and push to GitHub)
- haircut
- volleyball at 7
- diet/fitness is neck and shrugs

Take dad’s watch to Palm Springs?
Tanktop and linen shirt would be good 

I think the dashboard should also have like a chat-gpt like interface? Meh can also chat command line 
Dashboard should have:
Tabs for each agent which we can click on and it should have media displayed too, and a chat to type and receive responses (even if just /btw questions) from each
A global manager chat we can /btw ask questions to and have it query each agent for questions and we can talk about summarizing results and global context things

Should also probably have a global claude/dashboard for us with our daily todos and meeting notes etc, ideally even having it connectable with the web. Maybe just a private GitHub repo is fine for it 

Agent database and dashboard is a great idea, but might not be highest priority right now / do that much for us. But ideally eventually it also builds the project website and meeting notes as well 

Should have claude also be generating visualizations for the train/vs test distributions for viewpoint and object positions 

Okay so homework is generated, need to go through problem 3 code sections and review with our answers 

Goals for weekend trip for work are (especially during car ride):
- libero viewpoint and obj ood experiments done and visualized
- video policy working in libero
- make new website with results and graphics and presentation for it
- new robot arm

Need to remove para from GitHub website 

Sergey wants to meet next week, perhaps delay it until better meeting notes and website formulated? In any case respond to him tomorrow

Ate way too much today ugh, barely eat tomorrow because will be in car, basically fasting and working in car. Get a 30min walk in morning while laundry running first thing in morning. Will go for 1hr walk when get to property and do dips too. Barely eat tomorrow since drinking and will be in pool. 

In the car work on getting each agent to write report files with tables and media like videos on the evals and illustrations of the dataset input and test distribution output and maybe a header or graphic or blurb explaining it
We should also be more descriptive in describing the para project, the experiment goals, scientist guidelines (perhaps claude generated) basically saying make sure the results make sense and debug if not and take your time checking the video results like first middle last frame compared to dataset distribution and past results and expectations etc 
Claude had good idea of inbox/outbox, iterate on the website a bit with global results and individual report tabs in the website like web/ood/[curr_report,curr_status]
And give it 

Should write a more detailed vision board on the life we want to lead and goals we want to hit, I feel like we’re a bit in the air of like ‘keep pushing with any energy we have left’

We need to have some setup markdowns for libero at least like backbone tab needs to write a markdown on how to do evals instead of just using the correct script, or 

Need to brainstorm in the car about the final project report setup, claude code 

Should add ‘robust to distractors’ as well

This weekend debug basic experiments (left vs right viewpoints etc), then launch another agent to read this report markdown for making viewpoints and doing evals and push that to a GitHub just to get everyone 

Just revert to adding the servo as a like invisible joint 

I will have my own robot. I will have the best robot research, with the best aesthetic. I am inevitable.

Should also study x percent on left and x percent on right instead of 100 and 0 

Our current understanding of how to double up joints is not correct, see how other robots did it and consult gpt

We’re going about this the wrong way, look at the robonine blog post for the correct diagram, https://robonine.com/backlash-compensation-in-sts3215-servo-actuators/, blender it, abstract it, even add a third of it 

Grid range still wrong ugh, ask it to start with corners used in real_reach

Oh actually we’re dumb, look at the robonine example: we don’t need screws on both sides so just stack them and cut the overlapping clamp side off, easy

Test double stack servo printed before 

Eval 

Can also just cherrypick which train/tests generalized well 
Definitely finish the full robot build in the car on the way back, almost there

Might want to double servo joint 3 as well, basically copying micro factory

Ate way too much today, do dips and 1 hr walk tomorrow, easy brunch, don’t snack, light dinner

Ask zubair for posed droid subset update

Should have a better phone mobile interface for talking to the agent 

Need a chat interface agent , a slides agent, overleaf agent, project todos manager, etc 

Need to do homework in spinning of ACT viewpoint task for train w augmentations but first needs to work on train distributions

Okay so 

In car tomorrow can spin off another agent to work on the project 

Ate too much, will transform body with better sleep, claude life manager, walking on treadmill while iPad working, 
Probably will have a set of life notes and markdowns about our reflections over time etc, maybe like one per day etc, instead of the chatgpt default interface and chat library, 
	
Seems like claude often makes bugs in the evals, we need to more rigorously view and verify the train and test distributions with visualizations, 

Imprint this into permanent memory, x is not physical attracted to you, this is disgusting and you are weak for it, invest 300% into your energy and self asap. reject all 

Been eating way too much. Basically sexymaxxing from here on out, meditating each morning and living life to our fullest. Why live with x as the judgement, girls don’t like that anyway pressure anxiety, basically I feel great and live my life in full self alignment independent of x, pass on all. Live life fully independent. See ya when I see ya  I’m busy!


Need to build whisper speech to text option as interface plain and also text to speech response, should put passwords on the terminal pages because dangerous exploit 
Need to have train and test grids videos just on corners 
Get keyboard iPad, can carry it while walking and on treadmill

I think we should not move out, can take that money and even buy a new robot arm each month, in fact should buy the robot learning company jannick robot arm
Build our $400 robot where 

Need to redo train vs test hemisphere visualization, I think should just generate data from the full grid and decide at training time how to split it
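
Sketch of the "generate the full grid, split at training time" idea, which also covers the x percent left / x percent right variant. It assumes grid points are 2D (x, y) with x < 0 meaning the left hemisphere; `make_grid` and `split_grid` are hypothetical helper names:

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_grid(n: int, lo: float = -1.0, hi: float = 1.0) -> List[Point]:
    """Full n x n grid of positions (or viewpoints) over [lo, hi] in both axes."""
    if n < 2:
        raise ValueError("n must be >= 2")
    step = (hi - lo) / (n - 1)
    return [(lo + i * step, lo + j * step) for i in range(n) for j in range(n)]


def split_grid(points: List[Point], left_frac: float, right_frac: float, seed: int = 0):
    """Put left_frac of the x < 0 points and right_frac of the x >= 0 points in train, rest in test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible across runs
    left = [p for p in points if p[0] < 0]
    right = [p for p in points if p[0] >= 0]
    rng.shuffle(left)
    rng.shuffle(right)
    n_left = round(left_frac * len(left))
    n_right = round(right_frac * len(right))
    train = left[:n_left] + right[:n_right]
    test = left[n_left:] + right[n_right:]
    return train, test


grid = make_grid(4)
train_hard, test_hard = split_grid(grid, left_frac=1.0, right_frac=0.0)    # 100 / 0 hemisphere split
train_soft, test_soft = split_grid(grid, left_frac=0.75, right_frac=0.25)  # softer mixed split
```

Since the data exists for the whole grid, changing the split is just a different call here and does not need a re-render.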

Note can always cherry pick parts of object position and viewpoint grids to make a compelling story, kind of like how we chose just camera shifts for the real world

Text greg today

Use obsidian with claude
Finish building robot in car today and also pushing 567 project 

In starvemaxxing mode for the next week, done being in vacation fat mode

Need to sexify life asap. Consider going on reta. Stop eating until then.

Need to regenerate distractor grid because often object is missing, just put it in the old default simulator view 

Should use encrypted obsidian hierarchical life setup with notes and better custom chat agent interface with it

Stomach is going to be used to eating more, need to intentionally back off — more caffeine, be intentional about eating

Need to have obsidian help refactor life
Get blood work done tomorrow morning 
Brainstorm best forearms building exercises tomorrow, do machine rows, farmer carries, shrugs. 

Tomorrow is class, validate/debug double servo rotation, finish panda camera calibration (perhaps just manually moving the EEF e.g.), get iPad keyboard, cardio iPad work treadmill workout, 
Need to be eating <2k calories for serious in the next 2 months, we’re done with being fat. 

Fix your own internal energy blockages and don’t let other people decide your self worth or self esteem, that is yours to have, not a function of anyone else
Great job for getting blood work done
Print double servo before going into class

Hm I think maybe I could even host it on the web and have web agents use it and even write to it

Today is bloodwork, class, print and debug double servos, claude life setup, panda hand eye calibration, workout 1.5hr of weights+cardio,
Ate my meal of the day, feel great! Going to have a tiny snack around 5pm, Emma’s from 7-8
New diet mode is actually reverse of what we were doing before: calorically dense foods low volume to eat less still feel full and not be so bloated   

Move more slowly and lie your energy is like a 6’7 huge buff gold 
Detach from x wtf are we doing being a simp MOVE ON LETS GO 

Will want to backup all markdown and files into some private GitHub eventually 
Should have it interface with Mac as remote terminal type thing and same with franka linux box

Submit reflection today before then 
New vibe is Paul Newman
Need to have new robot done in paper for cool submission and tri 

Should have life markdowns exposed on internet with password and can have gpt check them for the farsi stuff like can keep the farsi pdf on file there and have us bookmark where we are in it, because for many things like e.g. farsi or voice interface, gpt is better

Should just make space for double servo being wide and tall who cares, make the full robot, google slides coming soon, do core power on campus heated coffee

Ugh ate too much today fk, tomorrow diet is crunch bar for breakfast, robowok linner, brownie dinner, commissary morning cortado
Tomorrow goal is debug panda calibration (almost there since collecting aruco poses and joint states, just need to get the solver correct, have it go back to sim world and figure it out and not touch the resulting solver input/output), yue slides so organize it
Organize results and reports further with slides, add so100 experiments to results page, start working on 567 homework, get claude to do persian lesson chatgpt integration etc,
Workout is pull machine day and volleyball from 8-10

Note link 1 does not have to be on top of the base

Next up for claude add persian lessons, slides generation, 

Today is:
- diet/fitness: one meal of day, maybe brownie or muscle milk,  rows and shrugs and stairmaster 10min, volleyball 
- research: panda calibration hand-eye script finish, yue meeting (slides), robot goal is print and verify new double servo with gap for base, want a visualization board as well 
- claude: want to make slides and better website results with old robot results too 
- homework: start 567 homework, should ask it to take a stab at the augmentations as well 

Read parnia paper with gpt summary too 

Need to have todos with timelines for corl submission and robot building, to

Buy 3d printer because the amount of times we can 3d print things doubles, can design and print overnight to check once get there instead of get there print if not busy check and can’t reprint because busy now, label it with sticky label saying cameron’s don’t use until ask me, same with filament in there 

Yue meeting: main bucket to review is the website (ood exp and video experiments), then panda update (have arucos on hand eye and joint states streaming, just debugging the hand-eye solver, close), the robot update, 
Should have timeline updates though, can we have claude help us keep track of the deadline for it 

Can agent expire good Spotify podcasts for me and control them? Meh secondary


Compare with RVT, 
In method overview just ignore global aux sampling and even the IK because it’s redundant 

Done eating after matcha/powerball, body definitely needs to fast. After this work session go meditate in grass for 20min and then gym for 40min, shower then go assemble robot, if confirmed quickly design base and print next links

Need to start working on better illustrations as well, e.g. one for video modeling comparing ours vs global regression, and one for the train vs test distributions

I want to train large-scale foundation models, not per-action-primitive models
Can stack motors without inner connectors, verified with good screws it holds but just start with single 

Make method figure just half width
For figma figures just keep it manual design 
For overleaf just open in cursor, have cursor do low level changes and just say from line x to y fix high level for high level change

For slides prob will have some ai website slides builder like gamma with bullet notes and images given by claude 

You need to make time for peace in your life with breath work, try calm app
Go khanega sunday morning
Goal is to build a useful robot. Move away from action primitive asap and especially during TRI project 

Order more of the Levi polos and button up as well, they fit well, add it to default library, 
Need a better ironing setup

Today important are panda calibration finish via claude, assemble robot work on 567 homework 
Diet fitness is one robowok meal, maybe a brownie, rows and volleyball 

Need to break the cycle and sleep early and not do these late nights, but for now just recollect and recognize investing period into agents for the bit of time

Goal is to not be rushed today, I’m okay not finishing everything today. Let’s leave at 6, so gym at 5, 

So Interesting being at USC vs somewhere in the bay, it’s so much more aesthetics and vibes focused 

How to be more vibrant and sexy. I think starting the day late is automatically unsexy, because it’s a bit embarrassing and negative feedback.

Consider having claude make the bar chart svg for the figma

I hate working with panda. Ask yue for a master’s student to help

Set up 3d printer tomorrow morning

Tomorrow workout is huge weights focused, machine rows hammer rows and shrugs

We’re actually in robot building mode: wrap the calibration script for multi joints at our desk first thing, see how it handles the long links and if it’s wobbly to see if we need doubled servos now
Make meeting notes and have claude make the meeting notes 
Print our next link and Setup Bambu printer first thing
Goal for tomorrow is build new robot arm, finish it. Call dmv. 567 homework.

Review x bookmarks every morning for reminders on testosterone

Build a tiny calibration viewer with claude wrapper that lists all the current motors and their positions with a gui for ‘zero all’ and view positions as we move it to change direction as well
And should be able to control it from there as well so we can get understanding and intuition for backlash and control 
Do that today

Ate breakfast don’t need to eat much more today can just 

Today remote work is 567 and make flies generator and baselines etc then school work is print robot setup 3d printer and calibrate motors 

Ignore generating slides automatically, the reason being that slides are dumb
For next week while building robot arm, should work on figures and paper and pitch in parallel 

Can ask masters student from yue or junjie with droid experience

Live your life happy and just want X from x. And you will get it more often or attract it let’s move !!!

Live for your authentic self not optimizing for the projection of yourself into other people’s minds! 
Eat! Don’t starve yourself, when you’re hungry and jittery, eat! Today we’ll do lift 30min and 30min cardio, it’s okay! Don’t have to leave campus until 7

Junjie will get us masters student and posed droid json 
Curious about how we might have better cross task generalization from human hand trained tasks with para, probably more of a towards result rather than big result on that 

When free time make 3dof EEF + gripper

Might want to show btw pca maps of the features for our model vs para, something to do for our future embodiment next week 

Tomorrow is gym and actually stairmaster, go at 2:30pm to avoid rush
Build EEF in morning and print, once connectors come calibrate robot 

GOAL IS ROBOT COMPANY FIRST MAKE A GREAT ROBOT LETS MOVE 

Note we should have in libero visualization of method and image with rays going through it, vibe code the video illustration of our method
Then to illustrate the method difference take the dino pca features with CLS token and have on the right the cls token mapped to robot action call that global then on top of dino features transpose conv predicts image volume and call that ours and that’s the high-level method comparison figure instead of the image going through the method twice
Also have agent upload pca images directly to figma even with arrows and show video of robot failing 
First have robot succeeding with demos on left side then show it failing on right and then show pca maps of smooth render at new viewpoint to ask why then show para success then zoom out to entire grid of both spatial and viewpoint generalization 

Can we automate the robot building, can we derisk the point track pretraining helping more for our method 

Have Vincent (jiawei does this great) level of elegance with it, especially useful in such a messy engineering heavy field like robotics 

I could literally have claude build an app to view tmux agents hierarchically

Don’t need to eat tonight at family dinner, just few bites 

Should have visualization agent make figma of pca to global vs upconv with ours directly uploaded as figma frame

Right now focus on method visualization framing for elegance and visuals 

Come back 

Consider paying more for airbnb so we can have izzy stay over

Wanted to get work done tonight, exhausted not gonna push past it, just build arm tomorrow morning at verve 
Walk and meditate before volleyball then campus then navid 

Will start debugging tomorrow with tiny so100 arm, smallest 7dof arm literally just tiny links, then will just keep shape and expand inner blocks before joining geometry to make links longer, doesn’t have to be modular but can have it effectively so, just have nested parents and save the file before joining the blocks, that sounds good

With droid data loader we can now evaluate on droid setup, ask junjie for someone familiar with the droid setup

Today volleyball go campus print smaller so100 come back go out navid sleep Izzys 
Robot
Don’t eat much today just eggs and rice from robowok, 150cal ice cream dessert, done

Next need to meditate do laundry clean up verve rest iron shirt gym into going out

Get more of the long sleeve Uniqlo and gap polo

Great job buying discount muscle milks, ~$2 per drink instead of 5

Rest of tonight just finish laundry, iron shirt, go gym

Stop taking everything so seriously, have fun!!!
Just do dips tonight if going to gym is stressing out, we can go on campus tomorrow
Just be put together tonight 

6dof arm built, no EEF, 

The method video is great because we can directly use it for the paper figure (even just screenshotting it)!!!
 
Definitely do this:
Have claude make a dashboard with our life todos with checkpoints and deadlines colored with columns, for para project with corl deadline and priorities for it and toggles, robot building for mobile base before 28, graduating in 2 years, logistical todos like dmv etc
Need to go to usc physical therapy check in for shoulder soon, that’s next todo

Pickleball campus Sarah house

Casual nonchalance no rush going with the flow

Should build the amazing hand or something similar to test dextrous setup

2 new episodes tonight of one piece

Have claude or cursor breeze through homework code and writeup report for it, have it point to all code changes 

What are the highest priorities to getting the paper finished? Can we ask yue for droid familiar masters student 

Should embed wandb runs in website details 

Basically aiming to remove myself from the loop as much as possible (e.g. in figure writing)

Ate a muffin because it was freshly baked and ?not good to skip lunch? Don’t eat too much tonight barely eat
Meditate before going out tonight 

We should prob move this notes app to a cloud notes app so the life manager can do it

Ugh way overate today, idk why, recollect 

What’s the highest priority for the project and method, what’s the results that draw people in 

We spend much money on cold brew. This is dumb as no one even checks if we paid before sitting, we should buy cold brew and a container and bring it. Get one at target today.
Call DMV at 2pm today 
Deadlifts at 4pm and stairmaster at 7pm today

Cotton shirts better than polyester, looks fancier

Note svg maker can make slides too as svgs just like they’re figures

Check form in lab drawer tonight to see if can submit online and if not go tomorrow with insurance form
Need to prepare slides for yue and Sergei tomorrow 
Rest of day is wire up robot (exciting!)
I’m grateful to be healthy and alive and to have freedom to be building robots instead of working deadend job to make ends meet, to have khanega and izzy and best mom

Height vs depth should be illustrating two views in libero with same keypoint but depth values to each vs height as ground projection and height annotating and the view should be libero gripper almost grasping bowl
Shoot need to run 567 homework code tonight and start overleaf tomorrow
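
For the height vs depth figure, the relationship it should illustrate can be written out. This is a sketch in my own notation, assuming the world up axis is z and the ground (table) plane sits at z = 0:

```latex
% Assumed notation: o_k is the center of camera k in world coordinates,
% r_k the unit ray through the keypoint pixel in view k, d_k the depth along that ray,
% and e_z the world up axis with the ground plane at z = 0.
\begin{align}
p &= o_1 + d_1 r_1 = o_2 + d_2 r_2
  && \text{same 3D keypoint, } d_1 \neq d_2 \text{ in general} \\
h &= e_z^\top p = e_z^\top o_k + d_k \, e_z^\top r_k
  && \text{same } h \text{ for } k = 1, 2 \\
d_k &= \frac{h - e_z^\top o_k}{e_z^\top r_k}
  && \text{per-view depth recovered from height, if } e_z^\top r_k \neq 0
\end{align}
```

So the two libero views would each be annotated with their own depth value on the ray, while the single height annotation on the ground projection stays identical across views.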

Listen to more inspiring music! By people who were inspired not just strong! Pharrell Williams and daft punk and strokes Julian Casablancas are great examples!!! 

I want to be leaner in my facial features ugh, don’t be lazy just eat less

Design and Print eef tonight
Need to tighten servo screws 

Bad wrt eating etc, meditate sleep early and recollect for tomorrow
OMAD tomorrow beef rice and eggs at 2pm. Done. 

Fig 2 should be fully animated alongside the libero animations with it, the dino pca features and robot model e.g. should be animated too 

New rule is black jeans with white shirt and blue jeans with dark shirt, light on light is causing us imbalance 
Is there a charger with apple watch extension I want to charge all my devices and e.g. whoop at once and also not be so big 
e.g. buy this 

Today workout is machine rows and hammer curls 
Diet is OMAD robowok

Need to call us bank for credit card hold asap
Consider using 

Wear deodorant now it’s getting hot and in black shirt a problem 

OMAD today, feels good, should run this every day

Listen to strokes now 

Fk need to do taxes tonight and 567 hw now

Getting masters student for panda is highest priority

Taxes first thing tomorrow morning, then 567 homework, then robot building install EEF test double jointed and design double jointed full robot

Should be listening to invincible iron man stories for engineering aura e.g. https://www.youtube.com/watch?v=_Md3ZN4-TAY&t=5s


Bump junjie tomorrow on master student, might even ask sicheng 

Up now taxes, 567, robot building 

Arm will be built before Tuesday, we’re there sunday too, can’t do sat unfortunately but will setup our 3d printer for double printing and can we 

Might need to add an extra dof to make the longer volume fit into the work print

ADP CSmith21@UVY1

Should try to compact the EEF 3 joints, right now it’s awkward to move because they’re far apart

Prioritize good sleep over sx, in fact don’t care at all about sx and that’s probably when it will come more

Had meal of the day + breakfast, over calorie budget but overly full anyway so it’s all good

For averaging and indexing questions of homework claude is overcomplicating just have the vectors be one hot for the target index * large constant and all others become 0 whereas target index becomes large, same with self attention using identity and previous index 
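
Quick numeric check of the one-hot times large constant trick, in plain Python rather than the homework's framework. `select_by_index`, `average`, and the `scale` value are illustrative choices, not from the assignment:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_index(values: List[float], target: int, scale: float = 100.0) -> float:
    """Attention logits = scale * one_hot(target): softmax puts ~all weight on the target position."""
    logits = [scale if i == target else 0.0 for i in range(len(values))]
    weights = softmax(logits)
    return sum(w * v for w, v in zip(weights, values))


def average(values: List[float]) -> float:
    """All-equal logits give uniform attention weights, i.e. a plain mean."""
    weights = softmax([0.0] * len(values))
    return sum(w * v for w, v in zip(weights, values))


picked = select_by_index([3.0, 7.0, 11.0], target=1)  # ~7.0, off by roughly exp(-100)
mean = average([3.0, 7.0, 11.0])                      # 7.0 up to float rounding
```

The larger `scale` is, the closer the softmax gets to a hard one-hot, which is why a single large constant is enough and nothing more complicated is needed.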

Might just go back to single jointed long arm 

Need to do overleaf after volleyball tonight

I think we should actually go back to just using double joints with just a comically wide double servo placement so we can get inside with screwdriver

New company name : simple helpful ugly robots - SHUR 

Tomorrow main goal is finish 567 homework and make sure mini report is ready for them to take on in meeting, then build robot arm and keep working on figures 

Big lifting session tomorrow, heavy barbell rows, machine rows, curls . Get heavy lifting. Then stairmaster 25min in another session.

I actually realized we’re missing a critical result for the spatial generalization — real world: we should have train left and test right, but more importantly for new ‘environments’, we should train the robot in ~2 places and then wheel it around campus to new environments like around the quad to be really in the wild (we already have the little mobile desk for it). Let’s do it after finishing our robot arm

Meditate 10min before going out, today diet is just robowok and light dinner
Work is finish homework 4 problems 1+2 in morning, go to robot then

I think keep same shape but only connect on top servo horn not bottom, meh keep it I can get in with thin screwdriver

Ok so robowok was blocked today, was going to not eat but was already going to ministry of coffee. Don’t make a habit of it because we can’t afford to spend that much but fine trying a new item every ~1 week. Need to integrate a financial manager and overview soon, not good to eat 1 piece of toast until 8:30, having reuben so won’t be starving and will be fueled for lifting 


Rest of question is finish homework and new two-servo two links (one problem then one link etc),  

Might just need to be building robot smarter to avoid needing so much 3d print. e.g. base of servos don’t need that much volume to hold first two servos together, a ton of filler space, can even just cut out holes from current print when we redo it

I’m also wondering if we should’ve just gone single servo. Can we really make the link parts thin and just keep payload light? The current double servos take a lot of 3d printing and add complexity. I will never be embarrassed about 3d printing a waste, it’s cheap and often nothing else is running, we always learn something even if it’s during the print

Show up to lab to collect 3d print embarrassed that we printed such a large volume

Print after lab meeting since taking 11 hrs I think

We’re on the right path:
Single arm sts3250s with full modelo is pick-uppable, so we should be within weight limits
Won’t be enough for heavier payloads (e.g. laundry okay but not lifting box), at that point we double up the servos (already printed it, save the print), and then later on can upgrade motors and have other people build it for us

Need to start studying for exam2

Practice more presence

Fuck I need a more consistent sleep bedtime and to adhere to it ugh, 
Do bedtime relaxing wind-down with meditation before with izzy

3:30pm gym, leave campus at 6 to pickup food to eat at 7
Build 3d printer 
Logistics is airbnb
Asked sicheng for it, nice if so

I think some point down the line we’ll have to switch to dynamixels (nice that interface for one extends to larger ones), and we already have one so let’s dig it up at some point and claude code the interface

How can we fast for 16hrs per day - that’s an 8hr window eating period — completely reasonable. That’s like only eating from 12-8 or 11-7, which is obviously doable. let’s stick to this everyday.

Bring wagon to campus sunday to bring printer back home 

If you’re so nervous about the robot arm length and payload, can also compare it to so100 in mujoco viewer and perhaps even make an in-between model for this and the so100 size 

To stop spending so much money during the day, here’s what we’ll do: buy big coldbrew from TJ’s or target (even concentrate and we can put it in cold water filled container)

Should collect more hand ball episodes, like 40 episodes and 20 test 

Today need to take another stab at panda calibration — first print board for 4x2 board, tape it, rerun calibration again. We can definitely do this and it’s the highest priority if we’re being mature. Then print our own arm

I do not want to waste my life commuting and waiting in line for 3d prints. I will take any aggressive action, such as buying my own 3d printer, and not coming to campus when I don’t have to, or using other people’s 3d printers without their permission, if necessary. This is the execution stage and I execute with maniacal urgency. 

Ok print 


Fuck yea we finally got camera calibration working. Note this is still super finicky and requires you to capture a good distribution of se3 poses and not trivial to get the correct distribution, had to do it a few times, was time-consuming and stupid. 
Tomorrow we will collect data, upload it to the website to view, parse it into episodes, train a model on it, visualize the robot actions with IK, and deploy it (still perhaps not doing any rotation)

Don’t think x is wife, ex ho, talk to Jordana about it

Watch one piece tomorrow
Need to figure out how to be more sexy and jacked and confident, I think the protein yogurt is seriously going to lean us out 
and get us sexy 
Note we need to go Andrew Tate mode asap.

You need to act with a maniacal sense of urgency. 

Tomorrow we will:
- go to school early (going to bed at 11, wakeup at 6:30, first thing is get to school, collect robot arm and make sure their printer is clean)
- start panda data collection and data visualization and model training 
- go to class 
- get protein and yogurt 
- finish prototyping and deploying model and visualizing robot IK and then deploying actions on panda
- assemble robot arm
- monday is the long day on campus 
- meditate and gym session (neck curls, shrugs)
- followup with dad on insurance proof for car 
- respond to vitor
- dress sexy (white shirt black jeans glasses and gum is good enough)

Need to become powerful and magnetic asap
Keep the mind in goal: huge robotics paper, not a tiny one, launching us into robotics stardom, then foundation model with it, then Culver robotics 
Goal is to build actually useful robotics, what’s the highest impact we can have, I think we need to build a mobile manipulator asap and have it start doing actually useful tasks, e.g. organizing the lab clutter and folding laundry, those are two good encompassing tasks 
Big reflection about the maybourne: people are living like this, like real people, I’m not even asking you to be the best in the world, just the best in your CITY even, fuck. Everything we do now is cutting straight to the point: paper is done in next few days, robot arm coming and mobile base soon, scaling up training, have robot doing same stuff as physical intelligence but better no more action primitive tasks. Need to have real long-horizon tasks training, and ideally in the paper too. Motion primitives are good for early prototype validation but we need to see how this scales to larger scale data and training and etc soon too.

Listen to more weekend 

Let’s see how this feeling pans out but idk if I’m feeling very attracted to x after hearing specific story with the details and how it interfaced with x family, let’s see 

Let’s act with extreme urgency here 

Today is class at 12, cleanup print and get on panda from 11:30-12:15, class until 1, eat lunch (robowok is ok), back to panda, ministry of coffee submit reflection and pickup target protein powder and yogurt, assemble robot and keep working on panda
Retardmaxxing btw 
Parse episodes and respond to vitor in class 

Need to be more Andrewtatemaxxing 
Need to study for exam today 

Get salt and straw with friends tonight at around 7
IM 27! This is the youngest I will ever be! I don’t want to waste my youth even though I am no longer a young man!!!! I am a full fledged man now! Let’s push! Big gym session! Should we get a beer tonight at all seasons with friends? 
I don’t want to be stuck in arrested development in reflecting back on my life!!! Let’s move!!!
Neck curls and

Might want to just stream rgb over to server with optimized server and visualizer during inference and just pass back joint states from IK to the panda box or our custom box 

If you want a simple girl, I don’t think

I think tell renhao we ended up just doing it ourselves and don’t think we’ll need help after all. I also think getting anyone involved is usually more of a liability and more overhead than just doing it yourself especially with claude now. 

I think enumerating the past times it’s interfaced with us is illustrating and kind of exceeds what I’m comfortable with: std talk, Nolan references of having many partners and ‘hoe’ era for both of them, Isaiah referencing ton of bad guys, becomings friends and hanging out with two people and finding out you x’d with them,  Jake hinge jokes, recent family reference about leaving residue behind you, and if that’s what’s on the family trip there must be like a ton more like that, it’s more than I’m comfortable with, but not sure how to go from there, I know it’s personal preference and many people don’t care at all and have similar pasts, that’s not really me

We need to shift to more Francois Reihani Andrew Tate wealth creation vibes asap, like how quickly can you cut into that, 1 year turnaround?

I think bare minimum today is train model and deploy it at least on fake data with IK joint state positions

Bought a plug-in stovetop to make eggs, this is a game changer. Every day for lunch we will have 4 eggs from store at target instead of sodium greasy beef and rice. This is awesome. Get a bowl tomorrow morning so that tomorrow morning is protein yogurt and banana oats. Tomorrow needs to catch up on lectures for 567 and make study plan, deploy fake data on panda and collect more data,  
Need to become more confident, need vision board more of rich lifestyle now, drop the weight
Need to start training a martial art once a week and taking persian lesson learning more seriously, should have a markdown posted online for the persian study guide and just keep track of what lesson we’re on
Think of the guy you need to be, the one that attracts women effortlessly and is electric and magnetic, is he burnt out and tired and unconfident? Don’t compare yourself to other phd students, and if you do compare yourself to the greats who are energetic like Andrej. But also compare yourself more to like Andrew Tate 
Think about how we can start training a martial art — I think there is a kickboxing class at USC even. Huge and buff, dangerous, energetic well slept and sexy and 

Keep a looser grip, we said this before but we’re not so invested in the outcome, we’re mostly looking for fun — that doesn’t mean we’re distracted and aimless, we’re still sticking to the exact plan, but we don’t need the mood or outcome of the plan to go any way, I’m mostly looking for fun in doing the task and that often translated to increased efficiency and productivity as well because I’m engaging with it
Are we dressing well enough? I think so or at least this is what I envisioned, maybe I miscalculated? I think it should be good enough with nice jeans boots and vneck

I think we would feel better about getting to campus earlier. This would likely mean waking up at 7am and going straight to campus. I feel like we’re missing solid deep-work blocks without anxiety of a rushed schedule 

Good job always being in bed by 11 but aim earlier, like 10, and not eating at 9 even when there’s food on the stove
Imagine you didn’t have the ambition of building a robot company, and instead you just wanted to have fun building the most capable and helpful and intelligent robot today with the coolest science 

Today structure is this:
- 11-2: deep work
- 2-3: Navid work at coffeeshop (make slides for meeting and tri update)
- 3:30-4: Yue Meeting
- 4-6: deep work, do 567 slides
- 6:20-6:40 - check-in 567 
- 7-9: volleyball and pull-ups at robertson

Deep work for the day:
- obvious things: deploy any model on the panda (30min), build robot arm (30min).
- logistics and other work to do: 567 study guide and lecture review, taxes, make slides, collect more data to train real panda model. 
If you want to just do those two things, I’m okay with it. Meditate, walk, nap, recharge or whatever you need to do if you’re burnt out . 

I think let’s become kind of self absorbed and douchey, I think it works well, buy a mad happy sweater e.g. 

Should we bring the robot and a1 printer home to work from home on wednesday? 

Tomorrow deliverables are: assemble and calibrate and control arm (is it good with weight on EEF?), make another design if not and send to printer the next day or add arucos if good and reprint; big study day for 567 so read all lectures so far missed and gather all materials 

Ok that we bought lunch, rare day off campus, and in general will buy one more expensive lunch in the week from e.g. dulce ministry, rare to be off campus here

Need to gather 567 lectures, read them, get study guide downloaded and into context, calibrate robot arm and see if EEF payload is too high (e.g. if so, would validate the double servo’d base idea)

See people 3d printing gearboxes for $11 nema17 motor online with heavy torque, definitely under explored space 
Ph

Should rebuild arm with aruco board built in and double servo’d first two joints I believe

Need to make a study plan, redesign the robot arm with markers, panda experiments in lab tomorrow while arm prints, focus on website highest priorities TODOs to look great 
Another great experiment is that if arm with manual kinesthetic teacher ACT really struggles (kind of a distractor / maybe different embodiment), our model doesn’t care but global regression fails

Tomorrow we go testosterone aggressive mode for ~2hrs 

I think actually the single servo’d design is okay, and the main mechanical instability is that we had screws missing. On that note the reason we have missing screws is that we destroyed the screws and effectively the entire servo. I think let’s just reprint the same length arm with corrected rotation piece 

Should just specify arucos in world coordinates of urdf and have the agent populate it 

Robot arm looks decent, can always make shorter if we find out the wobbliness persists and wasn’t because of screws (though I find it unlikely using so100/300 as a reference point)
Also note that if super wobbly, that’s mostly a function of how extended the arm is, and can in software just restrict the workspace of the arm or prefer configurations that aren’t as extended 
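Minimal sketch of the software workspace restriction, assuming we clamp Cartesian targets before they go to IK. The `max_reach` value, the base position, and the function name are made up for illustration; the real limit would come from measuring where the arm starts to wobble.

```python
import math

def clamp_to_workspace(target, base=(0.0, 0.0, 0.0), max_reach=0.25):
    """Pull a Cartesian target (meters) back onto a sphere of radius max_reach around the base."""
    offset = [t - b for t, b in zip(target, base)]
    dist = math.sqrt(sum(o * o for o in offset))
    if dist <= max_reach:
        # Already inside the allowed workspace, leave it alone.
        return tuple(target)
    # Shrink the offset so the target lands exactly on the boundary.
    scale = max_reach / dist
    return tuple(b + o * scale for b, o in zip(base, offset))

inside = clamp_to_workspace((0.1, 0.0, 0.1))
outside = clamp_to_workspace((0.5, 0.0, 0.0))
```

Preferring less extended configurations would be a separate step, e.g. a cost term in IK; this only handles the hard limit.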

Double check with Scott on will and Courtney tomorrow
Tomorrow. Need to be more aggressive about getting the first even just simple prototype experiment done of pick up bowl with just more demonstrations 
Also fixing blue teleop scripts should be technically easy.

Tomorrow is just protein yogurt brunch (ate way too much today, aim for ~1500cal tomorrow with fasting until 1), light dinner volleyball, gym in jeans heavy pulling session 

Study plan write it down is tomorrow all lecture reviews detailed study, then one exam a day

Plan to become glorious and jacked today conquest:
- diet: greek yogurt protein brunch (get bowl and spoon), pickup stovetop for linner eggs and get eggs and butter
- work/research: verify data collected bowl grab deploys correct grab; launch new print arm (use other lab’s printer if ours is busy for full day); collect more demos for panda train basic bowl pickup model
- logistics: finish taxes AND DMV form, move textedit form to general bottom-to-top text form so agent can read it

BEAUTIFUL HOME ROBOT COMPANY ASAP! CUT TO IT! FINISH ROBOT ARM AND NEW PAPER ASAP! HOW FAST CAN WE FINISH PANDA EXPERIMENTS? BEFORE TUESDAY FOR SURE. 

Need to setup financial agent before seeing izzy family on sunday
Okay something’s different now 
I’m moving with Alexander the Great type of focus and execution
Need to always figure out what is the 2-3 highest priority moves for the day and just do those. The goal is to be not busy, but rather strategize each project and long term goal into short term milestones which will inevitably lead there and then execute on one of those per day
When at a coffee shop today plan out the r
The home robot Need to go to target today to pickup bowl anyway

I think btw we should add wiping marker off the whiteboard with an eraser to the experiments, since height based prediction should be good for that
Today, the highest priorities are for panda: finishing record and re-deploy with GT dataset task success verification, collecting demos for a simple model (pick up bowl), deploying it. 
Clean up lab space today and take home printer in wagon. Gym at 5
We’re not on the forefront of robot AI right now and that’s embarrassing. We’re still catching up in terms of getting to the SOTA; we’re not that far, but that gap needs to be cut asap. e.g. where’s our large scale model? Language conditioning? Wheels? RL adaptation?

cut the current printer spool and add new one to launch our arm print 
i see in the bambu print marketplace a 4 finger hand that is 3d printed all at once. why are we not doing the same for robots. part of the overhead is always assembly, this seems simpler.
the panda is a bad robot. every change takes so long -- restart server, don't exceed joint velocity limits, discrete EEF open/close state, randomly the joint state server is old and have to debug why. Everything is 10 steps away from being correct, randomly bridge server breaks. Bad. Need our own robot asap. Ideally before tuesday it's running super well, then we calibrate it and run further tests, add a handle for kinesthetic teleop, 

What are the current milestones that will get us to the paper and home robot? should break it down hierarchically and view it somewhere

tonight 1hr work while izzy watches tv

I don’t like to be busy, I like idle time in coffeeshop to reflect

Show Scott new watch tonight
Sleep super early tonight especially since nothing to do with robot just yet at home, maybe just do test print with a1 printer to confirm it’s working, next up also can we just return the a1 printer I don’t think we need it after all? Also we lost the tools. They’re a big company I don’t think they care, just bring the box home tomorrow for it and 

Full and nutrition-ed
We spend too much time commuting, how can we have more remote days? Monday is a good long on-campus day, but what about spending ~ 2 days per week remote? That would save time and be more comfortable, maybe like wednesdays and Thursdays. Buy new monitor for working at Izzy’s, should be <$100. Return Bambu a1 printer at home tomorrow night after bringing box home  

Hm how can we have a legendary next set of steps to move with such crazy efficiency and huge results. The reason this project seems more daunting is because we’re basically building the robotics company infrastructure at the same time — data collection, hardware, etc. But that’s also why it’s so worth it instead of just using off-the-shelf or fake robotics

What result would be undeniable 
I actually think UMI and ALOHA looked great but the tasks weren’t even that good, we should be better than them.
For single arm, what should we do. Should we use bimanual arm? most projects do, but I’m a little concerned about the arms not being in sync. Hm actually the tasks from both papers are pretty good: putting pot away, 
Hm aloha showed 

Key thing is this: for whatever reason, working on the panda has held this project so far back that it’s ridiculous. The highest priority is not the panda but rather we’re missing great results 
What tasks did UMI and aloha do (skip mobile aloha for now)
Shirt hanging, shoelace tying, kitchen stacking
Cup arrangement, sweater folding, dish washing 
I think we should do all of these 


We might want to build aruco covers since people ask about them always 
How can we become more jacked and beautiful

Need a factory reset. Earlier sleep and wake times, more stillness via meditation. Been rushing too much and living life haphazardly, not good. 

Seriously need to study for 1.5hrs tomorrow, renhao working 
First thing just assemble robot and get it calibrated, then go as hard as possible trying to make a simple demo, train video model on it for basic task and hand pretraining for task, stay on campus until late to make this happen (it’s definitely doable) as it gives us authority to move through yue on it

Need to think about project website page and key experiments that would be undeniable
Tasks should be: folding clothes (e.g. t-shirt), wiping board, placing mug on saucer, 
Most robustness experiments (new object position, new viewpoints) should be in sim, with teasers in real
E.g. for real should say what if only trained object at positions on left side, but now testing on right side, show ours still works
But key takeaway needs to be like: policies fail to be robust to xyz but underlying image features are good under same xyz, our policy just is local on the image features, look at benefits: robustness to object position and viewpoint, is more data-efficient learner, data-efficient inverse dynamics model for video model as robot policy, leverages cross-embodiment data better. We need a figure that reflects that and should start working on it ASAP, that’s the headline figure, right next to the architecture overview part 

Need to sleep earlier, a bit tired, but also, I’m fully rested and full of energy and in the perfect mood for enjoying the day and being present and going super hard

Ok so we’re well rested, need to really organize and go hard today, meditate 20minutes before 3pm, read at lunch, 

Take box home for Bambu return 
Need to study ~1.5hrs each day until 567 final,
On campus today until 8pm
Clothes in drawer for gym 

Today is record demo with waypoints in morning and train, then after volleyball retrain, tomorrow do it more professionally with collected 
Probably be able to provide data view locally / copy of it and would be good to provide gripper traj and gripper value as line chart 
Always remember that the key point is how good are the results and not the method, 

Should just record plain background once per episode/camera and mask out human 

Tomorrow need to also plan our courses and phd timeline with exams 

Before the paper what are the highest priorities.
First, random todos: replace white gripper with more stable green one and print sticker and recalibrate robot with it, record video with pick up cup on left side and test on right side, train with one viewpoint and robot position and test with another, try umi side data collection for training model, make video with it 
Mug task should be dropping it into black bowl, small white box isn’t identifiable enough and exaggerate the arc as well, but overall waypoints look good/smooth in execution. Note fold towel will be more convincing because won’t just be pick/place 

Looks like the model train/test setup is working well, next can we derisk the umi setup? If umi works just as well, e.g. on the mug pickup with waypoints and with robot shift, we could be golden for an even punchier video — look no 
Also remember, worst case is push project back into tri internship, don’t want to but possible, this is just a game
 

Need to also in parallel print a new hardware that doesn’t use arucos(?) for yue? Maybe simpler thing is just remove the arucos and only paste them on if need to calibrate again?

Meditate and dips before going out at 830 (1hr)

Goal for tomorrow is de-risk umi: install new EEF before going to mom’s picnic and calibrate it, with rendering verification, then train and test with umi
Eventually can have buttons on umi for new episode / keyframe waypoint annotations 

Today is install new eef, calibrate it, derisk umi data collection (with voice annotations too), mother’s day park, gym,

Okay so green printed EEF had a bug in the design, it blocked the servo above. Reinstalled the white eef. calibrate it and then the umi test is actually more extreme and might not work well, different colored one. Don’t doubt it though, think about at least a simple task of the mug grasp and a keypoint in free space to get there, can we do that? If so I think we can generalize to pick and place 

Good overview figure might actually be method/architecture high level, then the single 3 axis robustness demo of embodiment (green, umi style from third view and annotate image as input to robot)

When people ask you about umi camera tell them notice how third view robot control does not generalize well, umi is a crutch, if you need more context than wrist view, not working from third view indicates model will not be able to leverage 

Just need to move faster. Speed defies gravity. 

Gym 6-730, walk and shower/ sauna

Tomorrow is gym RDLs and sauna
Need to start going 45 min cardio again fk, 

Could potentially do grasp and pre-grasp approach point, I still wonder why this isn’t easy enough e.g. if you use sparser keypoints like just those 2 why explicitly have to separate it. Do we have a bug or is it genuinely hard to generalize from green eef umi to white robot?

I think umi gripper setup as of now seems to not work as well. Maybe we’re not training super smart with it? i.e. maybe the camera angle is just bad for robot learning and the front  

Ok so putting umi gripper away I think?

Ate way too much today too, tomorrow is weights shrugs etc and 45 min cardio

Ok so executive decision no umi, we tried it and it just doesn’t seem to work that well, some promising parts but overall not that well
I also think we should see how we can not do keypoints: they’re 1) finicky, 2) subjective in how to parse, and 3) make the story weaker. I think we need to do some geometric parsing of the data instead

Tomorrow is a huge day, last big mission day — before meeting need to prioritize getting full experiments of train vs test environment splits
This is already derisked in our earlier kitchen experiments 
Also need to sort out next year plan for yue funding 

Let’s do this: right now is record new 3-traj dataset and at coffeeshop get the geometric traj sparsification
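One possible starting point for the geometric traj sparsification, as a sketch only. This swaps in Ramer-Douglas-Peucker (keep a point only if it deviates from the straight line between kept neighbors by more than a tolerance); the note doesn't say which method we'll actually use, and the `tolerance` value and function names are placeholders.

```python
import math

def _point_segment_distance(p, a, b):
    # Distance from point p to the segment a-b (works for 2D or 3D tuples).
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

def sparsify(points, tolerance):
    """Ramer-Douglas-Peucker: drop waypoints that lie within tolerance of the simplified path."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    distances = [_point_segment_distance(p, first, last) for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        # Every interior point is close enough to the chord, keep only the endpoints.
        return [first, last]
    split = worst + 1  # index of the farthest point in the original list
    left = sparsify(points[:split + 1], tolerance)
    right = sparsify(points[split:], tolerance)
    return left[:-1] + right  # avoid duplicating the split point

# Toy EEF path in meters: move along x, then straight up in z.
traj = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.001), (0.2, 0.0, 0.0), (0.2, 0.0, 0.1), (0.2, 0.0, 0.2)]
waypoints = sparsify(traj, tolerance=0.01)
```

This only looks at position; gripper open/close events would need to be forced in as waypoints separately.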

Need to do both 30 min walk to sunset at 2pm and evening lift and 45 min cardio + sauna shower later on (don’t have to walk there)

Most hardware people are dumb to computer vision, 

Try calibrating camera with aruco board for intrinsics and then second step using robot board to get camera, that might improve results e.g. 
Tough to debug both hardware we built and algorithm at same time 

Maybe just not doing enough volume.

Voice annotations meh just annotate start end manually, easy enough 

Cursor sucks I always lose the agents no persistence 

THE WORLDS GREEDIEST MAN! HE DOESNT LIE ABOUT WHAT HE WANTS HE JUST WORKS TO GET IT! NEW CHARACTER UNLOCKED!

Ok trying umi setup again because I think with this camera angle and dense trajectories it should work better (confidence in it)

I think basically just collect 40demos and demonstrate the spatial generalization — new object position, new viewpoint, new environment 
Copy molmo action svg fonts 

Eating much less tomorrow

Honestly the mug pick and place on saucer is good enough, the only task I want to add for the paper is the whiteboard erase 

Copy the https://www.videomimic.net videomimic website exactly and molmo 

Should just go to campus early tomorrow before people get there and knock this out asap, it’s a few hours of work
First thing tomorrow is train a model with 40 demos on any table, even the one outside our desk. Then take it with battery to an outdoor table. Hm honestly the portable table we had before might be the move, we can then put it on any table, I think a great shot would be one with the usc logo or water fountains in the background or in the village with people. Yeah let’s bring the wheeled table out again for it.  

Looking at the project site, we literally have not developed any new or more impressive robot training since then. We’re stuck. We progressed big on building our own robot hardware, but that’s it. Video models in sim and explicit distribution testing in sim is not bad, but that’s it, plus some visuals for paper I guess. But still definitely stagnated in learning. We need to get this paper out of our hands and into corl and move on. 

When yue says panda, say no - it’s a time sink with little benefit 

Culver robotics is a computer-vision-first robotics company 

Don’t overlook how this can be combined with wrist cameras to be even more robust 

So tomorrow is record demos, format in videomimic website and molmo act paper stylings 

Honestly our so100 video results look good
I want to move on and build a bigger arm and bimanual for e.g. laundry folding and add wrist cameras and then wheeled base

Note we have extensive sim results and some good real world results already 

I actually think we should do this:
Instead of showing zero shot semi-generalization (because it won’t be perfect but closer to like from 90% to 65% success vs baseline 70 to 0 i.e. robustness not invariance), we should train a model on a number of environments and viewpoints (e.g. 5 viewpoints in 3 different environments) and show that at test time you can move the camera to OOD new viewpoint or move robot base or environment (call all of them ‘OOD’) and generalize well 
New plan is this: do that tomorrow (collect 80 demos across 3 environments and 4 viewpoints each on table), then here’s the killer shot: a string of successes in a row for the task with the live camera being picked up and moved to a different view and the robot executing there and even having the predictions be running and visualized while the camera is being moved, then we show a video of moving the wheeled base to a different scene and doing it again there and then outside in e.g. the sun
Teaser figure is casual robot setup with tripod, move it to new positions and show image of robot succeeding in that position, then new environment and same
Saying data efficient (<100 demos) and more robust to viewpoint and environment 
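Quick arithmetic on robustness vs invariance, using the guessed numbers from above (90% to 65% for ours, 70% to 0% for baseline). These are the note's ballpark guesses, not measured results, and `retention` is just a made-up helper name for OOD success divided by in-distribution success.

```python
def retention(in_dist, ood):
    # Fraction of in-distribution success that survives the shift (1.0 would be full invariance).
    return ood / in_dist if in_dist else 0.0

ours = retention(0.90, 0.65)      # guessed numbers, roughly 0.72
baseline = retention(0.70, 0.0)   # guessed numbers, 0.0
```

So the honest claim is "keeps most of its success under shift", which is why training on multiple viewpoints and environments and then testing OOD is the stronger story.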

Record video of tripod setup and borrow someone’s phone in the tripod to show how can move it 

Use the green bowl instead of black saucer to drop the mug into the bowl instead of saucer(?) meh either is fine, see there’s already a bowl on plate/saucer in libero 

Plan is finish that data collection and results tomorrow morning, format it in evening, then Thursday and friday are a) rerunning that result with proper baselines for tables as well, and b) formatting the paper with that additional teaser result, and c) redoing the hardware with detachable arucos for prettier and better calibration (if time)

Need to invest in self more as self before project, meditation and workouts as primary objectives and work as just another fun bucket to play in

At 2:30, take 30min to plan classes and TA before yue meeting 

Fun vibe always, having fun! Like the guy on IG branch coche

Fun vibe is always having fun with the results etc and so let’s try to stay in this vibe forever or as long as possible
First pass over paper and all results is due friday
No panda
Print white umi gripper tonight and we can take another stab at it
Up now collect ~60 demos over 3 environments and 3 viewpoints per environment 
Today we do shrugs and stairmaster 20min and meditate, volleyball at 8

Use new usb camera with fixed intrinsics, looks good not fisheye, 

Should get one of those press to record thoughts and upload it automatically 

Print the white umi box overnight so we can test with new object, also we can cotrain with some robot data and umi data, like 5 episodes robot data and 40 umi, that might work too
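Reading this as mixing a few robot episodes with many umi episodes in one training set, here is a sketch of a source-balanced sampler so the 5 robot episodes don't get drowned out by the 40 umi ones. The `robot_fraction` of 0.5, the episode names, and the function name are all assumptions to tune, not something we've tested.

```python
import random

def make_mixed_sampler(robot_episodes, umi_episodes, robot_fraction=0.5, seed=0):
    """Return a function that picks the source first, then an episode uniformly within it."""
    rng = random.Random(seed)

    def sample():
        if rng.random() < robot_fraction:
            return "robot", rng.choice(robot_episodes)
        return "umi", rng.choice(umi_episodes)

    return sample

robot_episodes = [f"robot_{i:02d}" for i in range(5)]   # placeholder names
umi_episodes = [f"umi_{i:02d}" for i in range(40)]      # placeholder names
sample = make_mixed_sampler(robot_episodes, umi_episodes, robot_fraction=0.5)
batch = [sample() for _ in range(8)]
```

With plain uniform sampling over all 45 episodes the robot data would only show up about 11% of the time, so the fraction is the knob to sweep.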

Upload data, eat eggs, train it, meditate and gym, test policy, volleyball 

Could use codex tool use for blender or onshape/fusion etc to learn it better 

I have a number of ambitions on my todo list with robotics, none of them involve the franka panda 


Need to train with x Umi gripper  
Come hell or high water this paper is getting submitted. First with the simplest thing 
Get the pocket ai talking thing 
Don’t listen to input or advice from y/tri/etc, they don’t care about it, just a proof read
Build the best project YOU would be proud of. The results have to be sick
Build another pass of the robot with detachable big arucos for a new robot 

Get pocket ai for notes, don’t want to have to jot to agents tab and it stops working etc
should take the robot train data at the Umi gripper as another task and show that you can deploy the robot with that same task that the gripper was trained on
And for train vs test split draw x in blue for train and red for test on right side so it’s obvious 
And for umi view say task never seen with third party viewer eg can use towel folding fine tuned with cup moving 
Also include that our model is data efficient, only ~30 demos
A great figure would be to take a video or bunch of images of the scene and visualize the keypoint predictions at each view to show how mv consistent, like it’s a grid and each image has the keypoints on it

Ok what are the few drivers that would be killer / make the day.
Need a good interface for the prediction — e.g. keypoint trajectories drawn on top of the image, or ghost overlay of robot execution, 
Record new viewpoint execution, even just in lab/room setting
Record new environment execution: I think by trees and bike is good, another by water fountain, another by glaring window. That should be good.
Drafting them in the paper and making all other figures, an entire pass over the paper today.

First tab is small amount of kinesthetic demos (35 demos), high success rate
Then viewpoint robustness  (same environment new viewpoint, take 6 of them)
And environment robustness (new environment with third view image of location too)
And spatial robustness (train left test right), we need to train/test that too (1 hr), for now just fake it in the figure and we’ll confirm it 

Oh and need to do the 
Hm I think it’s just because we don’t have enough demos, probably need closer to 100 with this diversity of viewpoints, but the model isn’t that great at new viewpoints, not horrible but not great, I guess we experienced this before when we tried high rotation shifts instead of just translation shifts
I think one thing we’re missing is quick response and a low latency prediction, right now it’s like blind predicting the correct coarse paths, that isn’t really a great idea of physical dexterity, we’re missing something, maybe it’s just our controller and some wrist camera more general high speed EEF prediction with action chunking could be better. Would be good to test this out in simulation too.

Also true that if actions were predicted more smoothly and smartly, e.g. perfect predictor, our actions would be better and we could also do action chunking smoothing or just take first n=3 actions 

Being very mature in recognizing, this model is not that great right now, and thinking about why. The reason is that we only had a few (~4) examples per view, which wasn’t enough data to learn very well
Good recognition that we need more data in general and to not be naive, and that kinesthetic demo is tiring umi is required eventually 

Before you measure robustness make sure your model even just works well, don’t skip just because you’re trying to get done faster. It makes you go slower. 

Don’t need the physical recording device, just need a better web interface for whispr on web where I can see it transcribe live to avoid the issues we had before and just maintain it when it breaks, that’s much easier too. 

Ok so OOD viewpoint and environment teaser deals are done, great! I still think we need to train another better (more data, potentially with umi) model for those results 

Again for the teaser I think we should have the train vs test results for left vs right side
Need to gym at 630 (just machine rows), no stairmaster today, body is a bit taxed, do it tomorrow
Need to also emphasize that no wrist camera used 

Tomorrow morning first thing is just graphics, for a few hours, want all the figures done before the tri meeting. After 

Okay so today was a good day making figures. I think by tomorrow we should have the paper fully done in terms of figures such that it's a submittable paper.

Next I think we need to work on de-risking the Umi setup. No that's just extra but it still needs to be de-risked. If you want it to be the paper and personally I wanted it to be in the paper because I want to train the robot. It's so expensive and tiresome to be teleoperating the robot.

Tomorrow we have a campus after lab meeting. The goal is again to finish the paper figures and that's manual STD editing. It's on the screen. That was not great. It's rooted to 1. It's not 1 or 1 but it's not. I took plenty of pictures already. 

I don't know. Let me think. Yep. Okay. Okay. Wow that's a lot of latency. I've been performing more dexterous attacks while predicting absolute 3D positions, absolute end effector coordinates. That's a way to get actual dexterity 


That guy on ig Becker Boris is so inspiring, his positivity is infectious and it looks like it's genuine from himself and he just spreads it out to the world without any expectation and even in fact ignores responses, part of it is that he dresses with such confidence and self respect, how can we do more of the same 

Might leave ood spatial robustness to just being simulating figure for now 

The only truth I obey is Allah and distribution matching

3pm meeting today, .

I like jiawe’s figures, can we just copy it, I think it’s better than our figure btw 

Need a really good overview figure to setup the video mimic. I think it should basically go something like 

I AM THE GREEDIEST MAN IN THE WORLD! I DO WHAT I WANT! HAHA

Need to further visualize and affirm identity as robot builder, big shift but do it fully. Need more conviction which gives energy. Burn the boats.

So the figures go: overview, results (stay with so100 for now), method overview with per pixel regression on top and height vs depth on right side as half figure, video architecture figure

Next week is recollecting results and for video, 5 days for it, enough. Not doing panda. 

All industrial robot arms for like small scoopers and stuff are built with linear rails, why are arms not?

Tomorrow workout is again 20min stairmaster and some heavy lift 

If Alex hormozi or tate was in the robot space, what would they do. They would immediately go to the state of the art, e.g. even just folding laundry. They would ask what is required, scale? New robot arms? Can we buy it? Does it make sense to buy the robot arms ourselves? Gello vs umi? They’d try it all. 

handumi is interesting https://github.com/BrikHMP18/HandUMI

Steps to become SOTA roboticist now:
- build better robot arm (mostly just redo calibration, then add second motor to make it bigger, then add a second for bimanual setup)
- get an umi setup training (instead of gello)
- collect a ton of data for it to fold laundry (~few hundred episodes)
That’s a good start. 
From there:
- add a wheeled base and navigation understanding 

Go to campus tomorrow ASAP and derisk umi. if yue presses us I want us to say it’s going amazing 

Also we have franka panda in sim

Also robotics project solo is f hard man, idk if ppl appreciate that, e.g. junjie had rong, we did everything ourselves including build a robot arm jeez

Need to go more hormozi on this
Don’t sleep late after Thursday dinners or working late in general, it’s not great. 

Today is lab meeting, robowok lunch, lift and stairmaster 1hr, test white umi because would be great for paper (fold towel umi collected cotrained with the robot data), work on figures, gym, meditate, Sawtelle dinner
OK we had a late night working and bad dreams bad start to the day, so what! Let’s have a great rest to the day!!!

Done eating for day, barely eat at shamshiri 

Today is try CNN training and limit volume (skip points outside) and full episode training, 20min meditation, dips,   

I think our prediction is just not good right now. I wish we could train in some simulator that is representative of these failure modes. Never use waypoints/keypoints, they have fixed medium dexterity ceilings and are clunky 
At this rate I don’t think we should submit to CORL. There is some more fundamental architecture design choices to consider. 
Two things to experiment with: should we be predicting 2d keypoints and then predicting height separately (does that generalize better?), should we be predicting auto regressively (like backbone prediction is used for all time steps but transformer on top of features with past keypoints provided, does that promote smoother trajectories and maybe x and y difference vector in uv space from each past prediction to current is provided)

Also seems we are under training the models 

Ok so we actually have a great intuition and connection from our work to modern VLA style. VLAs basically cross attend image patches to language tokens, with action language tokens as output. Sometimes VLAs discretize in xyz dimensions, but now think about discretizing over volume, not xyz separately. Now think about attaching the projected image feature to that xyz coordinate. That’s the segue from VLAs to ours. 

I have this fear that we’re not doing enough
I’m thinking we should be meditating at least 20minutes per day regularly and doing more focused work less scrolling and getting to work faster and working for longer

What information sharing do we need:
Want to predict distribution of EEF XYZ coordinates in discrete volume.
XYZ Coordinate trajectory conditioned on: the image features (patches), the current EEF position, the past EEF positions. 
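
Rough sketch of the discrete volume target, just the binning part (not the conditioning). Everything named here is an assumption for illustration: the workspace bounds `lo`/`hi`, the 10 x 10 x 10 grid, and the helper names `xyz_to_bin` / `bin_to_xyz`. The idea is that each EEF position becomes one class index so the model can output a distribution over cells.

```python
import numpy as np

def xyz_to_bin(xyz, lo, hi, shape):
    """Map a metric EEF position to its (ix, iy, iz) cell and flat class index."""
    xyz, lo, hi = (np.asarray(a, dtype=float) for a in (xyz, lo, hi))
    dims = np.asarray(shape)
    frac = (xyz - lo) / (hi - lo)                        # 0..1 inside the workspace
    # Positions outside the workspace are clipped to the nearest border cell.
    idx = np.clip(np.floor(frac * dims).astype(int), 0, dims - 1)
    flat = int(np.ravel_multi_index(tuple(idx), shape))  # class id for cross entropy
    return idx, flat

def bin_to_xyz(idx, lo, hi, shape):
    """Return the metric center of a cell, used to decode a predicted class."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    return lo + (np.asarray(idx) + 0.5) / np.asarray(shape) * (hi - lo)

# Hypothetical 1 m cube workspace split into 10 x 10 x 10 cells.
lo, hi, shape = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (10, 10, 10)
idx, flat = xyz_to_bin((0.25, 0.5, 0.99), lo, hi, shape)
center = bin_to_xyz(idx, lo, hi, shape)
```

Decoding returns the cell center, so the quantization error is at most half a cell per axis; finer grids trade that error against more classes.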

I’m mostly interested in with robot learning what design can generalize better and learn faster, e.g. by using better computer vision techniques. 
I think anything will work with scale so just need to collect more data 

Buy flight today, retrain robot data with new volume based approach 
See volume projection onto real robot data
Workout/diet: 45 min walk to sunset, another walk to verve, another walk to gym for heavy shrugs and hammer curls and sauna; diet is salmon bowl and a bit of persian food, dinner is Din Tai Fung (try to not eat much at all, it’s not great for you, aim to keep it light)
self: did 5 min morning meditation felt great, need 10 minute meditation and 5 more in sauna
work: train and deploy volume data in real world, make sure to use big gpu batch size and ideally multiple gpu if easy, make sure to collect enough data (40 episodes) and visualize everything including individual per-timestep heatmaps at full resolution and even pca feature maps, don’t rush, just aim to have it work really well and convincingly well to ourselves 

I like the depth anything architecture overview emphasizing how they don’t modify dino model 
I think we should take a step back and focus on how to get at the very least better 2D tracks: 
Might want to also explore backbone that is trained on more motion data
Depth anything v3 is interesting since it’s dinov2 but knows about correspondences and is already high-resolution. I wonder if it has worse semantics generalization? should we also be training with its depth outputs as a teacher objective to make sure it doesn’t forget about geometry?
Should we do 2d keypoints and then argmax index them into a transformer that predicts height bins and gripper and rotation
Not so opposed to using a motion planner for in-between points or points that have distance with large threshold
Simplest depth anything extension might be just copy ray branch into 2d heatmap and orientation and gripper value maps, that could be really attractive
Could be cool figure (not something to optimize for but just good confirmation that it isn’t so clunky) to have depth anything arch modified with our robot outputs, and that sergey would be excited about it, but in any case need to collect lots of data and get into the real tasks asap 

Need to get suit cleaned if necessary, haircut, book flight, follow up with air email clarifying tuesday start date not monday
Gym tomorrow and volleyball 

Need to definitely gym tomorrow, heavy shrugs, and every day. Skip friday night movie, need to work. Need to see more friends and be more independent 
also need to do timesheets
stop treating this as 'job in office', literally Do whatever you want. This is just access to a good lab to do whatever you want in a good environment. Come in, don't come in, work, don't work. Whatever you want, it's your life 
With the clearance debug today, we can print a new arm. Overnight tonight, or tomorrow, or Sunday, there's no immediate rush on it. It's not a priority but it's just nice that we have the opportunity and are free to do it 


Large dino with volume formulation looks good, try on clean data tomorrow
Haircut and gym 

Presenting on Thursday, make figure for new volume sampling approach

Need to reframe project pitch a bit more, about how we might better inject image features to robot actions so that we don’t have to 

Need to deploy policy and recollect self, 10minute meditation again, 330pm gym, 7-9 volleyball 

Should definitely take the project into TRI — do it with YAMs, explore geometry token integration for projection (e.g. how we can benefit from geometric feature directly instead of using register)

Hm I think we should stop by campus to pick up our extra feetech servos? Ugh hm idk if we have time. Well the wedding is in downtown?
Need to iron suit stuff 

PCA plot showing potential movements is good to show in paper 

Part of problem might be recording density, might want to slow down in trajectories or record more dense in time, because the motion trajectories are so sparse, would probably be better with really dense trajectories and predicting longer timestep contexts 

Bins are a good idea since multimodal (and should emphasize that in contrast to ACT or needing full diffusion). Should consider also conditioning gripper prediction on timestep, height value, and current gripper value (and same for rotation)

Also just realized that gripper value is ambiguous sometimes, e.g. just before grasp inside mug, should it be open or closed depends on how long was open inside for. So we should actually retrain dataset with a) dense in time predictions, b) maybe autoregressive gripper/rotation prediction

Basically one way of looking at is that we’re predicting this implicit spatial feature map

Would be great to show the PCA / implicit motion field being consistent as we move the object around the table and how the motion field from the gripper to the objects is consistent 

Tomorrow goal is to get the model working with the denser dataset and then make visualizations for implicit motion field over time and volume visualizations for communicating on friday and for paper. Could also lift implicit motion field into volume by sampling and weigh it by confidence over height 

Respond to yue tomorrow saying meet after tri meeting 

Today’s highest leverage two pieces: deploy model with denser in time trajectories and debug if necessary, make visualizations 

Today diet/fitness: is volleyball 9pm, walk to gym for rows 4pm, pizza making + protein lunch of 4 eggs and toast 
logistics: haircut at 1230, 
work: deploy denser trajectory robot and 

Need to get suit and wedding stuff sorted out tomorrow

Good explanation for slides is thinking about the 3d transform the robot has to estimate (uv*K_inv*depth*cam_to_robot), and secondly how the supervision is sparse, just a 3d coordinate (not supervising those intermediate terms)
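
Minimal sketch of that chain for the slides. Assumptions: a standard pinhole intrinsics matrix `K`, `depth` measured along the camera's optical axis (z-depth, not ray length), and `T_robot_cam` as a 4x4 camera-to-robot transform. The numbers are made up.

```python
import numpy as np

def pixel_to_robot(uv, depth, K, T_robot_cam):
    """Lift pixel (u, v) with z-depth into the robot base frame."""
    u, v = uv
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray with z = 1 in camera frame
    p_cam = ray_cam * depth                               # 3D point in camera frame
    p_robot = T_robot_cam @ np.append(p_cam, 1.0)         # homogeneous transform
    return p_robot[:3]

# Made-up intrinsics and extrinsics, for illustration only.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
T_robot_cam = np.eye(4)
T_robot_cam[:3, 3] = [0.5, 0.0, 0.3]

p = pixel_to_robot((320.0, 240.0), 1.0, K, T_robot_cam)
```

The supervision only touches the final `p_robot`; the pixel, the depth and the extrinsics are all intermediate terms that the 3D coordinate label never supervises directly.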

Should predict PCA embedding of rotation with just 1 dimension, don’t need all 3 here and just want the nominal ones 

Consider going to 

Visualization should be: visualize volume on image plane on top of image features PCA, visualize plot in 3d matplotlib, show EEF coordinate sampled and mapped to spatial embedding 

What properties are we interested in solving here: dense pixel supervision, easier 3d lifting (height or depth). 

In meeting note bring up motivation of global regression of spatial embeddings as some global coherence to the trajectory, per-pixel was more erratic, also that the factorized dot product attention is extremely cheap 

Could be that volume and pixel aligned approaches do equally well btw with better training (longer context etc)

The best version of this paper probably also includes gripper cameras btw (results with and without them)

Tomorrow is visualizations for meeting and paper , handle errands 

Uck ate way too much today. barely eat tomorrow

Tomorrow goals:
- train robot again with rotation deployed and measure success more rigorously (just try it in bunch of places, move camera, etc), then try it with umi collected data on towel folding and then on a more dextrous, longer-horizon task
- visualizations for meeting
- gym, sunset walk, laundry, soomsoom dinner, pack for hotel sat morning 

Note 3d volume is also good unifier with wrist view camera, e.g. just project to both and take max over it or ignore if out of wrist frustum
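
Sketch of the max-over-views idea, assuming each camera already produced a score for the same set of 3D cells plus a visibility mask (both array layouts and the name `fuse_views` are hypothetical):

```python
import numpy as np

def fuse_views(scores, visible):
    """Max over cameras per volume cell, ignoring cells outside a camera's frustum.

    scores:  (num_cams, num_cells) per-camera scores for the same 3D cells
    visible: (num_cams, num_cells) bool, True where the cell projects into that image
    """
    masked = np.where(visible, scores, -np.inf)
    return masked.max(axis=0)   # stays -inf where no camera sees the cell

scores = np.array([[1.0, 5.0, 2.0],    # third-person view
                   [3.0, 0.0, 9.0]])   # wrist view
visible = np.array([[True, True, False],
                    [True, False, False]])
fused = fuse_views(scores, visible)
```

A cell seen by no camera ends up at `-inf`, so it gets zero probability after a softmax.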

Should have monster energy everyday
Need consistent sleep schedule NOW. It’s a waste of life to not have one. Testosterone production depends on it. CMON.

Need a better calibrated arm. The yam will be a good start while we 3d print our arm revision in parallel. Don’t learn onshape cad yet until you make a better arm our custom setup. Learn it after. 

Make sure points outside of frustum have 0 probability automatically
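
One way to get this for free is to mask the logits before the softmax. This is a sketch under assumptions, not what the current code does: `T_cam_robot` is a robot-to-camera 4x4 transform, `K` is a pinhole matrix with last row (0, 0, 1), and the function name is made up.

```python
import numpy as np

def frustum_masked_probs(logits, points_robot, K, T_cam_robot, width, height):
    """Softmax over volume cells, with cells outside the camera frustum forced to 0."""
    n = points_robot.shape[0]
    pts_h = np.hstack([points_robot, np.ones((n, 1))])
    pts_cam = (T_cam_robot @ pts_h.T).T[:, :3]
    z = pts_cam[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)       # avoid dividing by zero or negative depth
    uvw = (K @ pts_cam.T).T
    u = uvw[:, 0] / safe_z
    v = uvw[:, 1] / safe_z
    visible = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    if not visible.any():
        return np.zeros_like(logits), visible
    masked = np.where(visible, logits, -np.inf)
    masked = masked - masked[visible].max()   # numerical stability
    exp = np.exp(masked)                      # exp(-inf) is exactly 0
    return exp / exp.sum(), visible

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 1.0],    # in view
                   [0.0, 0.0, -1.0],   # behind the camera
                   [10.0, 0.0, 1.0],   # projects outside the image
                   [0.1, 0.0, 1.0]])   # in view
probs, visible = frustum_masked_probs(np.zeros(4), points, K, np.eye(4), 640, 480)
```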

Should do a fast for a day or 2 days

Going to do a huge workout program during TRI, 7 days per week, 45 minutes lifting 45 minutes cardio 
I’m a fucking G. Genius robotics, jacked AF, charismatic and cool and collected. 
Get lean and strong now. Consistent bedtime. Be high quality.
Tomorrow will be 30 minute lift barbell rows and 30min stairmaster, after haircut before dinner.  
Dinner

X is not so attracted to you, that’s ok. Let’s refocus on ourselves now. Genius sexy jacked robotics phd scaler. 
Haircut today at 1pm, robot learning until then, 10k steps today and gym is rows and stairmaster, dinner with parents, more robot learning in evening after dinner 

Probably need to reset our gut biome 
Should have vlm auto annotate our datasets or at least provide initial annotations btw 

I wonder if we just trained in sim on simple object grasps at a wide variety of viewpoints, can we avoid the need for camera calibration, i.e. will it serve as densifying the 2D-action space and enable really fast transfer?

Don’t listen to what x says, you need to feel better about yourself independent, be cocky and arrogant even, because if not now then when
Haircut today is going to be a great refresher, clean shave tomorrow 	
Lose weight on intense TRI fitness program and 
Make visualizations and umi results before going to 

On remote ask backbones to do libero with ood pos and no-distractor libero dataset and add wrist cameras and explore geometry backbones, also ask Mac agent to add vlm for initial annotations that we can drag to modify

Sort out motion controller — can we replay a trajectory smoothly?

For TRI project should have the results be that it works with any combination of third view egocentric view or wrist view, whereas other methods don’t generalize without wrist view and are not data efficient 
We should also learn CAD with basic tutorials in next few days, first 8 hours of persistence will make you proficient btw especially given we know blender, need to have codex to hold our hand through it 
Should also have agents take on dataset interpolation fully on its own with requirement that it only uses the 48gb and just checking in with it every few days 
How can I plan to cut straight to a profitable robot company — I think it’s this banger paper (which includes dexterous complex results like folding laundry) while building our own robots 
Should also have it work with 

I don’t want to be the potential man forever. I’m tired of not having compelling robot demos for researchers and laypeople. Let’s make this summer internship amazing. Probably will end up buying a yam arm ourselves, meh fuck that let’s just grind onshape rq and build an arm with two sts3215s in the shoulder 
So this summer will be amazing, we’ll be amazing roboticist by the end of it with an amazing project and our own arms built, let’s go. 

Ok some good reflections
We don’t feel like a man because we see no path to being financially independent or it’s so misty. Commit to building the robot arm and robot AI now. CAD is a must and we’re building that arm in the next few months and the great data-efficient algorithm we’re building now. 
Also need to be more disciplined and professional with our wake and diet habits. 

Need to reset gut biome, it affects your brain and energy levels too. Fast for first 6 hours, sleep early during TRI 

Need to build our own robot in onshape in the next few weeks btw 

Today todos:
- walk 1 hr, gym
- meditate 10 more min
- clean room and pack car for TRI (so we can drive off from Izzy’s after car)
- drive to Izzy’s and Waymo over (arrive 10min early and budget 35min for travel so need to arrive at Izzy’s at 4:10 which means leave at 3:50)
- iron white shirt 

Hm I think claude might be right about onshape being a bad idea to start now and our excess cognitive energy / motivation / anxiety. We should give everything to making the project a banger now. I think redesigning the arm (not even double servo yet, just same arm but better calibration via bigger aruco boards, maybe spring loaded, then if time can add double servo version) and printing it in parallel. Cad is a few-week project because it involves becoming good at CAD (which is only a 1 week investment and a great one but not the right time for it, save it for after project we do it in CAD proper, add mobile base, etc) 

So timeline is first month - make paper a banger and print new robot arm (also so we can do experiments with it when at home), second month - aesthetics and wrap-up and more dextrous experiment fine-tuning, umi-design for YAMs (totally doable for us btw), third month is more wrap-up and if easy coasting then experiment with CAD-ing. Month after paper back at school we dedicate 100% to redoing the arm in proper onshape CAD, then the next month to adding a linear rail, then another month on wheels, then another month on tying it all together

Starting tomorrow we track calories and wear whoop to track steps and calories burned, no exceptions. Aiming for 1.5k calories consumed and 15k steps and 15min heavy lifting and 15min intense stairmaster.

Need to timecards and 

Looks like we will do morning 20min walk, get there early, gym at lunch 40min and shower here, stay here till 7 for cardio session at 9

Book 24hour fitness right after leaving today at 6pm, then grocery store after 

Ip for our box is 10.110.23.118

Should sign up for 24 hour fitness tonight and go for one hour in the day around 3pm, so schedule is walk from 720-740, get to toyota at 8am, lunch at 12 with friends, gym from 2-3 and shower here, keep rotisserie dinner in the fridge here from safeway, stay working here until 830, repeat, maybe going out with friends at 9 if we go out.

also instead of claude task should submit task, go walk / pace halls until observed done on phone


Ok 24hr fitness in, will get more steps tomorrow with claude and walk sessions, and 45min break of 25min lift 20min stairmaster cardio shower,

I wonder if we should try printing a fidex for the YAM base to make the calibration easier. 
Definitely try just running the calibration script ourselves carefully before even thinking about that, though
If time try printing it tomorrow though

Definitely can get yams cameras to be better by covering more diverse se3 coverage at calibration time, but at same time, just make outer coverage stl and ask Jonathan to print it (<1hr print) with top of square of yam as well, and if fits well then print it with full board 

Probably will need to take the extrinsics board shun uses and calibrate our own camera live 
Also will likely need fidex board, no chance we do the extensive camera calibration if we want to frequently move the camera and recalibrate and get new predictions for it 
30 days to lock in for diet. My body doesn’t feel tired enough. Triceps are not being worked enough. Tomorrow is tricep killer pulldown day and dips, +10min stairmaster and shower, then second session in day is 30more min cardio and sauna around 9pm 
Traps and triceps tricep pulldowns and shrugs  tomorrow

Also can we start heavy squatting again soon

Get here at 8am, want to be first in and last out
10 min meditation at 10am

The raiden calibration script is nice but not sure how accurate, good idea to calibrate all at once, but the convert script takes a long time for some reason

Next up is make the data viewer for the dataset

Calibration looks okay but not great with raiden cams. I think we’ll end up redoing it ourselves and that’s okay
Next up is sample training run on dataset and also should deploy a dataset and test that we have the IK working well too. 

Want a quiet wife

Also how are we currently handling points that are out of frustum e.g. when the point isn’t in the wrist camera’s frustum?
Do we have the correct workspace volume reach? Maybe we should just unproject each point in the image with N height bins and filter out points that are behind the robot 
Next up while training, have it back out the 3D actions and rotation and gripper to 

This is kinda like an implicit field btw 

Ok basically done eating for the day. Light yogurt (stop at Safeway or TJs) and protein powder (in car) after workout 

Really really frustrating having to go through someone to get the 3d prints, bottlenecked by someone and don’t know where they are and can’t walk into the lab to run it ourselves, definitely looking forward to tomorrow in lab access tutorial to get around this. 

Good dinner we had and maybe will have everyday — yogurt with peanut butter packet and barbell protein bar for texture and protein - ~500cal

Tomorrow after 4pm we will have access to the 3d printers and can print all we want, so in the next few days we finish the fidex for the yam and control it perfectly and run our models on it perfectly. Then in parallel we also redesign our own robot arm (motors are in the car and we should order more)

Check onboarding tasks in workday tonight?


Ok late start today but happy to be well rested, we can stay here until 8pm tonight with gym session in between and parfait dinner
Do onboarding today

jiaho is nice. Great experience in explaining it to him to see where the flaws of the explanation are. Probably want to show intermediate feature volume and supervision with cross entropy 

Should redo visualization to make it frustum-aligned, the discrete volume is off-putting / inelegant. I think we should even show the volume as a pointcloud with opacity weighted by the heatmap prediction 

Prob should make lunch faster or eat at desk, losing 1hr to it, not good 

Need to sign workday for benefits today

Ok great first inference loop ran, obviously didn’t work well with 10 episodes and bad recorded data (occluded object), will collect data again with ~40 episodes and retrain, just with one arm and paying attention to observability of manipulating cube 

Need to start eating more real foods, don’t get tempted by this office snack slop. But also the lunches are usually pretty real food. 
Tomorrow main goal is to just collect more data and debug model if it’s not doing well, currently only have 11 episodes, should really go up to like 100

Don’t eat anything tonight. Feeding window way closed. 

Stop wearing headphones and always playing music
Might want to jump straight to wrist cameras for the yams and even just use our own robot arms for the single arm experiments, because the yams with their wrist cameras often occlude objects….? Though can teleop in way that doesn’t occlude I guess. Another thing is I want to move to umi style ASAP, it’s so much faster and easier to collect, so much less mental effort
Might want to also start working on the 3d volume visualizations, viewing the PCA plot in 3d and the opacity weighted by the attention response?
Another benefit of this approach is that it’s interpretable — you can see the distribution 

I honestly think the final implementation will be something simpler, no implicit feature map or volume 
The question is how do we formulate robot action as dense 2d image prediction — probably will be something like 2d heatmap then transformer takes 2d trajectory and predicts height and gripper and rotation trajectory, all these variations are mostly the same in the end,
We also might be able to train with in the wild data by training point tracks, e.g. click one per episode on gripper to annotate videos, 

Message all interns in slack tonight — anyway we can round them with their names all up?

New mod is Daniel Craig

Train neck at gym tomorrow 
Now that it's the weekend let's lock in on the cut. Barely eating and walking much more 
Ate too much so it's going to be night but got to figure out a better system. Can't come home burnt out and not expect something to happen so gotta figure something out 

Tomorrow morning let's take a walk and make the list of interns and propose a weekend hike with them 
Then for the paper or for the project let's just redo the Cup place task but make the place like a bowl or like a container instead of the card because the card is kind of weird. Let's just make the visualization really good. For example the visualizations might be:
	•	the video, the training videos all visualized
	•	doing a separate experiment with left versus right table experiments
	•	with different cameras of course
Then let's also get the wrist camera working too because that still needs to be de-risked 
So the goal for tomorrow then is to:
	•	de-risk the wrist camera setup
	•	improve calibration with the board that we already have
	•	ideally print a new board if the lab is open or we get access with our badge
	•	make better visualizations
So tomorrow let's do the wrist camera setup and redo the calibration and start making visualizations for the videos and let's have a good day. Don't be so rushed cause extra is a free day. Let's go to the gym in the day, let's come back early, have a nice relaxing night, and make plans with people for the weekend 
Also let's just plan the entire paper and make visualizations of what it should look like. For example with the occlusion experiment, that would be cool , And also if we have a 3D printer, let's design and print the arm with the detachable calibration boards 

Okay today TODOS
Diet fitness: morning office breakfast, lunch is some pickup, Dinner is eggs and lox. Workout is walk to gym for lift at 2pm, 30min tv stairmaster at 6pm.
Logistics: slack all interns for weekend meetup/socials. Done 
Work todos:
- derisk camera, redo calibration, demo policy again but with better setup
I wonder if we should collect a place egg in carton policy (probably need more than 15 demos) where we pickup egg, place it in container 
One note that may be useful for explaining the method or illustrating it is that the height dimension is completely separate from the image embedding technically in terms of the computation because the height positional encoding is a separate dot product 
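
Tiny sketch of why the height term factors out, assuming the height encoding is simply added to the per-pixel feature before the dot product with the query (that additive form is an assumption, as are the shapes and the name `factorized_scores`). Since (f + e_h)·q = f·q + e_h·q, the image part is computed once and the height part is a separate small dot product.

```python
import numpy as np

def factorized_scores(features, height_emb, query):
    """Score every (height, row, col) cell without building the full feature volume.

    features:   (rows, cols, dim) per-pixel image features
    height_emb: (num_heights, dim) height positional encodings
    query:      (dim,) query embedding
    """
    image_term = features @ query       # (rows, cols), independent of height
    height_term = height_emb @ query    # (num_heights,), independent of the image
    return height_term[:, None, None] + image_term[None, :, :]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 8))
height_emb = rng.normal(size=(3, 8))
query = rng.normal(size=8)

fast = factorized_scores(features, height_emb, query)
# Naive version: add the height encoding to every pixel feature, then dot with the query.
naive = (features[None] + height_emb[:, None, None, :]) @ query
```

The factorized form costs roughly rows*cols*dim + num_heights*dim multiplies, versus num_heights*rows*cols*dim for the naive volume.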
I think for the overview demo, the mug pick and place is good to illustrate data efficiency, camera viewpoint robustness, and out-of-distribution generalization. For the more dexterous tasks like closing the card etc., you want to use more demos and maybe the wrist camera too 


The methods figure for the paper might show the query embedding and the height dot product (or height regression). The argmax yields the 2D location, and with the 2D location plus the height you unproject, using the camera intrinsics and extrinsics, into a 3D location xyz 
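
Rough sketch of that unprojection step, to pin down the geometry for the figure. This is not the actual model code. Assumptions (none of these names are from the codebase): `K` is the 3x3 intrinsics matrix, `T_base_cam` is the 4x4 camera pose in the robot base frame, height means the z coordinate in the base frame, and `pixel_logits` / `height_logits` / `height_bins` are hypothetical names for the model outputs and the height buckets.

```python
import numpy as np


def unproject_pixel_at_height(uv, height, K, T_base_cam):
    """Intersect the camera ray through pixel (u, v) with the plane z = height.

    Assumes T_base_cam maps camera-frame points into the robot base frame
    and that the base z axis is the height axis.
    """
    u, v = uv
    # Ray direction in the camera frame (pinhole model, no distortion).
    ray_cam = np.linalg.solve(K, np.array([u, v, 1.0]))
    R, t = T_base_cam[:3, :3], T_base_cam[:3, 3]
    ray_base = R @ ray_cam  # same ray, expressed in the base frame
    if abs(ray_base[2]) < 1e-9:
        raise ValueError("ray is parallel to the height plane")
    s = (height - t[2]) / ray_base[2]  # distance along the ray
    if s <= 0:
        raise ValueError("requested height is behind the camera")
    return t + s * ray_base


def decode_xyz(pixel_logits, height_logits, height_bins, K, T_base_cam):
    """Argmax over the pixel map and the height buckets, then unproject."""
    v, u = np.unravel_index(np.argmax(pixel_logits), pixel_logits.shape)
    height = height_bins[int(np.argmax(height_logits))]
    return unproject_pixel_at_height((u, v), height, K, T_base_cam)
```

The ray is parameterized by height rather than depth, so the same height value means the same thing from any camera, as long as the extrinsics are right.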

The results figure should also compare the different methods and show an overview of the architecture of each. The cool thing is that they'll all go through the same DINO encoder, so the figure just shows the different ways of mapping to XYZ:
	•	the direct XYZ regression
	•	XYZ in camera space
	•	untransformed with the camera
	•	a pixel-aligned motion track, which is XY plus height (pixel-aligned but not a pixel selection)
	•	ours

For the wrist camera setup we want to de-risk it first in terms of the projections and the extrinsics. We should make a quality check to verify that some points project plausibly, for example, onto the table as we don't have a depth dimension 

It's also not a big deal which to-do you pick next, because this is technically a free day anyway. Let's make the most of it but don't stress over it. Just choose the one you have the most energy for, of course 
Also no rush on it, but we should be printing our robot exoskeleton or our robot today: for example, launching an overnight print and testing the ArUco holder clearance for it 


Note: to de-risk the camera we can plot a grid of points on the table from the wrist camera (e.g. uniformly, while looking near an object) and then project the points onto the scene view to confirm they are close/tight.
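
Minimal sketch of that grid check, assuming a pinhole model with no distortion and that the table is the plane z = `plane_z` in the robot base frame. All names here are made up for illustration: `K_wrist` / `K_scene` are intrinsics, `T_base_wrist` / `T_base_scene` are 4x4 camera poses in the base frame.

```python
import numpy as np


def pixels_to_plane(uv, K, T_base_cam, plane_z=0.0):
    """Back-project (N, 2) pixels onto the plane z = plane_z in the base frame.

    Rays parallel to the plane are not handled (division by zero).
    """
    uv = np.asarray(uv, dtype=float)
    ones = np.ones((uv.shape[0], 1))
    rays_cam = np.linalg.solve(K, np.hstack([uv, ones]).T).T  # (N, 3)
    R, t = T_base_cam[:3, :3], T_base_cam[:3, 3]
    rays_base = rays_cam @ R.T
    s = (plane_z - t[2]) / rays_base[:, 2]
    return t + s[:, None] * rays_base


def project_points(xyz_base, K, T_base_cam):
    """Project (N, 3) base-frame points into a camera, returning (N, 2) pixels."""
    R, t = T_base_cam[:3, :3], T_base_cam[:3, 3]
    pts_cam = (np.asarray(xyz_base, dtype=float) - t) @ R  # R^T (p - t)
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def wrist_grid_in_scene(K_wrist, T_base_wrist, K_scene, T_base_scene,
                        u_range, v_range, n=5, plane_z=0.0):
    """Uniform n x n wrist-pixel grid -> table points -> scene-camera pixels."""
    uu, vv = np.meshgrid(np.linspace(*u_range, n), np.linspace(*v_range, n))
    uv_wrist = np.stack([uu.ravel(), vv.ravel()], axis=1)
    xyz = pixels_to_plane(uv_wrist, K_wrist, T_base_wrist, plane_z)
    uv_scene = project_points(xyz, K_scene, T_base_scene)
    return uv_wrist, xyz, uv_scene
```

Draw `uv_wrist` on the wrist image and `uv_scene` on the scene image. If the extrinsics are good, each pair of dots should land on the same spot of the table in both views.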

let's gym at 5, but the deliverable for the day should be really tight calibration and ideally for wrist cameras too

The wrist camera calibration script should assume that the wrist camera is already calibrated. Users should be able to press the yellow button or the white button to capture an image of the ArUco board and save the offset 

Also should definitely be using multiple frames for the calibration and not just a single frame 
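
One simple way to combine multiple frames, as a sketch only: average the per-frame pose estimates, taking the mean translation and projecting the mean rotation matrix back onto a valid rotation with an SVD (the chordal mean). This assumes each frame already gives a 4x4 pose and that the frames roughly agree. `average_poses` and `rotation_spread_deg` are made-up helper names, not from the existing calibration script.

```python
import numpy as np


def average_poses(T_list):
    """Average a list of 4x4 rigid transforms (chordal mean for rotation)."""
    T = np.asarray(T_list, dtype=float)
    t_mean = T[:, :3, 3].mean(axis=0)
    M = T[:, :3, :3].mean(axis=0)  # not a rotation in general
    U, _, Vt = np.linalg.svd(M)
    # Flip the last axis if needed so the result has determinant +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    out = np.eye(4)
    out[:3, :3] = U @ D @ Vt
    out[:3, 3] = t_mean
    return out


def rotation_spread_deg(T_list, T_mean):
    """Angle in degrees between each frame's rotation and the mean rotation."""
    angles = []
    for T in T_list:
        R_rel = T_mean[:3, :3].T @ np.asarray(T, dtype=float)[:3, :3]
        cos = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos)))
    return np.array(angles)
```

The spread is a cheap quality check: if one frame is many degrees away from the mean, it is probably a bad detection and should be dropped before averaging.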

Also the pitch for the paper is actually pretty easy. You just say that it's an architecture change for the end-effector prediction, which is basically a 3D keypoint prediction. All we do is factorize that 3D keypoint prediction into a 2D pixel selection, which gives us the ray that the 3D keypoint lies on. To pick a point along the ray, we pick from a number of discretized buckets along that ray. That's a very simple change but it has some dramatic consequences in terms of generalization: the image features generalize so well that selecting the local pixel generalizes really well, and predicting depth is okay but we make it even easier by predicting a camera-invariant height from the robot base along the ray instead of depth
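
To make the factorization concrete for the pitch, here is a sketch of how a 3D keypoint would turn into the two targets (a pixel and a height bucket). This is an illustration under assumptions, not the real training code: `K` is the 3x3 intrinsics matrix, `T_base_cam` is the 4x4 camera pose in the robot base frame, height is base-frame z, and `height_bins` is a hypothetical array of bucket centers.

```python
import numpy as np


def encode_keypoint(xyz_base, K, T_base_cam, height_bins):
    """Split a base-frame 3D keypoint into (pixel, height-bucket index)."""
    xyz_base = np.asarray(xyz_base, dtype=float)
    R, t = T_base_cam[:3, :3], T_base_cam[:3, 3]
    p_cam = R.T @ (xyz_base - t)  # keypoint in the camera frame
    if p_cam[2] <= 0:
        raise ValueError("keypoint is behind the camera")
    uvw = K @ p_cam
    u, v = uvw[:2] / uvw[2]
    pixel = (int(round(u)), int(round(v)))  # 2D selection target
    # Height target depends only on the base frame, not on the camera.
    height_idx = int(np.argmin(np.abs(np.asarray(height_bins) - xyz_base[2])))
    return pixel, height_idx
```

Moving the camera changes the pixel target but not the height bucket, which is the camera-invariance point. A depth target would change with every viewpoint.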

do timesheet soon

I think for the bigger visualizations it will be great to show the height progression separately and the height reflection separately, with attention to the full structure of the map in both dimensions 
I think one of the reasons our brain feels kind of fried after a long day of work is that we keep having our coding agent work on a task and then we just switch to some random entertainment while waiting for it, so for most of the day we are only sparsely engaged mentally. I think if you're going to switch context, it should always be between two work things: the debugging thing, and then paper writing and figure making, rather than entertainment 

don't shave mustache