---
status: active
last-updated: 2026-04-06
---

# PhD

## Overview
I am a Computer Science PhD student at USC, currently in the early-middle stage of my PhD, aiming to finish within roughly the next two years. My research sits at the intersection of **robotics, computer vision, and 3D scene understanding**, with a strong interest in building practical systems that improve robot learning and generalization.

My long-term motivation is not just to publish papers, but to build toward a future where I can create useful home robots and eventually have the freedom to pursue my own robotics ideas independently.

## Research Interests
My main interests are:

- Robot learning / imitation learning
- Vision-based robot policy learning
- Computer vision for robotics
- 3D vision and geometric representations
- Viewpoint-robust robot perception and action
- Data-efficient learning for robot policies
- Video models for robotics
- Injecting 3D priors into visual learning systems

A recurring theme in my work is that **geometry, 3D structure, and pixel-aligned representations can make robot learning more data efficient and more robust**.

## Main Research Direction
A major current direction is learning robot actions from vision using **pixel-aligned action representations** instead of directly regressing global end-effector poses.

The core intuition is:

- Global 3D action regression is often brittle and data-hungry.
- Pixel-aligned representations are more natural from the image’s perspective.
- They may generalize better across viewpoints and scene configurations.
- They may interface better with video models than standard action heads.

This has led to a project direction centered around predicting robot actions as **per-pixel distributions along camera rays**, rather than as a single global 3D pose vector.

## Current Project: PARA / Pixel-Aligned Robot Actions
One of my main active projects is a pixel-aligned robot action framework, sometimes referred to as **PARA**.

### Core idea
Instead of predicting the end-effector action directly in Cartesian coordinates, the model predicts a spatially aligned action representation over the image, such as depth buckets or distributions along each image ray.
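As a deliberately simplified sketch of this representation (illustrative names and intrinsics, not an actual implementation): a hypothetical per-pixel head emits logits over K depth bins, softmax turns them into a distribution along that pixel's camera ray, and expected depth plus pinhole back-projection yields a 3D point in the camera frame.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_point_along_ray(u, v, logits, depth_bins, fx, fy, cx, cy):
    """Turn per-pixel depth-bin logits into a 3D point in the camera frame.

    `logits` are the (hypothetical) model outputs for pixel (u, v);
    `depth_bins` are the metric depths each bin represents.
    """
    probs = softmax(logits)
    # Expected depth along this pixel's camera ray.
    z = sum(p * d for p, d in zip(probs, depth_bins))
    # Pinhole back-projection of pixel (u, v) at depth z.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)
```

For example, with logits sharply peaked on the 0.6 m bin at the principal point, the recovered point sits on the optical axis at roughly 0.6 m depth. The point of the sketch is only the shape of the output: a distribution per pixel along its ray, rather than one global pose vector.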

### Why this matters
This project is motivated by several beliefs:

- Image-aligned prediction is often easier to learn from small robot datasets.
- It should be more robust to camera viewpoint changes.
- It preserves more local visual structure than global regression.
- It may make it easier to use pretrained vision/video models for robotics.

### Current status
This project is one of my most important near-term paper directions. I have been testing the idea in simulation and on real robot setups, and I care a lot about:

- data efficiency
- viewpoint generalization
- robustness to distractors
- compatibility with video models

## Video Models for Robotics
I am very interested in using **video models** for robotics, but I am skeptical of standard approaches that treat robotics as simple global action regression on top of a large video backbone.

What I find more promising is:

- using video models as rich visual feature extractors
- decoding their latent representations into pixel-aligned action outputs
- leveraging motion-aware latent structure without forcing full end-to-end video generation at deployment
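As one hedged illustration of "decoding latents into pixel-aligned outputs while preserving spatial detail": the simplest possible decoder is a bilinear upsample from a coarse latent grid to per-pixel resolution. This is a toy sketch, not a claim about any specific video model's architecture.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Bilinearly upsample a 2D grid (list of lists of floats) to (out_h, out_w).

    Stands in for decoding a coarse latent feature map into a
    pixel-aligned output map; uses align-corners-style coordinates.
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two bracketing rows, then vertically.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out
```

In practice the decoder would be learned, but the constraint I care about is the same one this toy makes visible: every output location stays spatially registered to the input image.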

I generally prefer approaches that are:

- smaller
- faster
- more practical
- more data-efficient
- easier to fine-tune on robotics tasks

I am especially interested in using video-model latents for action prediction in a way that preserves spatial detail.

## 3D Vision Background
I come from a strong 3D vision perspective, and this shapes how I think about robotics problems.

Important themes in my thinking include:

- structure from motion
- point clouds
- camera geometry
- rendering-based transformations
- multi-view consistency
- 3D priors for visual learning
- pose estimation and calibration

I often think that robotics models should make better use of geometry rather than relying purely on large-scale end-to-end learning.

## Calibration / Fiducials / Robot Geometry
Another important project direction is around **robot-camera calibration and fiducial-based robot state estimation**.

I have been exploring the idea of attaching designed fiducial structures or “exoskeletons” to robots so that camera pose or robot link pose can be recovered more directly and robustly from images.
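To make the geometry concrete, here is a minimal sketch (pure Python, illustrative values) of the forward pinhole model that a fiducial-based pose solver inverts: known 3D corner positions on a hypothetical fiducial plate, rigidly mounted to a robot link, project into the image given a rigid transform into the camera frame; a solver such as PnP then recovers that transform from the observed pixel corners.

```python
def project_point(p_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point through a rigid transform and pinhole intrinsics.

    `rotation` is a 3x3 matrix (list of lists), `translation` a 3-vector;
    together they map the fiducial frame into the camera frame.
    """
    # Rigid transform: p_cam = R @ p_world + t.
    p_cam = tuple(
        sum(rotation[i][j] * p_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    # Perspective divide through pinhole intrinsics.
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return (u, v)
```

For instance, with an identity rotation and the fiducial 0.5 m in front of the camera, a corner at the fiducial origin lands on the principal point. The appeal of the exoskeleton idea is exactly that this model is simple, well-conditioned, and needs no training data to invert.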

This direction is attractive to me because it is:

- practical
- low-cost
- engineering-friendly
- useful for real robot deployment
- less dependent on large training datasets

This is one of my more systems-oriented research directions.

## Simulation and Real Robots
I like to work in both simulation and real-world robotics.

### Simulation
I use simulation to derisk ideas, run controlled evaluations, and scale experiments more quickly. I care about simulation as a place to test:

- viewpoint variation
- robustness
- data efficiency
- new policy architectures
- visual representation learning

### Real robots
I also care strongly about real robot validation. I want my work to matter in practice, not just in benchmark settings.

In general, I value research that can plausibly transfer from simulation to real systems and eventually to practical robotics products.

## Robotics Philosophy
A few beliefs that strongly shape my PhD work:

- **Data efficiency matters a lot.**
- **Geometry still matters.**
- **Pixel-aligned predictions are often more natural than global regression.**
- **Robotics needs methods that are practical, robust, and not absurdly large.**
- **A method should ideally work under viewpoint changes and in new environments.**
- **I prefer approaches that could eventually be useful on real robots outside of carefully controlled demos.**

## What I Am Trying to Build Toward
During my PhD, I want to develop research that contributes toward:

- better visual representations for robot control
- more robust policy learning from small datasets
- stronger use of 3D structure in robot learning
- more practical robot perception and calibration pipelines
- methods that could transfer into useful general-purpose home robotics systems

## Constraints / Reality
A few practical facts about my PhD situation:

- I care a lot about making steady research progress, even if infrastructure is imperfect.
- I often balance simulation work with real robot setup/debugging.
- I am interested in ambitious ideas, but I value implementations that are actually tractable.
- I do not want research plans that depend entirely on huge compute or giant datasets.
- I prefer concrete experiments and practical iteration over vague grandiosity.

## Useful Agent Guidance
When helping me with PhD-related tasks, it is useful to assume:

- I am strongest when combining **robotics + vision + geometry**.
- I respond well to ideas that are both **scientifically interesting and practically testable**.
- I usually prefer **clear experimental plans** over abstract brainstorming.
- I like research directions that could become papers, real systems, or startup-relevant technology.
- I often need help turning broad ideas into:
  - crisp problem statements
  - paper framing
  - experiments
  - ablations
  - baselines
  - implementation plans

## High-Value Help for Me
The most helpful support for my PhD usually includes:

- clarifying and sharpening research ideas
- designing experiments
- framing contributions for papers
- comparing baselines fairly
- identifying likely failure modes
- helping with technical debugging in vision/robotics pipelines
- organizing project plans and milestones
- helping turn messy ideas into concrete narratives

## Summary
I am a USC CS PhD student working at the intersection of **robotics, computer vision, and 3D geometry**, with a strong focus on **data-efficient robot learning**, **pixel-aligned action representations**, **video models for robotics**, and **practical robot calibration / perception systems**.

The big picture is that I want to build research that is not only publishable, but genuinely useful for creating more capable and practical robots.
