PhD student in Computer Science at UC Irvine, advised by Prof. Judy Hoffman.
Open to research internships. Get in touch.
I work toward generalizable, physically grounded embodied intelligence: systems that perceive across modalities and act reliably in the world. Two commitments run through my work: models should generalize under distribution shift, and we should be able to tell when to trust them.
Lately I focus on generative world models for robot learning: using them to learn physically grounded, data-efficient policies [VAM], and building interpretable evaluation that shows when their outputs can be trusted [WFM-Eval], since a generated video can look photorealistic while teaching a robot to grasp an object that isn't there. This builds on a foundation in OOD robustness [LatentDR], controllable synthetic data [SkyScenes], and multimodal and 3D grounding [SPOT][MLLM].

📍 At CVPR 2026, come check out our workshop papers
I'm at CVPR June 2–8. Email me to grab coffee and talk world models, VLMs, or robot learning.
WFM-Eval: Interpretable Error Diagnostics for Video World Models in Robotics
CVPR 2026 Workshops · Video World Models & FMEA
Generative Video Models for Robot Policy Learning
Under review
SPOT: Structured Prompting with Object-Centric Tokens for Open-World Scene Graphs
CVPR 2026 Workshops · MUSI & Visual Concepts
Extending Multimodal Large Language Models Beyond a Single Modality (Vision + Audio)
Preprint
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
ECCV 2024
Press: Georgia Tech News · GT College of Computing · GT News Center · Mirage News
LatentDR: Improving Model Generalization with Sample-Aware Latent Degradation & Restoration
WACV 2024