I’m a Ph.D. student in Computer Science at Georgia Tech, where I am fortunate to be advised by Prof. Judy Hoffman.

Recent Projects
  • World Models (WFM): Developing VLM-based benchmarks for automatic evaluation of world models.
  • 7B Open-Source VLM: Open-vocabulary 3D scene graph generation (under review).
  • SkyScenes: Synthetic aerial dataset for real-world segmentation (ECCV 2024).
  • Generalist Multimodal LLM: Jointly-trained vision-audio model that reduces cross-modal interference and outperforms larger models.

My research advances vision-language models (VLMs) by extending their capabilities across modalities, spatial reasoning, and evaluation: integrating audio, enhancing spatial understanding, and enabling automatic evaluation of world models for robotic manipulation. I have also worked on synthetic-to-real transfer and domain generalization.

💼 I'm currently seeking research internships for Summer 2026. Feel free to reach out if you're hiring!

📝 Recent Updates [ 🌟: Highlight | 💡: Research | 📆: Misc ]


📚 Publications

2024

ECCV 2024 (First first-author paper!)

SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil Khose*, Anisha Pal*, Aayushi Agarwal*, Deepanshi*, Judy Hoffman, Prithvijit Chattopadhyay

WACV 2024 (First main-conference paper!)

LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration
Ran Liu, Sahil Khose, Jingyun Xiao, Lakshmi Sathidevi, Keerthan Ramnath, Zsolt Kira, Eva L. Dyer

2022

NeurIPS 2022 (First in-person conference!)

ICML 2022 (Best Paper Award 🌟)

ACL 2022


2021

NeurIPS 2021

NAACL 2021 (Top Performer Award 🌟)