Krishna Somandepalli, PhD | Google DeepMind

About Me

I am a Senior Research Engineer at Google DeepMind in New York City, where I develop AI systems that learn from multimodal, real-world data. My work integrates vision, audio, language, and structured signals to model and generate rich human-centered content.

Previously, I was at Google Research. I received my PhD in Electrical and Computer Engineering from the University of Southern California (Signal Analysis and Interpretation Lab). My academic background is rooted in multimodal representation learning and affective computing, bridging the gap between raw signals and human-level understanding.

Prior to my doctoral studies, I earned my Master's degree in Electrical Engineering from UC Santa Barbara and worked as a Junior Research Scientist at NYU Langone Medical Center. There, I developed statistical models to analyze functional brain networks using fMRI data, an experience that grounded my later work in decoding complex human signals.

Across all my work—from foundational models to recent AI agents—I'm driven by the goal of building AI that is technically robust, meaningful, and useful to society.

Research Areas

Multimodal & Agentic AI

Gemini 2.5 • Reasoning • Long-Context

I work on multimodal systems that can perceive complex environments, reason over long contexts, and act autonomously in real settings, grounded in vision, audio, language, and other structured signals. I contribute to the Gemini 2.5 family with a focus on multimodality.

Selected: Gemini 2.5

Generative Media & Expressivity

VideoPoet • Versatile Diffusion • Patents

I build generative models for video and audio where controllability is the core goal—timing, motion, style, and semantic fidelity. This includes VideoPoet (CVPR 2024 Best Paper Award) and Versatile Diffusion (NeurIPS 2024), with an emphasis on expressivity and temporal coherence.

Selected: VideoPoet • Versatile Diffusion • rich captioning • task-agnostic diffusion training • cross-modal emotion understanding

Computational Media Intelligence

Longitudinal Analysis • Computational Narratology • Scale

I develop methods to extract structure from large-scale media: event discovery, temporal organization, and narrative-level understanding across thousands of hours of video. This includes patented work on automated narrative analysis of movie scripts and frameworks for Computational Media Intelligence, featured in a Google Research video series.

Selected: Computational Media Intelligence • automated movie script analysis • Google Research feature