Heng (Henry) Yu

I am a CS PhD student at Stanford University, advised by Prof. Ehsan Adeli in the STAI Lab. My research focuses on video generation and understanding, world modeling, and Embodied AI. I am also interested in their real-world applications, such as healthcare.

Before Stanford, I received my master's degree from the CMU Robotics Institute, where I worked on 3D vision with Prof. László Jeni. I also collaborated closely with Prof. Berkin Bilgic at Harvard Medical School on MRI reconstruction, and with Prof. Cheng Jin at Shanghai Jiao Tong University on medical vision.

I obtained my bachelor's degree from Tsinghua University, majoring in Automation with a second major in Economics and Management. My long-term goal is to build AI systems that are both technically strong and practically useful, especially in embodied perception, visual generation, and healthcare settings.

I enjoy working with motivated people across academia and industry. If you'd like to collaborate, feel free to reach out.

CV / Google Scholar / GitHub / LinkedIn

Research Interests

I am primarily interested in video generation and understanding, world modeling, and Embodied AI. More broadly, I want to build visual intelligence systems that can model dynamic environments, understand how the world evolves over time, and support decision-making and interaction in open-world settings.

Selected News
  • Nov 2025: SocialGen paper is accepted to 3DV 2026.
  • Sep 2024: 4Real paper is accepted to NeurIPS 2024.
  • Feb 2024: CoGS paper is accepted to CVPR 2024.
  • Feb 2023: DyLiN paper is accepted to CVPR 2023.
  • Feb 2023: SubZero abstract is accepted to ISMRM 2023 as a Power Pitch.
  • Dec 2022: CoNFies paper is nominated as a Best Paper Award finalist at FG 2023.
Service

Reviewer for CVPR, ICCV, ECCV, NeurIPS, SIGGRAPH, MICCAI, ISBI, Computer Graphics Forum, and ISMRM.

Selected Publications

* indicates co-first author. Please see my Google Scholar for the full publication list.

SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu*, Juze Zhang*, Changan Chen, Tiange Xiang, Yusu Fang, Juan Carlos Niebles, Ehsan Adeli
3DV 2026
paper / project page

SocialGen is the first unified motion-language model for multi-human interactions. It introduces a new representation, benchmark, and dataset, and achieves state-of-the-art social motion modeling.

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Heng Yu*, Chaoyang Wang*, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, László A. Jeni, Sergey Tulyakov, Hsin-Ying Lee
NeurIPS 2024
paper / project page

We propose 4Real, the first photorealistic text-to-4D scene generation pipeline.

CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni
CVPR 2024
paper / project page / code

CoGS brings control to Gaussian Splatting for dynamic scenes, allowing direct scene manipulation in real time.

DyLiN: Making Light Field Networks Dynamic
Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni
CVPR 2023
paper / project page / code / CMU RI News

DyLiN extends light field networks to dynamic, non-rigid scenes with strong visual fidelity and efficiency.

CoNFies: Controllable Neural Face Avatars
Heng Yu, Koichiro Niinuma, László A. Jeni
FG 2023 - Best Paper Award Finalist
paper / project page / code

CoNFies is a fully automatic controllable neural representation for face self-portraits.

SubZero: Subspace Zero-Shot MRI Reconstruction
Heng Yu, Yamin Arefeen, Berkin Bilgic
ISMRM 2023 - Power Pitch
paper / code

SubZero improves subspace-based zero-shot self-supervised MRI reconstruction with a parallel architecture and attention mechanism.

eRAKI: Fast Robust Artificial Neural Networks for K-space Interpolation with Coil Combination and Joint Reconstruction
Heng Yu, Zijing Dong, Yamin Arefeen, Congyu Liao, Kawin Setsompop, Berkin Bilgic
ISMRM 2021 - Oral Presentation
paper / code

eRAKI accelerates RAKI by directly learning a coil-combined target for robust and efficient MRI reconstruction.

Predicting Treatment Response from Longitudinal Images using Multi-task Deep Learning
Cheng Jin*, Heng Yu*, Jia Ke*, Peirong Ding*, Yongju Yi, Xiaofeng Jiang, Xin Duan, Jinghua Tang, Daniel T. Chang, Xiaojian Wu, Feng Gao, Ruijiang Li
Nature Communications 2021
paper / code

A multi-task deep learning framework for tumor segmentation and treatment response prediction from longitudinal medical images.


Website template adapted from Jon Barron.