Heng (Henry) Yu

I am a CS PhD student at Stanford University, advised by Prof. Ehsan Adeli in the STAI Lab. My research sits at the intersection of computer vision, 3D generative modeling, and AI for healthcare.

Before Stanford, I received my master's degree from the CMU Robotics Institute, where I worked on 3D vision with Prof. Laszlo Jeni. I also collaborated closely with Prof. Berkin Bilgic at Harvard Medical School on MRI reconstruction, and with Prof. Cheng Jin at Shanghai Jiao Tong University on medical vision.

I obtained my bachelor's degree from Tsinghua University, majoring in Automation with a second major in Economics and Management. My long-term goal is to build AI systems that are both technically strong and practically useful, especially in embodied perception, visual generation, and healthcare settings.

I enjoy working with motivated people across academia and industry. If you'd like to collaborate, feel free to reach out.

CV  /  Google Scholar  /  GitHub  /  LinkedIn

Research Interests

I am primarily interested in video generation and understanding, world modeling, and embodied AI. More broadly, I want to build visual intelligence systems that can model dynamic environments, understand how the world evolves over time, and support decision-making and interaction in open-world settings.

Selected News
  • Sep 2024: The 4Real paper was accepted to NeurIPS 2024.
  • Feb 2024: The CoGS paper was accepted to CVPR 2024.
  • Feb 2023: The DyLiN paper was accepted to CVPR 2023.
  • Feb 2023: The SubZero abstract was accepted to ISMRM 2023 as a power pitch.
  • Dec 2022: The CoNFies paper was nominated as a best paper candidate at FG 2023.
Service

Reviewer for CVPR, ICCV, ECCV, NeurIPS, SIGGRAPH, MICCAI, ISBI, Computer Graphics Forum, and ISMRM.

Selected Publications

* indicates co-first author. Please see my Google Scholar for the full publication list.

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Heng Yu*, Chaoyang Wang*, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, László A. Jeni, Sergey Tulyakov, Hsin-Ying Lee
NeurIPS 2024
paper / project page

We propose 4Real, the first photorealistic text-to-4D scene generation pipeline.

CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni
CVPR 2024
paper / project page / code

CoGS enables controllable Gaussian Splatting for dynamic scenes with direct scene manipulation and real-time control.

DyLiN: Making Light Field Networks Dynamic
Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni
CVPR 2023
paper / project page / code / CMU RI News

DyLiN extends light field networks to dynamic, non-rigid scenes with strong visual fidelity and efficiency.

CoNFies: Controllable Neural Face Avatars
Heng Yu, Koichiro Niinuma, László A. Jeni
FG 2023 - Best Paper Award Finalist
paper / project page / code

CoNFies is a fully automatic controllable neural representation for face self-portraits.

SubZero: Subspace Zero-Shot MRI Reconstruction
Heng Yu, Yamin Arefeen, Berkin Bilgic
ISMRM 2023 - Power Pitch
paper / code

SubZero improves subspace-based zero-shot self-supervised MRI reconstruction with a parallel architecture and an attention mechanism.

eRAKI: Fast Robust Artificial Neural Networks for K-space Interpolation with Coil Combination and Joint Reconstruction
Heng Yu, Zijing Dong, Yamin Arefeen, Congyu Liao, Kawin Setsompop, Berkin Bilgic
ISMRM 2021 - Oral Presentation
paper / code

eRAKI accelerates RAKI by directly learning a coil-combined target for robust and efficient MRI reconstruction.

Predicting Treatment Response from Longitudinal Images using Multi-task Deep Learning
Cheng Jin*, Heng Yu*, Jia Ke*, Peirong Ding*, Yongju Yi, Xiaofeng Jiang, Xin Duan, Jinghua Tang, Daniel T. Chang, Xiaojian Wu, Feng Gao, Ruijiang Li
Nature Communications 2021
paper / code

A multi-task deep learning framework for tumor segmentation and treatment response prediction from longitudinal medical images.

Website template adapted from Jon Barron.