Yukai Shi (石瑜恺)
I am a Ph.D. student (2022-present) at Tsinghua University, supervised by Prof. Heung-Yeung Shum (former Executive Vice President of Microsoft). Previously, I received my bachelor's degree from the School of Artificial Intelligence at Xidian University.
I am also advised by Prof. Lei Zhang at IDEA Research, working closely with Prof. Ping Tan. In addition, I work closely with Dr. Xin Tao, mainly focusing on data balancing and dynamic physics in video generation pretraining.
My research interests lie in video generation and 3D generation. Feel free to contact me for discussion or collaboration!
Email: shiyk22 AT mails Dot tsinghua Dot edu Dot cn / shiyukai22 AT gmail Dot com.
Email / Google Scholar / GitHub
SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model
Yukai Shi, Weiyu Li, Zihao Wang, Hongyang Li, Xingyu Chen, Ping Tan, Lei Zhang.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
· Extends 3D scene generation from indoor to open-world domains.
Project page / Paper / Datasets / Code
Imbalance in Balance: Online Concept Balancing in Generation Models
Yukai Shi, Jiarong Ou, Rui Chen, Haotian Yang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Kun Gai.
IEEE/CVF International Conference on Computer Vision (ICCV 2025)
· A self-equalization training loss that balances the data distribution for visual generation models.
Paper / Code
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang*, Yukai Shi*, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
· A novel data curation pipeline for video generation that accounts for data heterogeneity.
Project page / Paper / Code
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image
Yukai Shi*, Jianan Wang*, He Cao*, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
International Conference on Learning Representations (ICLR 2024)
· Uses text as semantic guidance to further constrain the solution space of novel view synthesis.
Project page / Paper / Code
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, Lei Zhang
International Conference on Learning Representations (ICLR 2024)
Paper
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, Yukai Shi, Zheng-Jun Zha, Lei Zhang
Conference on Neural Information Processing Systems (NeurIPS 2023)
Project page / Paper / Code
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie, Chao Ma, Xiaokang Yang, Wenjun Zeng.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Paper
Cococo: Improving text-guided video inpainting for better consistency, controllability and compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Rong Xiao, Kam-Fai Wong, Lei Zhang.
AAAI Conference on Artificial Intelligence (AAAI 2025)
Project page / Paper / Code
LipsFormer: Introducing Lipschitz Continuity to Vision Transformers
Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang
International Conference on Learning Representations (ICLR 2023)
Paper / Code
Academic Service
Conference reviewer for CVPR, NeurIPS, ICLR, ICML, ICCV, etc.