|
Yanhong Zeng 曾艳红
Yanhong Zeng is a Research Scientist & Engineer at Ant Group, specializing in efficient generative systems.
Previously at Shanghai AI Lab, she served as the Lead Core Maintainer of MMagic. Her work bridges the gap between research and production, developing high-quality, controllable, and scalable multi-modal models, with a current focus on world models and streaming video generation.
Hiring: I am looking for self-motivated interns to work on generative AI!
Email /
Google Scholar /
Twitter /
Github /
Linkedin
|
|
News
- [2025.04] I have joined Ant Research to start a new journey!
- [2025.03] DiffSensei and Auto-CherryPicker are accepted by CVPR 2025.
- [2024.09] HumanVid is accepted by NeurIPS 2024 (D&B Track).
- [2024.09] MotionBooth is accepted by NeurIPS 2024 (Spotlight).
- [2024.07] PowerPaint is accepted by ECCV 2024.
- [2024.03] PIA and Make-it-Vivid are accepted by CVPR 2024.
- [2024.02] 🔥 Our technology has been shipped in the animation series "Poems of Timeless Acclaim", which is broadcast in over 10 languages on more than 70 mainstream media platforms overseas. It reached an audience of nearly 100 million viewers worldwide within two weeks.
- [2024.01] 🔥 We release MagicMaker, an AI platform that supports image generation, editing, and animation!
- [2023.12] We release MMagic, a multimodal advanced, generative, and intelligent creation toolbox.
|
|
|
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Yunhong Lu,
Yanhong Zeng†,
Haobo Li,
Hao Ouyang,
Qiuyu Wang,
Ka Leong Cheng,
Jiapeng Zhu,
Hengyuan Cao,
Zhipeng Zhang,
Xing Zhu,
Yujun Shen,
Min Zhang
arXiv, 2025
project page /
arXiv /
code
Reward Forcing is a new real-time streaming video generation framework with a novel memory design and a rewarded distribution matching distillation method for better dynamic generation.
|
|
|
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang,
Yicheng Gu,
Yanhong Zeng†,
Zhening Xing,
Yuancheng Wang,
Zhizheng Wu,
Kai Chen
IJCV, 2025
project page /
video /
arXiv /
demo /
code
FoleyCrafter is a text-based video-to-audio generation framework that generates high-quality audio that is semantically relevant to and temporally synchronized with the input video.
|
|
|
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
Zhening Xing,
Gereon Fox,
Yanhong Zeng†,
Xingang Pan†,
Mohamed Elgharib,
Christian Theobalt,
Kai Chen
arXiv, 2024
project page /
video /
arXiv /
demo /
code
Live2Diff is the first attempt to bring uni-directional attention modeling to video diffusion models for live video stream processing, achieving 16 FPS on an RTX 4090 GPU.
|
|
|
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang,
Yanhong Zeng†,
Wenran Liu,
Chun Yuan,
Kai Chen
ECCV, 2024
project page /
video /
arXiv /
demo /
code
PowerPaint is the first versatile inpainting model that achieves SOTA in text-guided and shape-guided object inpainting, object removal, outpainting, etc.
|
|
|
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang*,
Zhening Xing*,
Yanhong Zeng†,
Youqing Fang,
Kai Chen
CVPR, 2024
project page /
video /
arXiv /
demo /
code
PIA can animate any image from personalized models using text prompts while preserving high-fidelity details and unique styles.
|
|
|
Aggregated Contextual Transformations for High-Resolution Image Inpainting
Yanhong Zeng,
Jianlong Fu,
Hongyang Chao,
Baining Guo
TVCG, 2023
project page /
arXiv /
video 1 /
video 2 /
code
In AOT-GAN, we propose aggregated contextual transformations and a novel mask-guided GAN training strategy for high-resolution image inpainting.
|
|
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Yanhong Zeng*,
Hongwei Xue*,
Tiankai Hang*,
Yuchong Sun*,
Bei Liu,
Huan Yang,
Jianlong Fu,
Baining Guo
CVPR, 2022
arXiv /
video /
code
We collect a large-scale video-language dataset. It is the first high-resolution dataset of its kind, with 371.5k hours of 720p video, and the most diversified, covering 15 popular YouTube categories.
|
|
Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng,
Huan Yang,
Hongyang Chao,
Jianbo Wang,
Jianlong Fu
NeurIPS, 2021
arXiv
We propose a token-based generator with Transformers for image synthesis. We present a new perspective by viewing this task as visual token generation, controlled by style tokens.
|
|
Learning Joint Spatial-Temporal Transformations for Video Inpainting
Yanhong Zeng,
Hongyang Chao,
Jianlong Fu
ECCV, 2020
project page /
arXiv /
video 1 /
more results /
code
We propose STTN, the first transformer-based model for high-quality video inpainting, setting a new state-of-the-art performance.
|
|
Learning Pyramid Context-Encoder Network for High-Quality Image Inpainting
Yanhong Zeng,
Hongyang Chao,
Jianlong Fu,
Baining Guo
CVPR, 2019
project page /
arXiv /
video /
code
We propose PEN-Net, the first work that is able to conduct both semantic and texture inpainting. To achieve this, we propose a cross-layer attention transfer mechanism and a pyramid filling strategy.
|
 |
CCTV Animation Production: "Poems of Timeless Acclaim"
Tech Lead
Designed and delivered an end-to-end AI animation pipeline for a national-scale production. Achieved global impact with broadcast in 10+ languages across 70+ platforms, amassing 100M+ views.
|
 |
MagicMaker
Product Owner & Tech Lead
MagicMaker is a user-friendly AI platform that enables seamless image generation, editing, and animation. It empowers users to transform their imagination into captivating cinema and animations with ease.
|
 |
OpenMMLab/MMagic
Lead Core Maintainer
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration and enhancement, etc.
|
- Conference Reviewer: CVPR, ICCV, ECCV, SIGGRAPH, ICML, ICLR, NeurIPS, AAAI.
- Journal Reviewer: TIP, TVCG, TMM, TCSVT, PR.
- Tutorial Organizer (ICCV 2023): MMagic: Multimodal Advanced, Generative and Intelligent Creation
- Tutorial Organizer (CVPR 2023): Learning to Generate, Edit, and Enhance Images and Videos with MMagic
- Invited Talk: Towards High-Quality Image Inpainting (Microsoft China Video Center on Bilibili Live 2019)
- Award: ICML 2022 Outstanding Reviewer.
- Award: National Scholarship in 2021 (Top 1% in SYSU).
- Award: Outstanding Undergraduate Thesis in 2017.
- Award: Outstanding Undergraduate in 2017.
- Award: National Scholarship in 2016 (Top 1% in SYSU).
- Award: First Prize Excellence Scholarship in 2013, 2014, 2015.
|