Yanhong Zeng 曾艳红

I am currently a researcher at Shanghai AI Laboratory in Shanghai, where I work on Generative AI for visual content generation (AIGC).

I received a PhD in Computer Science and Technology from Sun Yat-sen University in 2022, as a member of the joint PhD program with Microsoft Research Asia (MSRA). I was fortunate to be advised by Prof. Hongyang Chao, Baining Guo, and Jianlong Fu.

Email  /  Google Scholar  /  Twitter  /  GitHub  /  LinkedIn

News

I am looking for self-motivated interns to work on multi-modal image and video generation/editing. Please send me your CV if you are interested.

  • [2024.03] Two papers have been accepted to CVPR 2024.
  • [2024.02] Our technology has shipped in the animation series "Poems of Timeless Acclaim", which is now being broadcast on CCTV-1.
  • [2024.01] We released MagicMaker, an AI platform that supports image generation, editing, and animation!

Research

I'm interested in image and video generation and editing, and in multi-modal learning and generation.
Here are some selected publications. For the full list, please see Google Scholar.

*Equal contribution.

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Lizhuang Ma, Kai Chen
CVPR, 2024
project page / video / arXiv / code

We present Make-It-Vivid, the first attempt to create plausible and consistent textures in UV space for 3D biped cartoon characters from text input within a few seconds.

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang*, Zhening Xing*, Yanhong Zeng, Youqing Fang, Kai Chen
CVPR, 2024
project page / video / arXiv / demo / code

PIA can animate any image generated by a personalized model, guided by text, while preserving high-fidelity details and unique styles.

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen
arXiv, 2023
project page / video / arXiv / demo / code

PowerPaint is the first versatile inpainting model that achieves SOTA in text-guided and shape-guided object inpainting, object removal, outpainting, etc.

Aggregated Contextual Transformations for High-Resolution Image Inpainting
Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
TVCG, 2023
project page / arXiv / video 1 / video 2 / code

In AOT-GAN, we propose aggregated contextual transformations and a novel mask-guided GAN training strategy for high-resolution image inpainting.

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue*, Tiankai Hang*, Yanhong Zeng*, Yuchong Sun*, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
CVPR, 2022
arXiv / video / code

We collect a large-scale dataset that is both the first high-resolution dataset of its kind, with 371.5k hours of 720p video, and the most diverse, covering 15 popular YouTube categories.

Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu
NeurIPS, 2021
arXiv

We propose a token-based generator with Transformers for image synthesis. We present a new perspective by viewing this task as visual token generation, controlled by style tokens.

Learning Joint Spatial-Temporal Transformations for Video Inpainting
Yanhong Zeng, Hongyang Chao, Jianlong Fu
ECCV, 2020
project page / arXiv / video 1 / more results / code

We propose STTN, the first transformer-based model for high-quality video inpainting, setting a new state of the art.

Learning Pyramid Context-Encoder Network for High-Quality Image Inpainting
Yanhong Zeng, Hongyang Chao, Jianlong Fu, Baining Guo
CVPR, 2019
project page / arXiv / video / code

We propose PEN-Net, the first work able to perform both semantic and texture inpainting. To achieve this, we propose cross-layer attention transfer and a pyramid filling strategy.

3D Human Body Reshaping with Anthropometric Modeling
Yanhong Zeng, Hongyang Chao, Jianlong Fu
ICIMCS, 2017
project page / arXiv / video / code

We design a 3D human body reshaping system. It takes a user's anthropometric measurements (e.g., height and weight) as input and generates a 3D human shape for that user.

Projects

MagicMaker

Project Owner, 2023.04 ~ 2024.01

MagicMaker is a user-friendly AI platform that enables seamless image generation, editing, and animation. It empowers users to transform their imagination into captivating cinema and animations with ease.

OpenMMLab/MMagic

Lead Core Maintainer, 2022.08 ~ 2023.08

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, and diffusion models for text-to-image generation, image/video restoration/enhancement, etc.

Working Experience

Shanghai AI Laboratory

Researcher, 2022.08 ~ Present


Microsoft Research Asia (MSRA)

Research Intern, 2018.06 ~ 2021.12

Research Intern, 2016.06 ~ 2017.06


Miscellanea