News
Looking for self-motivated interns to work on multi-modal image and video generation/editing.
Send me your CV if interested.
- [2024.03]
Two papers are accepted by CVPR 2024.
- [2024.02]
Our technology has shipped in the animation series "Poems of Timeless Acclaim", which is now being broadcast on CCTV-1.
- [2024.01]
We release MagicMaker, an AI platform that supports image generation, editing and animation!
|
Research
I'm interested in image and video generation/editing, multi-modality learning and generation.
Here are some selected publications.
Please see the full list on Google Scholar.
*Equal contribution.
|
|
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang,
Yanhong Zeng,
Ke Fan,
Xuheng Wang,
Bo Dai,
Lizhuang Ma,
Kai Chen
CVPR, 2024
project page /
video /
arXiv /
code
We present Make-It-Vivid, the first attempt to create plausible and consistent textures in UV space for 3D biped cartoon characters from text input within a few seconds.
|
|
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang*,
Zhening Xing*,
Yanhong Zeng,
Youqing Fang,
Kai Chen
CVPR, 2024
project page /
video /
arXiv /
demo /
code
PIA can animate any image from personalized models using text prompts while preserving high-fidelity details and unique styles.
|
|
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang,
Yanhong Zeng,
Wenran Liu,
Chun Yuan,
Kai Chen
arXiv, 2023
project page /
video /
arXiv /
demo /
code
PowerPaint is the first versatile inpainting model that achieves SOTA in text-guided and shape-guided object inpainting, object removal, outpainting, etc.
|
|
Aggregated Contextual Transformations for High-Resolution Image Inpainting
Yanhong Zeng,
Jianlong Fu,
Hongyang Chao,
Baining Guo
TVCG, 2023
project page /
arXiv /
video 1 /
video 2 /
code
In AOT-GAN, we propose aggregated contextual transformations and a novel mask-guided GAN training strategy for high-resolution image inpainting.
|
|
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue*,
Tiankai Hang*,
Yanhong Zeng*,
Yuchong Sun*,
Bei Liu,
Huan Yang,
Jianlong Fu,
Baining Guo
CVPR, 2022
arXiv /
video /
code
We collect a large-scale dataset that is both the first high-resolution dataset of its kind, with 371.5k hours of 720p videos, and the most diversified, covering 15 popular YouTube categories.
|
|
Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng,
Huan Yang,
Hongyang Chao,
Jianbo Wang,
Jianlong Fu
NeurIPS, 2021
arXiv
We propose a token-based generator with Transformers for image synthesis. We present a new perspective by viewing this task as visual token generation, controlled by style tokens.
|
|
Learning Joint Spatial-Temporal Transformations for Video Inpainting
Yanhong Zeng,
Hongyang Chao,
Jianlong Fu
ECCV, 2020
project page /
arXiv /
video 1 /
more results /
code
We propose STTN, the first transformer-based model for high-quality video inpainting, setting a new state of the art.
|
|
Learning Pyramid Context-Encoder Network for High-Quality Image Inpainting
Yanhong Zeng,
Hongyang Chao,
Jianlong Fu,
Baining Guo
CVPR, 2019
project page /
arXiv /
video /
code
We propose PEN-Net, the first work able to perform both semantic and texture inpainting. To achieve this, we propose cross-layer attention transfer and a pyramid filling strategy.
|
|
3D Human Body Reshaping with Anthropometric Modeling
Yanhong Zeng,
Hongyang Chao,
Jianlong Fu
ICIMCS, 2017
project page /
arXiv /
video /
code
We design a 3D human body reshaping system. It takes a user's anthropometric measurements (e.g., height and weight) as input and generates a 3D human shape for that user.
|
|
MagicMaker
Project Owner, 2023.04 ~ 2024.01
MagicMaker is a user-friendly AI platform that enables seamless image generation, editing, and animation. It empowers users to transform their imagination into captivating cinema and animations with ease.
|
|
OpenMMLab/MMagic
Lead Core Maintainer, 2022.08 ~ 2023.08
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image generation, image/video restoration/enhancement, etc.
|
- Conference Reviewer:
CVPR, ICCV, ECCV, SIGGRAPH, ICML, ICLR, NeurIPS, AAAI.
- Journal Reviewer:
TIP, TVCG, TMM, TCSVT, PR.
- Tutorial Talk (ICCV 2023):
MMagic: Multimodal Advanced, Generative and Intelligent Creation
- Tutorial Talk (CVPR 2023):
Learning to Generate, Edit, and Enhance Images and Videos with MMagic
- Invited Talk:
Towards High-Quality Image Inpainting (Microsoft China Video Center on Bilibili Live 2019)
- Award:
ICML 2022 Outstanding Reviewer.
- Award:
National Scholarship in 2021 (Top 1% in SYSU).
- Award:
Outstanding Undergraduate Thesis in 2017.
- Award:
Outstanding Undergraduate in 2017.
- Award:
National Scholarship in 2016 (Top 1% in SYSU).
- Award:
First Prize Excellence Scholarship in 2013, 2014, 2015.
|