DreamVideo

Customized video generation with any subject and any motion.

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

CVPR 2024

Yujie Wei¹, Shiwei Zhang², Zhiwu Qing³, Hangjie Yuan⁴, Zhiheng Liu²,
Yu Liu², Yingya Zhang², Jingren Zhou², Hongming Shan¹

¹Fudan University, ²Alibaba Group,
³Huazhong University of Science and Technology, ⁴Zhejiang University

Customized generation with diffusion models has made impressive progress for images but remains unsatisfactory for video, a harder task that requires control over both subject and motion. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. Subject learning aims to accurately capture the fine appearance of the subject from the provided images, which is achieved by combining textual inversion with fine-tuning of our carefully designed identity adapter. In motion learning, we design a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows for flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of DreamVideo over state-of-the-art methods for customized video generation. The source code and models are publicly available.
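To make the idea of two small, separately trained adapters more concrete, here is a minimal sketch in plain Python. It assumes a generic residual bottleneck adapter (down-projection, GELU, up-projection, skip connection) with a zero-initialized up-projection, which is a common adapter design; the abstract does not specify the adapters' internals, so the class `BottleneckAdapter`, the helper `compose`, the dimensions, and the sequential chaining are all hypothetical and are not taken from the DreamVideo code.

```python
import math
import random


def gelu(x):
    """Exact GELU activation for a single float."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical residual adapter: x + W_up @ gelu(W_down @ x).

    Assumption: this is a generic adapter layout, not DreamVideo's
    actual identity or motion adapter.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection (bottleneck x dim), small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection (dim x bottleneck), zeros: the adapter starts
        # as an identity map and leaves the frozen backbone unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        return (len(self.w_down) * len(self.w_down[0])
                + len(self.w_up) * len(self.w_up[0]))

    def __call__(self, x):
        hidden = [gelu(sum(w * v for w, v in zip(row, x)))
                  for row in self.w_down]
        delta = [sum(w * h for w, h in zip(row, hidden))
                 for row in self.w_up]
        return [xi + di for xi, di in zip(x, delta)]


def compose(features, adapters):
    """Apply independently trained adapters one after another
    (a simplification: real adapters may sit in different layers)."""
    for adapter in adapters:
        features = adapter(features)
    return features


# One adapter per stage; each would be trained on its own data.
identity_adapter = BottleneckAdapter(dim=8, bottleneck=2, seed=1)
motion_adapter = BottleneckAdapter(dim=8, bottleneck=2, seed=2)

x = [float(i) for i in range(8)]
y = compose(x, [identity_adapter, motion_adapter])
```

In this sketch each adapter holds only 2 x 8 + 8 x 2 = 32 weights, and because the up-projection starts at zero, the untrained pair returns its input unchanged. That illustrates why such modules can be trained separately and then combined without retraining the underlying model.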

Overview: Summary of the Generated Videos
Video Customization with both Subjects and Motions
You can generate videos flexibly with any subject and any motion.
Comparisons with baselines.
Subject Customization
Comparisons with baselines.
Comparisons with Custom Diffusion.
More results.
Motion Customization
Comparisons with baselines.
More results.
BibTeX
@inproceedings{wei2024dreamvideo,
  title={DreamVideo: Composing Your Dream Videos with Customized Subject and Motion},
  author={Wei, Yujie and Zhang, Shiwei and Qing, Zhiwu and Yuan, Hangjie and Liu, Zhiheng and Liu, Yu and Zhang, Yingya and Zhou, Jingren and Shan, Hongming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6537--6549},
  year={2024}
}