DreamVideo

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

CVPR 2024

Yujie Wei¹, Shiwei Zhang², Zhiwu Qing³, Hangjie Yuan⁴, Zhiheng Liu²,
Yu Liu², Yingya Zhang², Jingren Zhou², Hongming Shan¹

¹Fudan University, ²Alibaba Group,
³Huazhong University of Science and Technology, ⁴Zhejiang University

Customized generation using diffusion models has made impressive progress in image generation but remains unsatisfactory for the more challenging task of video generation, which requires control over both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. Building on a pre-trained video diffusion model, DreamVideo decouples this task into two stages: subject learning and motion learning. Subject learning aims to accurately capture the fine-grained appearance of the subject from the provided images, which is achieved by combining textual inversion with fine-tuning of our carefully designed identity adapter. In motion learning, we design a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows flexible customization of any subject with any motion. Extensive experimental results demonstrate that DreamVideo outperforms state-of-the-art methods for customized video generation. The source code and models are publicly available.
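The abstract describes two lightweight adapters that are fine-tuned on top of a frozen video diffusion model and then combined. This page does not specify their internals, so the following is only a minimal sketch assuming a common residual bottleneck design (down-projection, nonlinearity, zero-initialized up-projection, skip connection). The names `BottleneckAdapter` and `compose` are hypothetical, ReLU stands in for whatever activation the real model uses, and both adapters act on one feature vector purely for illustration rather than at the specific layers where DreamVideo inserts them.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


class BottleneckAdapter:
    """Hypothetical residual adapter: h + W_up . relu(W_down . h).

    Only these two small matrices would be trained; the base model stays frozen.
    """

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection (dim -> hidden).
        self.w_down: Matrix = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(hidden)]
        # Zero-initialized up-projection (hidden -> dim): before training the
        # adapter adds nothing, so the pre-trained model's output is unchanged.
        self.w_up: Matrix = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        return [a + d for a, d in zip(h, delta)]

    def num_params(self) -> int:
        return sum(map(len, self.w_down)) + sum(map(len, self.w_up))


def compose(h: Vector, identity_adapter: BottleneckAdapter,
            motion_adapter: BottleneckAdapter) -> Vector:
    """Apply separately trained identity (subject) and motion adapters.

    Hypothetical: both act on the same features here for illustration only.
    """
    return motion_adapter(identity_adapter(h))


if __name__ == "__main__":
    h = [1.0, -2.0, 0.5, 3.0]
    identity = BottleneckAdapter(dim=4, hidden=2, seed=1)
    motion = BottleneckAdapter(dim=4, hidden=2, seed=2)
    print(compose(h, identity, motion))  # → [1.0, -2.0, 0.5, 3.0] (untrained adapters are identity maps)
    print(identity.num_params())         # → 16
```

In this sketch, each adapter holds only its two small matrices, so an identity adapter learned from subject images and a motion adapter learned from motion videos can be paired freely, which mirrors the "any subject with any motion" flexibility described above.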

Overview: Summary of the Generated Videos

Video Customization with both Subjects and Motions

DreamVideo can flexibly generate videos that combine any subject with any motion.

“a person is playing guitar”

“a dog^* is playing guitar”

“a sloth^* is playing guitar on the moon”

“a monster^* is playing guitar in the room”

“a person is lifting weights”

“a dog^* is lifting weights”

“a sloth^* is lifting weights”

“a monster^* is lifting weights on the road”

Comparisons with baselines.

“a person is skateboarding, front view” → “a dog is skateboarding, front view”
Panels: AnimateDiff, ModelScopeT2V, LoRA, DreamVideo (ours)

“a dog eating grass from under the snow” → “a cat eating grass from under the sand”
Panels: AnimateDiff, ModelScopeT2V, LoRA, DreamVideo (ours)

“a person is surfing” → “a dog is surfing”
Panels: AnimateDiff, ModelScopeT2V, LoRA, DreamVideo (ours)

“a car running on the road” → “a dog running on the Place du Louvre”
Panels: AnimateDiff, ModelScopeT2V, LoRA, DreamVideo (ours)

Subject Customization

Comparisons with baselines.

“a cat eating pizza”
Panels: Subject, Textual Inversion, Dreamix, DreamVideo (ours)

“a monster dancing to upbeat music”
Panels: Subject, Textual Inversion, Dreamix, DreamVideo (ours)

“a wolf running in the forest”
Panels: Subject, Textual Inversion, Dreamix, DreamVideo (ours)

Comparisons with Custom Diffusion.

“a S^* dog wagging its tail” / “a dog^* wagging its tail”
Panels: Subject, Custom Diffusion, DreamVideo (ours)

“a S^* cat posing along the Great Wall” / “a cat^* posing along the Great Wall”
Panels: Subject, Custom Diffusion, DreamVideo (ours)

“a S^* dog eating pizza” / “a dog^* eating pizza”
Panels: Subject, Custom Diffusion, DreamVideo (ours)

More results.

“a girl^* walking across the street, front view”

“a duck^* running on the road”

“a tortoise^* exploring a forest”

“a dog wearing sunglasses^*”

Motion Customization

Comparisons with baselines.

“a person is biking” → “a bear is biking”
Panels: Motion, ModelScopeT2V, Tune-A-Video, DreamVideo (ours)

“a bear walking on some rocks” → “a cat walking on a field full of flowers”
Panels: Motion, ModelScopeT2V, Tune-A-Video, DreamVideo (ours)

“a person is lifting weights” → “a bear is lifting weights”
Panels: Motion, ModelScopeT2V, Tune-A-Video, DreamVideo (ours)

“a car running on the road” → “a tiger running on Mars”
Panels: Motion, ModelScopeT2V, Tune-A-Video, DreamVideo (ours)

More results.

“a car running on the road”

“a cat running on the road”

“a woman riding a horse jumping over a fence”

“a tiger jumping over a fence, cartoon style”

“a person is playing guitar”

“a monkey is playing guitar”

BibTeX

@inproceedings{wei2024dreamvideo,
	title={DreamVideo: Composing Your Dream Videos with Customized Subject and Motion},
	author={Wei, Yujie and Zhang, Shiwei and Qing, Zhiwu and Yuan, Hangjie and Liu, Zhiheng and Liu, Yu and Zhang, Yingya and Zhou, Jingren and Shan, Hongming},
	booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
	pages={6537--6549},
	year={2024}
}
