arxiv:2412.11067

CFSynthesis: Controllable and Free-view 3D Human Video Synthesis

Published on Dec 15, 2024

Authors:

Abstract

AI-generated summary: CFSynthesis generates high-quality human videos with customizable attributes using a texture-SMPL-based representation and foreground-background separation, achieving state-of-the-art performance in complex human animations and 3D motions.

Human video synthesis aims to create lifelike characters in various environments, with wide applications in VR, storytelling, and content creation. While 2D diffusion-based methods have made significant progress, they struggle to generalize to complex 3D poses and varying scene backgrounds. To address these limitations, we introduce CFSynthesis, a novel framework for generating high-quality human videos with customizable attributes, including identity, motion, and scene configurations. Our method leverages a texture-SMPL-based representation to ensure consistent and stable character appearances across free viewpoints. Additionally, we introduce a novel foreground-background separation strategy that decomposes the scene into foreground and background, enabling seamless integration of user-defined backgrounds. Experimental results on multiple datasets show that CFSynthesis not only achieves state-of-the-art performance in complex human animations but also adapts effectively to 3D motions in free-view and user-specified scenarios.
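To make the idea of integrating a user-defined background concrete, the following is a minimal sketch of generic per-pixel alpha compositing. It is not the CFSynthesis implementation: the abstract does not say how the separated foreground and background are recombined, so the mask-based blend, the function names `composite_frame` and `composite_video`, and the data layout are all assumptions chosen for illustration. Frames are plain nested Python lists to keep the example dependency-free; a real pipeline would more likely use array or tensor types.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[float]      # one RGB pixel, channel values in [0, 1]
Frame = List[List[Pixel]]    # H x W grid of pixels
Mask = List[List[float]]     # H x W grid, 1.0 = foreground (the human)


def composite_frame(foreground: Frame, background: Frame, mask: Mask) -> List[List[Tuple[float, ...]]]:
    """Blend one foreground frame over a background with a soft mask.

    Hypothetical helper: out = mask * foreground + (1 - mask) * background.
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("foreground, background, and mask must have the same height")
    out = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(mask_row)):
            raise ValueError("foreground, background, and mask rows must have the same width")
        out_row = []
        for fg_px, bg_px, alpha in zip(fg_row, bg_row, mask_row):
            a = min(max(alpha, 0.0), 1.0)  # clamp the mask value to [0, 1]
            out_row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg_px, bg_px)))
        out.append(out_row)
    return out


def composite_video(fg_frames: List[Frame], background: Frame, masks: List[Mask]):
    """Apply the same user-chosen background to every frame of a clip."""
    if len(fg_frames) != len(masks):
        raise ValueError("need exactly one mask per foreground frame")
    return [composite_frame(f, background, m) for f, m in zip(fg_frames, masks)]


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)    # stands in for the rendered human
    blue = (0.0, 0.0, 1.0)   # stands in for the user-defined background
    frame = [[red, red, red]]
    scene = [[blue, blue, blue]]
    mask = [[1.0, 0.5, 0.0]]  # foreground, soft edge, background
    print(composite_frame(frame, scene, mask))
```

Swapping `scene` for a different image changes the background without touching the foreground frames, which is the practical benefit that a foreground-background decomposition offers under this assumed blending model.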