Papers
arxiv:2502.07711

RenderBox: Expressive Performance Rendering with Text Control

Published on Feb 11, 2025
Authors:
,
,

Abstract

RenderBox is a unified framework for text-and-score controlled audio performance generation that uses a diffusion transformer architecture with cross-attention joint conditioning to produce expressive musical performances with high fidelity and controllability.

AI-generated summary

Expressive music performance rendering involves interpreting symbolic scores with variations in timing, dynamics, articulation, and instrument-specific techniques, resulting in performances that capture musical can emotional intent. We introduce RenderBox, a unified framework for text-and-score controlled audio performance generation across multiple instruments, applying coarse-level controls through natural language descriptions and granular-level controls using music scores. Based on a diffusion transformer architecture and cross-attention joint conditioning, we propose a curriculum-based paradigm that trains from plain synthesis to expressive performance, gradually incorporating controllable factors such as speed, mistakes, and style diversity. RenderBox achieves high performance compared to baseline models across key metrics such as FAD and CLAP, and also tempo and pitch accuracy under different prompting tasks. Subjective evaluation further demonstrates that RenderBox is able to generate controllable expressive performances that sound natural and musically engaging, aligning well with prompts and intent.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.07711 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.07711 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.07711 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.