Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data
Abstract
Accent Vector enables controllable accent manipulation in multilingual TTS systems through fine-tuning on native speech from different languages and computing task vectors that capture accent characteristics.
Accent is an integral part of society, reflecting multiculturalism and shaping how individuals express identity. The majority of English speakers are non-native (L2) speakers, yet current Text-To-Speech (TTS) systems primarily model American-accented English due limited accented data. We propose Accent Vector, a controllable representation that enables accent manipulation in multilingual TTS without requiring accented training data. Accent Vector is derived by fine-tuning a TTS system on native speech of a different language (i.e. non-English) and computing task vectors capturing accent characteristics (i.e. in English). By scaling and interpolating the vector, we achieve fine-grained control over accent strength and generate mixed-accent speech. In addition, it generalizes beyond English, enabling accent control across multiple languages. Objective and human evaluations confirm the effectiveness of Accent Vector for fine-grained and compositional accent control.
Community
Accent is an integral part of society, reflecting multiculturalism and shaping how individuals express identity. The majority of English speakers are non-native (L2) speakers, yet current Text-To-Speech (TTS) systems primarily model American-accented English due limited accented data. We propose Accent Vector, a controllable representation that enables accent manipulation in multilingual TTS without requiring accented training data. Accent Vector is derived by fine-tuning a TTS system on native speech of a different language (i.e. non-English) and computing task vectors capturing accent characteristics (i.e. in English). By scaling and interpolating the vector, we achieve fine-grained control over accent strength and generate mixed-accent speech. In addition, it generalizes beyond English, enabling accent control across multiple languages. Objective and human evaluations confirm the effectiveness of Accent Vector for fine-grained and compositional accent control.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Learning-free L2-Accented Speech Generation using Phonological Rules (2026)
- Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis (2026)
- Activation Steering for Accent Adaptation in Speech Foundation Models (2026)
- CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data (2026)
- Activation Steering for Accent-Neutralized Zero-Shot Text-To-Speech (2026)
- Rethinking Discrete Speech Representation Tokens for Accent Generation (2026)
- LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper