ChatterboxTTS — Dhivehi (ދިވެހި)

This is a lightweight Dhivehi adaptation of Resemble AI’s Chatterbox TTS. It performs voice cloning from a short reference clip and exposes simple knobs (exaggeration, cfg_weight, and temperature) to steer expressiveness and pacing.

Although this checkpoint is tuned for Dhivehi, it can still speak English; with a clean 3–10s reference and sensible settings, results are often decent.

Install

pip install chatterbox-tts==0.1.4

Download chatterbox_dhivehi.py from this repo.
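
If you prefer to script the download, here is a minimal sketch using huggingface_hub; it assumes chatterbox_dhivehi.py sits at the root of the alakxender/chatterbox-tts-dhivehi repo and that the checkpoint files are hosted there as well.

# Optional: fetch the helper script and checkpoint from the Hub (assumed repo layout).
from huggingface_hub import hf_hub_download, snapshot_download

hf_hub_download(
    repo_id="alakxender/chatterbox-tts-dhivehi",
    filename="chatterbox_dhivehi.py",
    local_dir=".",
)

# Download the whole repo and point CKPT_DIR (in the script below) at the returned path.
ckpt_dir = snapshot_download(repo_id="alakxender/chatterbox-tts-dhivehi")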

Test

# Assumes chatterbox-tts==0.1.4 and a local chatterbox_dhivehi.py that adds Dhivehi support.

from chatterbox.tts import ChatterboxTTS
import chatterbox_dhivehi
from pathlib import Path
import torchaudio
import torch
import numpy as np
import random

# User settings (edit these)
CKPT_DIR = "/models/lab/whisper/chatterbox_test/kn_cbox"  # checkpoint dir
REF_WAV = "reference_audio.wav"                                              # optional 3–10s clean reference; "" to disable
#REF_WAV = ""
TEXT = "މި ރިޕޯޓާ ގުޅޭ ގޮތުން އެނިމަލް ވެލްފެއާ މިނިސްޓްރީން އަދި ވާހަކައެއް ނުދައްކާ"  # sample Dhivehi text
TEXT = f"{TEXT}, The Animal Welfare Ministry has not yet commented on the report" 
EXAGGERATION = 0.4
TEMPERATURE = 0.3
CFG_WEIGHT = 0.7
SEED = 42
SAMPLE_RATE = 24000
OUT_PATH = "out.wav"

# Extend Dhivehi support from local file
chatterbox_dhivehi.extend_dhivehi()

# Seed for reproducibility
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
random.seed(SEED)
np.random.seed(SEED)

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Loading ChatterboxTTS from: {CKPT_DIR} on {device}")
model = ChatterboxTTS.from_dhivehi(ckpt_dir=Path(CKPT_DIR), device=device)
print("Model loaded.")

# Generate (reference audio optional)
print(f"Generating audio... ref={'yes' if REF_WAV else 'no'}")
gen_kwargs = dict(
    text=TEXT,
    exaggeration=EXAGGERATION,
    temperature=TEMPERATURE,
    cfg_weight=CFG_WEIGHT,
)

try:
    if REF_WAV:
        gen_kwargs["audio_prompt_path"] = REF_WAV
        audio = model.generate(**gen_kwargs)
    else:
        # Try without reference first; if backend requires audio_prompt_path, fall back to ""
        try:
            audio = model.generate(**gen_kwargs)
        except TypeError:
            gen_kwargs["audio_prompt_path"] = ""
            audio = model.generate(**gen_kwargs)
except Exception as e:
    raise RuntimeError(f"Generation failed: {e}")

# Save
torchaudio.save(OUT_PATH, audio, SAMPLE_RATE)
dur = audio.shape[1] / SAMPLE_RATE
print(f"Saved {OUT_PATH} ({dur:.2f}s)")

Sample with no reference

  • Generated Audio:

Sample with reference

  • Reference Audio:

  • Generated Audio:

Note: English prompts also work with this finetune; quality improves with a clean, representative reference clip.

Settings & Tips

General use

  • Start with exaggeration=0.5, cfg_weight=0.5.
  • If the reference speaker is fast, lower cfg_weight to ~0.3 for calmer pacing. ([Chatterbox TTS API][2])

Expressive / dramatic

  • Use lower cfg_weight (~0.3) and higher exaggeration (≥0.7). Higher exaggeration tends to speed up delivery; reducing CFG compensates for pacing. ([Chatterbox TTS API][2])

Language transfer

  • Make the reference clip’s language match your target. If accent carry-over occurs, try cfg_weight=0. ([Chatterbox TTS API][2])

Additional

  • Reference audio: 3–10 seconds, clear, minimal background noise.
  • Fix a seed for reproducibility.
  • Pre-clean text (trim extra spaces/line breaks).
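
To keep these settings handy, the tips above can be collected into presets and passed straight to generate. This is a sketch, assuming the model, TEXT, and REF_WAV variables from the Test script are already in scope; temperature is omitted so the backend default applies.

# Presets mirroring the tips above.
PRESETS = {
    "general":      dict(exaggeration=0.5, cfg_weight=0.5),
    "calm_pacing":  dict(exaggeration=0.5, cfg_weight=0.3),  # fast reference speaker
    "expressive":   dict(exaggeration=0.7, cfg_weight=0.3),  # dramatic delivery
    "no_carryover": dict(exaggeration=0.5, cfg_weight=0.0),  # accent carry-over in language transfer
}

audio = model.generate(
    text=TEXT,
    audio_prompt_path=REF_WAV,  # optional reference clip, handled as in the Test script
    **PRESETS["expressive"],
)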

Known Limitations

  • This is a quick experimental run; expect occasional artifacts (prosody quirks, timing drift on long passages).
  • For long texts, consider sentence-level generation and concatenation (a sketch follows this list).
  • Voice cloning quality is highly dependent on reference audio cleanliness.
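
For the long-text case, here is a minimal sketch of sentence-level generation and concatenation. It reuses the model, settings, and imports from the Test script; the sentence splitter is deliberately naive (split on ., !, ?), so adjust it for your text's punctuation as needed.

# Naive long-text handling: synthesize sentence by sentence, then concatenate.
import re

LONG_TEXT = "..."  # long Dhivehi passage goes here
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", LONG_TEXT) if s.strip()]

chunks = []
for sent in sentences:
    wav = model.generate(
        text=sent,
        audio_prompt_path=REF_WAV,  # same reference keeps the voice consistent
        exaggeration=EXAGGERATION,
        temperature=TEMPERATURE,
        cfg_weight=CFG_WEIGHT,
    )
    chunks.append(wav)

full = torch.cat(chunks, dim=1)  # each chunk is shaped (1, samples)
torchaudio.save("out_long.wav", full, SAMPLE_RATE)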