# Sesame CSM Ceylia LoRA

LoRA finetune of Sesame CSM 1B on the Ceylia voice. Use `speaker_id = 0` during inference.
*Audio samples (without context / with context) are embedded on the model page.*
## Training Parameters
- LoRA Rank: 32
- LoRA Alpha: 64 (rsLoRA)
- Learning Rate: 1e-4
- Epochs: 25 (475 Steps)
- Batch Size: 64
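For reference, rsLoRA (rank-stabilized LoRA) changes the adapter scaling factor from `alpha / r` to `alpha / sqrt(r)`, so with rank 32 and alpha 64 the effective scale is much larger than under standard LoRA. A quick sketch of the difference:

```python
import math

def lora_scale(alpha, r, use_rslora=False):
    """Scaling factor applied to the LoRA update (B @ A)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r

# With the settings above (rank 32, alpha 64):
standard = lora_scale(64, 32)                    # 64 / 32 = 2.0
rslora = lora_scale(64, 32, use_rslora=True)     # 64 / sqrt(32) ≈ 11.31
print(standard, round(rslora, 2))
```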
This LoRA targets only the MLP layers, not the attention layers, which seemed to perform better in my experiments. This particular run is definitely overfit, though, so you should stick to low ranks and 1-3 epochs.
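As a rough sketch, the adapter configuration described above might look like the following in PEFT terms. The MLP module names (`gate_proj`, `up_proj`, `down_proj`) are an assumption based on the Llama-style backbone CSM uses; check the actual module names in your checkpoint before reusing this.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the training config described above.
lora_config = LoraConfig(
    r=32,                   # LoRA rank
    lora_alpha=64,
    use_rslora=True,        # rank-stabilized scaling: alpha / sqrt(r)
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP only, no attention
    lora_dropout=0.0,
)
```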
The training was done on an NVIDIA L40S for about 2 hours.
## Inference with Unsloth
```python
import torch
import soundfile as sf
from unsloth import FastModel
from transformers import CsmForConditionalGeneration

model, processor = FastModel.from_pretrained(
    model_name = "shb777/csm-ceylia-lora",
    max_seq_length = 2048,
    dtype = None,  # auto-detect
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)
model.eval()

speaker_id = 0
text = "Sesame is a super cool TTS model which can be fine tuned with Unsloth."
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

enc = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
enc = {k: v.to("cuda") for k, v in enc.items()}

audio_values = model.generate(
    **enc,
    max_new_tokens=125,
    output_audio=True,
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)  # CSM outputs 24 kHz audio
```
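The snippet above generates without context. To condition generation on prior audio (the "with context" samples), CSM's chat template also accepts audio content in earlier turns. A minimal sketch of building such a conversation, assuming the transformers `{"type": "audio", "path": ...}` content convention; the context file path is a hypothetical placeholder:

```python
def build_context_conversation(speaker_id, context_pairs, text):
    """Build a CSM conversation with prior (transcript, audio_path) context turns,
    followed by a final text-only turn for the model to speak."""
    conversation = []
    for transcript, audio_path in context_pairs:
        conversation.append({
            "role": str(speaker_id),
            "content": [
                {"type": "text", "text": transcript},
                {"type": "audio", "path": audio_path},
            ],
        })
    # Final turn has text only -- the model generates its audio.
    conversation.append({
        "role": str(speaker_id),
        "content": [{"type": "text", "text": text}],
    })
    return conversation

conversation = build_context_conversation(
    0,
    [("A previously spoken line.", "context.wav")],  # hypothetical context clip
    "Sesame is a super cool TTS model which can be fine tuned with Unsloth.",
)
```

The returned list can then be passed to `processor.apply_chat_template(...)` exactly as in the snippet above.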