# Sesame CSM Ceylia LoRA

LoRA finetune of Sesame CSM 1B on the Ceylia voice. Use `speaker_id = 0` during inference.
*Audio samples (without context / with context) are embedded on the model page.*
## Training Parameters
- LoRA Rank: 32
- LoRA Alpha: 64 (rsLoRA)
- Learning Rate: 1e-4
- Epochs: 25 (475 Steps)
- Batch Size: 64
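For reference, rsLoRA (rank-stabilized LoRA) changes the adapter scaling factor from `alpha / r` to `alpha / sqrt(r)`, so with rank 32 and alpha 64 the effective scale is much larger than under standard LoRA. A quick sketch of the difference:

```python
import math

def lora_scale(alpha, r, use_rslora=False):
    """Scaling factor applied to the LoRA update (B @ A)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r

# With the settings above (rank 32, alpha 64):
standard = lora_scale(64, 32)                    # 64 / 32 = 2.0
rslora = lora_scale(64, 32, use_rslora=True)     # 64 / sqrt(32) ≈ 11.31
print(standard, round(rslora, 2))
```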
This LoRA targets only the MLP layers, not the attention layers, which seemed to perform better in my experiments. This particular run is definitely overfit, though, so you should stick to low ranks and 1-3 epochs.
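As a rough sketch, the adapter configuration described above might look like the following in PEFT terms. The MLP module names (`gate_proj`, `up_proj`, `down_proj`) are an assumption based on the Llama-style backbone CSM uses; check the actual module names in your checkpoint before reusing this.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the training config described above.
lora_config = LoraConfig(
    r=32,                   # LoRA rank
    lora_alpha=64,
    use_rslora=True,        # rank-stabilized scaling: alpha / sqrt(r)
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP only, no attention
    lora_dropout=0.0,
)
```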
The training was done on an NVIDIA L40S for about 2 hours.
## Inference with Unsloth
```python
import torch
import soundfile as sf
from unsloth import FastModel
from transformers import CsmForConditionalGeneration

model, processor = FastModel.from_pretrained(
    model_name = "shb777/csm-ceylia-lora",
    max_seq_length = 2048,
    dtype = None,  # auto-detect
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)
model.eval()

speaker_id = 0
text = "Sesame is a super cool TTS model which can be fine tuned with Unsloth."
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

enc = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
enc = {k: v.to("cuda") for k, v in enc.items()}

audio_values = model.generate(
    **enc,
    max_new_tokens=125,
    output_audio=True,
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)  # CSM outputs 24 kHz audio
```
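The snippet above generates without context. To condition generation on prior audio (the "with context" samples), CSM's chat template also accepts audio content in earlier turns. A minimal sketch of building such a conversation, assuming the transformers `{"type": "audio", "path": ...}` content convention; the context file path is a hypothetical placeholder:

```python
def build_context_conversation(speaker_id, context_pairs, text):
    """Build a CSM conversation with prior (transcript, audio_path) context turns,
    followed by a final text-only turn for the model to speak."""
    conversation = []
    for transcript, audio_path in context_pairs:
        conversation.append({
            "role": str(speaker_id),
            "content": [
                {"type": "text", "text": transcript},
                {"type": "audio", "path": audio_path},
            ],
        })
    # Final turn has text only -- the model generates its audio.
    conversation.append({
        "role": str(speaker_id),
        "content": [{"type": "text", "text": text}],
    })
    return conversation

conversation = build_context_conversation(
    0,
    [("A previously spoken line.", "context.wav")],  # hypothetical context clip
    "Sesame is a super cool TTS model which can be fine tuned with Unsloth.",
)
```

The returned list can then be passed to `processor.apply_chat_template(...)` exactly as in the snippet above.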