Model Card for khazarai/Medical-TTS

Model Details

Model Description

This model is a fine-tuned version of csm-1b for medical text-to-speech. It was trained on a curated dataset of about 2,000 medical text-audio pairs covering clinical terminology, healthcare instructions, and patient–doctor communication scenarios.

  • Fine-tuned for: Medical-domain text-to-speech synthesis
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: unsloth/csm-1b

Uses

Direct Use

  • Generating synthetic speech from medical text for research, prototyping, and educational purposes
  • Assisting in medical transcription-to-speech applications
  • Supporting voice-based healthcare assistants

Bias, Risks, and Limitations

  • The model is not a substitute for professional medical advice.
  • Trained on a relatively small dataset (~2K samples), so performance may be limited outside the fine-tuned domain.
  • Bias & hallucinations: The model may mispronounce rare terms or produce inaccurate speech in critical scenarios.
  • Should not be used in real clinical decision-making without proper validation.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
import soundfile as sf
from peft import PeftModel
from transformers import AutoProcessor, CsmForConditionalGeneration

# Base checkpoint that the adapter is applied to
model_id = "unsloth/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and base model, then attach the fine-tuned PEFT adapter
processor = AutoProcessor.from_pretrained(model_id)
base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(base_model, "khazarai/Medical-TTS")

text = "Mild dorsal angulation of the distal radius reflective of the fracture."
speaker_id = 0

# CSM expects a conversation; the role is the speaker ID as a string
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]

# Tokenize and move the inputs to the same device as the model
# (the original hard-coded "cuda", which fails on CPU-only machines)
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

audio_values = model.generate(
    **inputs,
    max_new_tokens=650,
    # play with these parameters to tweak results
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)

# Convert the first generated waveform to float32 and save it at 24 kHz
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example.wav", audio, 24000)
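
The max_new_tokens value above caps how long the generated clip can be. As a rough sketch of how to choose it, the helpers below convert between a target duration and a token budget. They assume one generated token per codec frame at about 12.5 frames per second; that rate is an assumption about the base model's audio codec, not a figure stated in this card, so verify it against your own outputs. The helper names are hypothetical.

```python
import math

# Assumption (not stated in this card): roughly 12.5 audio frames,
# and therefore generated tokens, per second of speech.
FRAME_RATE_HZ = 12.5
# Sample rate passed to sf.write in the example above.
SAMPLE_RATE_HZ = 24000


def tokens_for_seconds(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Estimate the max_new_tokens budget needed for a clip of the given length."""
    return math.ceil(seconds * frame_rate_hz)


def seconds_for_samples(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in seconds of a waveform with num_samples samples."""
    return num_samples / sample_rate_hz


if __name__ == "__main__":
    # Token budget for a clip of up to 10 seconds under the assumed frame rate
    print(tokens_for_seconds(10))
    # Duration of a waveform holding 48,000 samples at 24 kHz
    print(seconds_for_samples(48000))
```

Under this assumption, max_new_tokens=650 allows up to about 52 seconds of audio. After generation, seconds_for_samples(len(audio)) gives the actual clip length; a clip that runs right up to the cap may mean the text was cut off and the budget should be raised.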

Framework versions

  • PEFT 0.15.2
