---
license: apache-2.0
language: de
library_name: transformers
tags:
- text-to-speech
- tts
- german
- chatterbox
- voice-cloning
- zero-shot
- merged-model
---

# Kartoffelbox-v0.1_0.65h2: A Merged German Chatterbox-TTS Model

## Model Description
This repository contains an experimental, **standalone** German Text-to-Speech model based on the [Chatterbox](https://github.com/resemble-ai/chatterbox) framework.

This model is a **hybrid** created by merging two fine-tuned models:

1. The well-known German TTS "patch" [SebastianBodza/Kartoffelbox-v0.1](https://huggingface.co/SebastianBodza/Kartoffelbox-v0.1).
2. A custom model extensively fine-tuned on a large, diverse dataset of German voices (~12,000 samples).

The goal was to create a robust, general-purpose German TTS model by combining the natural prosody of `Kartoffelbox` with a model trained on a wide variety of voices and data types. The final weights are a **65/35 merge**, favoring the custom-trained, multi-speaker model (65% custom model, 35% `Kartoffelbox`). Unlike patch-based models, this is a complete, self-contained model that can be loaded directly.

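This card does not document how the merge was performed. One common reading of a 65/35 merge is a per-parameter linear interpolation of two checkpoints that share the same parameter names and shapes. The sketch below assumes that reading; `merge_weights`, the parameter names, and the toy values are hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# A "checkpoint" here is just parameter name -> flat list of floats (toy stand-in for tensors).
Weights = Dict[str, List[float]]


def merge_weights(custom: Weights, kartoffelbox: Weights, alpha: float = 0.65) -> Weights:
    """Return alpha * custom + (1 - alpha) * kartoffelbox for every parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if custom.keys() != kartoffelbox.keys():
        raise ValueError("checkpoints must have identical parameter names")

    merged: Weights = {}
    for name, a in custom.items():
        b = kartoffelbox[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two checkpoints.
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged


# Hypothetical toy checkpoints; real models would hold many large tensors.
custom_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
kartoffelbox_model = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

# alpha=0.65 weights the custom multi-speaker model at 65% and Kartoffelbox at 35%.
merged_model = merge_weights(custom_model, kartoffelbox_model, alpha=0.65)
```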
**Key Features:**

- **Language:** German
- **Type:** Standalone, Multi-Speaker, Merged Hybrid Model
- **Capabilities:** High-quality speech synthesis and Zero-Shot Voice Cloning for a **variety of German voices**.
- **Robustness:** Trained specifically to handle numbers, dates, and other complex data formats, although this only works some of the time.

## How to Use the Model

This is a complete model and does not require manual patching. You will need the `chatterbox` library from Resemble AI to run it.

**1. Installation**

```bash
# Clone the official Chatterbox repository and install its dependencies
git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .