<div align="center">
<img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
<h1>Hume AI | Expressive TTS Arena</h1>
<p>
<strong>An interactive platform for comparing and evaluating the expressiveness of different text-to-speech models</strong>
</p>
</div>
## Overview
Expressive TTS Arena is an open-source web application for comparing text-to-speech outputs with a focus on expressiveness rather than audio quality alone. Built with [Gradio](https://www.gradio.app/), it provides an interface for generating and comparing synthesized speech from different providers, including Hume AI and ElevenLabs.

## Prerequisites
- Python >=3.11.11
- pip >=25.0
- uv >=0.5.29
- API keys for **Hume AI**, **Anthropic**, and **ElevenLabs**
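
One way to confirm that an interpreter meets the minimum Python version listed above is a plain tuple comparison. The sketch below is illustrative only and is not part of the repository; `meets_minimum` and `MIN_PYTHON` are hypothetical names introduced here.

```python
import sys

# Minimum Python version, taken from the prerequisites list above.
MIN_PYTHON = (3, 11, 11)


def meets_minimum(version, minimum=MIN_PYTHON):
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    # Tuples compare element by element, so (3, 12, 0) >= (3, 11, 11) is True.
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info[:3]
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {'.'.join(map(str, current))}: {status}")
```

The pip and uv versions are not checked here; `pip --version` and `uv --version` report them on the command line.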

## Project Structure
```
Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css          # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py         # Makes database a package; exposes ORM methods
│   │   ├── crud.py             # Defines operations for interacting with database
│   │   ├── database.py         # Sets up SQLAlchemy database connection
│   │   └── models.py           # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py         # Makes scripts a package
│   │   ├── init_db.py          # Script for initializing database
│   │   └── test_db.py          # Script for testing database connection
│   ├── __init__.py             # Makes src a package
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── custom_types.py         # Global custom types
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── static/
│   └── audio/                  # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```

## Installation
1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/) for your platform.
2. Configure environment variables:
   - Create a `.env` file based on `.env.example`
   - Add your API keys:
   ```txt
   HUME_API_KEY=YOUR_HUME_API_KEY
   ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
   ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
   ```
3. Run the application:

   Standard
   ```sh
   uv run python -m src.app
   ```
   With hot-reloading
   ```sh
   uv run watchfiles "python -m src.app" src
   ```
4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).
5. (Optional) If contributing, install the pre-commit hook for automatic file formatting:
   ```sh
   uv run pre-commit install
   ```
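
Before launching the app, it can help to confirm that the three API keys from step 2 are actually set. The following is a minimal sketch, not code from the repository: `missing_keys` is a hypothetical helper, and it assumes the variables are already present in the process environment (how the app itself loads `.env` is not described here).

```python
import os
from typing import Mapping, Optional

# Variable names taken from the `.env` example in step 2.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required keys that are unset, blank, or still placeholders.

    Hypothetical helper: pass a mapping for testing, or omit it to
    inspect the current process environment.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        # `YOUR_<NAME>` is the placeholder value used in the example file.
        if not value or value == f"YOUR_{name}":
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys: " + ", ".join(absent))
    else:
        print("All API keys are set.")
```

Checking for the placeholder value as well as for empty strings catches the case where `.env.example` was copied but never filled in.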

## User Flow
1. **Choose or enter a character description**: Select a sample from the list or enter your own to guide text and voice generation.
2. **Generate text**: Click **"Generate Text"** to create dialogue based on the character. The generated text appears in the input field automatically; edit it if needed.
3. **Synthesize speech**: Click **"Synthesize Speech"** to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
4. **Listen & compare**: Play both audio options and assess their expressiveness.
5. **Vote for the best**: Click **"Select Option A"** or **"Select Option B"** to choose the more expressive output.
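
To illustrate how votes from step 5 could be summarized, here is a rough sketch that turns a list of A/B selections into per-provider win rates. It is an assumption-laden example, not the app's actual logic: the `Vote` record and `win_rates` function are hypothetical, and the real storage schema under `src/database/` is not shown in this README.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Vote(NamedTuple):
    """One comparison result (hypothetical record shape)."""
    option_a: str  # provider behind "Option A"
    option_b: str  # provider behind "Option B"
    selected: str  # "A" or "B", matching the button the user clicked


def win_rates(votes: Iterable[Vote]) -> dict:
    """Return each provider's share of wins over the comparisons it appeared in."""
    wins = Counter()
    appearances = Counter()
    for vote in votes:
        if vote.selected not in ("A", "B"):
            raise ValueError(f"selected must be 'A' or 'B', got {vote.selected!r}")
        winner = vote.option_a if vote.selected == "A" else vote.option_b
        wins[winner] += 1
        # Assumes the two options come from different providers.
        appearances[vote.option_a] += 1
        appearances[vote.option_b] += 1
    return {provider: wins[provider] / count for provider, count in appearances.items()}


if __name__ == "__main__":
    sample = [
        Vote("Hume AI", "ElevenLabs", "A"),
        Vote("ElevenLabs", "Hume AI", "A"),
    ]
    print(win_rates(sample))
```

Recording which provider sat behind each option, rather than only the letter chosen, is what makes the tally meaningful if providers are not always assigned to the same slot.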

## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.