Hume AI | Expressive TTS Arena

An interactive platform for comparing and evaluating the expressiveness of different text-to-speech engines

Overview

Expressive TTS Arena is an open-source web application for comparing text-to-speech outputs with a focus on expressiveness rather than audio quality alone. Built with Gradio, it provides one interface for generating and comparing speech synthesized by different providers, including Hume AI and ElevenLabs.

Features

Text generation using Claude AI for creating expressive content.
Direct text input or AI-assisted text generation.
Comparative analysis of different TTS engines.
Simple voting mechanism for preferred outputs.
Random voice selection from multiple providers.
Real-time speech synthesis comparison.

Prerequisites

Python >=3.11.11
pip >=25.0
Virtual environment capability
API keys for Hume AI, Anthropic, and ElevenLabs
For a complete list of dependencies, see requirements.txt.
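
To confirm that the active interpreter meets the minimum Python version listed above, a quick check along these lines may help. This is only a sketch; the name `meets_minimum` is illustrative and is not part of the project.

```python
import sys

# Minimum Python version taken from the prerequisites above.
MIN_PYTHON = (3, 11, 11)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, micro) version is at least `minimum`."""
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "older than required"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you plan to use for the virtual environment.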

Project Structure

Expressive TTS Arena/
├── src/
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── __init__.py             # Makes src a package; exposes key functionality
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
└── requirements.txt

Installation

Create and activate the virtual environment:

Mac/Linux

python -m venv gradio-env
source gradio-env/bin/activate

Windows

python -m venv gradio-env
gradio-env\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
(Optional) If you plan to contribute, install the pre-commit hook for automatic file formatting:
```
pre-commit install
```

Configure environment variables:

Create a .env file based on .env.example
Add your API keys:

HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
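
Before launching the app, it can be useful to verify that all three keys are actually set. The sketch below is not taken from the project: `missing_api_keys` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment (how the app loads them is not shown here). It also treats the `YOUR_...` placeholders from the example above as unset, which is an assumption made for illustration.

```python
import os

# Key names from the .env example above.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset, blank, or placeholders."""
    if env is None:
        env = os.environ
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # Assumption: a value starting with "YOUR_" is an unedited placeholder.
        if not value or value.startswith("YOUR_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```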

Run the application:
```
watchfiles "python -m src.app"
```
Test the application by navigating to the localhost URL in your browser (e.g. localhost:7860 or http://127.0.0.1:7860).

User Flow

Enter or Generate Text: Type directly in the Text box, or optionally enter a Prompt, click "Generate text", and edit if needed.
Synthesize Speech: Click "Synthesize speech" to generate two audio outputs.
Listen & Compare: Play back both options (A & B) to hear the differences.
Vote for Your Favorite: Click "Vote for option A" or "Vote for option B" to choose your favorite.
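
This README does not describe how the two audio outputs are mapped to options A and B or how votes are stored. As a rough illustration only, one plausible approach is to shuffle the providers into the two slots so the listener cannot infer the source from position, then credit the provider behind the chosen option. Every name below (`assign_options`, `record_vote`, `PROVIDERS`) is hypothetical and not drawn from the project's code.

```python
import random
from collections import Counter

# Providers named in the overview; the pairing logic itself is an assumption.
PROVIDERS = ("Hume AI", "ElevenLabs")


def assign_options(providers=PROVIDERS, rng=random):
    """Randomly place two distinct providers into options A and B."""
    first, second = rng.sample(list(providers), k=2)
    return {"A": first, "B": second}


def record_vote(tally, assignment, choice):
    """Credit the provider behind the chosen option and return its name."""
    if choice not in assignment:
        raise ValueError(f"Unknown option: {choice!r}")
    provider = assignment[choice]
    tally[provider] += 1
    return provider


if __name__ == "__main__":
    tally = Counter()
    assignment = assign_options()
    winner = record_vote(tally, assignment, "A")
    print(f"Option A was {winner}; tally so far: {dict(tally)}")
```

Passing a seeded `random.Random` instance as `rng` makes the assignment reproducible, which is convenient for testing.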

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.