Hume AI | Expressive TTS Arena

An interactive platform for comparing and evaluating the expressiveness of different text-to-speech engines

Overview

Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with Gradio, it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.

Features

  • Text generation using Claude AI for creating expressive content.
  • Direct text input or AI-assisted text generation.
  • Comparative analysis of different TTS engines.
  • Simple voting mechanism for preferred outputs.
  • Random voice selection from multiple providers.
  • Real-time speech synthesis comparison.

Prerequisites

  • Python >=3.11.11
  • pip >=25.0
  • Virtual environment capability
  • API keys for Hume AI, Anthropic, and ElevenLabs
  • For a complete list of dependencies, see requirements.txt.
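
The Python version requirement above can be checked before installing anything. The following is a minimal sketch, not part of the project; the helper name `meets_minimum` is hypothetical, and the minimum version is copied from the list above.

```python
import sys

# Minimum interpreter version taken from the Prerequisites list.
MINIMUM_PYTHON = (3, 11, 11)


def meets_minimum(version, minimum=MINIMUM_PYTHON):
    """Return True if a (major, minor, micro) version is at least `minimum`."""
    return tuple(version[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print(f"Python {current} satisfies the requirement.")
    else:
        print(f"Python {current} is older than the required 3.11.11.")
```

The pip version can be checked separately with `pip --version`.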

Project Structure

Expressive TTS Arena/
├── src/
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── __init__.py             # Makes src a package; exposes key functionality
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
└── requirements.txt

Installation

  1. Create and activate the virtual environment:

    Mac/Linux

    python -m venv gradio-env
    source gradio-env/bin/activate
    

    Windows

    python -m venv gradio-env
    gradio-env\Scripts\activate
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. (Optional) If contributing, install the pre-commit hook for automatic file formatting:

    pre-commit install
    
  4. Configure environment variables:

    • Create a .env file based on .env.example
    • Add your API keys:
    HUME_API_KEY=YOUR_HUME_API_KEY
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
    ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
    
  5. Run the application:

    watchfiles "python -m src.app"
    
  6. Test the application by navigating to the localhost URL in your browser (e.g. localhost:7860 or http://127.0.0.1:7860).
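
If the application fails to start, a missing API key is a likely cause. The sketch below checks that the three variables from step 4 are set. It is an illustration only, not the project's own code: `missing_keys` is a hypothetical helper, and it inspects the process environment rather than reading the .env file, so it assumes the variables have already been exported or loaded.

```python
import os

# Variable names taken from step 4 of the installation instructions.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env):
    """Return the required key names that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```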

User Flow

  1. Enter or Generate Text: Type directly in the Text box, or optionally enter a Prompt, click "Generate text", and edit if needed.
  2. Synthesize Speech: Click "Synthesize speech" to generate two audio outputs.
  3. Listen & Compare: Play back both options (A & B) to hear the differences.
  4. Vote for Your Favorite: Click "Vote for option A" or "Vote for option B" to choose your favorite.
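
The blind A/B comparison in the flow above can be pictured with a short sketch. This is a hypothetical illustration under stated assumptions, not the application's actual logic: the function names `assign_options` and `record_vote`, the in-memory tally, and the choice of exactly one output per provider are all assumptions made for the example.

```python
import random


def assign_options(providers, rng=random):
    """Randomly map two distinct providers to options A and B."""
    first, second = rng.sample(list(providers), 2)
    return {"A": first, "B": second}


def record_vote(assignment, choice, tally):
    """Credit the provider behind the chosen option and return its name."""
    winner = assignment[choice]
    tally[winner] = tally.get(winner, 0) + 1
    return winner


if __name__ == "__main__":
    tally = {}
    assignment = assign_options(["Hume AI", "ElevenLabs"])
    winner = record_vote(assignment, "A", tally)
    print(f"Option A was {winner}; tally: {tally}")
```

Randomizing which provider sits behind each option keeps the listener from favoring a provider by position.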

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.