<div align="center">
<img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
<h1>Hume AI | Expressive TTS Arena</h1>
<p>
<strong> An interactive platform for comparing and evaluating the expressiveness of different text-to-speech models </strong>
</p>
</div>
## Overview
Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with [Gradio](https://www.gradio.app/), it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.
## Prerequisites
- Python >=3.11.11
- pip >=25.0
- uv >=0.5.29
- API keys for **Hume AI**, **Anthropic**, and **ElevenLabs**
## Project Structure
```
Expressive TTS Arena/
├── src/
│   ├── assets/
│   │   └── styles.css          # Defines custom CSS
│   ├── database/
│   │   ├── __init__.py         # Makes database a package; exposes ORM methods
│   │   ├── crud.py             # Defines operations for interacting with database
│   │   ├── database.py         # Sets up SQLAlchemy database connection
│   │   └── models.py           # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py         # Makes scripts a package
│   │   ├── init_db.py          # Script for initializing database
│   │   └── test_db.py          # Script for testing database connection
│   ├── __init__.py             # Makes src a package
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── custom_types.py         # Global custom types
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── static/
│   └── audio/                  # Directory for storing generated audio files
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```
## Installation
1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the installation instructions for your platform [here](https://docs.astral.sh/uv/getting-started/installation/).
2. Configure environment variables:
- Create a `.env` file based on `.env.example`
- Add your API keys:
```txt
HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
```
3. Run the application:
Standard
```sh
uv run python -m src.app
```
With hot-reloading
```sh
uv run watchfiles "python -m src.app" src
```
4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).
5. (Optional) If contributing, install the pre-commit hook for automatic file formatting:
```sh
uv run pre-commit install
```
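
Before launching the app, it can help to confirm that the three API keys from step 2 are actually set. The following is a minimal sketch, not part of this repository: the helper names `parse_env_text` and `missing_keys` are hypothetical, and it assumes a simple `KEY=VALUE` format in `.env` with the `YOUR_...` placeholders from the example above. It uses only the Python standard library.

```python
import os
from typing import Mapping

# Key names taken from the .env example in step 2.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not env.get(key) or env[key].startswith("YOUR_")
    ]


if __name__ == "__main__":
    # Assumption: values in .env take precedence over the shell environment.
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            env.update(parse_env_text(fh.read()))
    missing = missing_keys(env)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Saved as a standalone script in the project root, this reports any key that is unset or still holds its placeholder value.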
## User Flow
1. **Choose or enter a character description**: Select a sample from the list or enter your own to guide text and voice generation.
2. **Generate text**: Click **"Generate Text"** to create dialogue based on the character. The generated text will appear in the input field automatically; edit it if needed.
3. **Synthesize speech**: Click **"Synthesize Speech"** to send your text and character description to two TTS APIs. Each API generates a voice and synthesizes speech in that voice.
4. **Listen & compare**: Play both audio options and assess their expressiveness.
5. **Vote for the best**: Click **"Select Option A"** or **"Select Option B"** to choose the more expressive output.
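
To make the voting step concrete, here is a rough sketch of how A/B votes could be turned into per-provider win rates. This README does not describe how votes are stored or scored, so everything here is an assumption for illustration: the `Comparison` record, its fields, and the `win_rates` function are hypothetical and do not reflect the code in `src/database/`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Comparison:
    """One A/B round (hypothetical record, not the project's schema)."""

    option_a: str  # provider behind "Option A"
    option_b: str  # provider behind "Option B"
    selected: str  # the user's vote: "A" or "B"


def winner(comparison: Comparison) -> str:
    """Map the selected option letter back to its provider name."""
    if comparison.selected == "A":
        return comparison.option_a
    if comparison.selected == "B":
        return comparison.option_b
    raise ValueError(f"selected must be 'A' or 'B', got {comparison.selected!r}")


def win_rates(comparisons: Iterable[Comparison]) -> dict[str, float]:
    """Compute wins divided by appearances for each provider."""
    wins: Counter[str] = Counter()
    appearances: Counter[str] = Counter()
    for comparison in comparisons:
        appearances[comparison.option_a] += 1
        appearances[comparison.option_b] += 1
        wins[winner(comparison)] += 1
    return {name: wins[name] / appearances[name] for name in appearances}


if __name__ == "__main__":
    rounds = [
        Comparison("Hume AI", "ElevenLabs", "A"),
        Comparison("ElevenLabs", "Hume AI", "B"),
        Comparison("Hume AI", "ElevenLabs", "B"),
    ]
    for name, rate in win_rates(rounds).items():
        print(f"{name}: {rate:.2f}")
```

Keeping the provider names separate from the option letters means the same tally works regardless of which provider is shown as Option A in a given round.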
## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.