---
title: Podcasity
emoji: 🎙️
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: Generate engaging podcast conversations from documents, link
tags:
- Agents-MCP-Hackathon
- mcp-server-track
---
# 🎙️ Podcast Generator
This project is a Gradio-based web application that generates a podcast-style conversation from a document, a web link, or raw text. It uses Mistral AI to write a conversational script and then generates the corresponding audio.
## 🎬 Demo
📺 **View the demo on YouTube:**
➡️ [https://youtu.be/0UG4-itpqZU](https://youtu.be/0UG4-itpqZU)

---

## 🎵 Sample Audio
🎧 **Listen to a sample podcast:**
➡️ [demo_sample.wav](./demo_sample.wav)
## ✨ Powered by
This project is made possible by the following technologies:
- **[Gradio](https://www.gradio.app/):** Provides the simple, intuitive web interface for the application.
- **[Modal](https://modal.com/):** Hosts the core audio generation API serverlessly, allowing scalable, on-demand processing.
- **[Mistral AI](https://mistral.ai/):** Supplies the language models that generate the podcast script from the input text.
- **[Kokoro](https://huggingface.co/hexgrad/Kokoro-82M):** Performs high-quality text-to-speech synthesis.
## Architecture
This project has a client-server architecture:
1. **Gradio Frontend (`app.py`):** The main application you run. It provides a user interface to input text, a document, or a link, calls the Mistral AI API to generate a podcast script, and orchestrates the calls to the audio generation backend.
2. **Modal Backend (`modal/app.py`):** A serverless backend deployed on Modal (a minimal sketch follows this list).
   - It exposes a FastAPI endpoint that takes text and a voice preference.
   - It uses the `kokoro` library to perform the text-to-speech conversion.
   - This backend generates the audio files, which are sent back to the Gradio client.
   - It is configured to use a T4 GPU for faster inference.
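To make the backend concrete, here is a hedged sketch of what `modal/app.py` could look like, assuming a recent Modal SDK (`modal.App`, `@modal.fastapi_endpoint`) and Kokoro's `KPipeline` API. The app name, function name, request schema, and package pins are illustrative assumptions, not the deployed code:

```python
# Hypothetical sketch -- not the actual modal/app.py.
import modal

# Container image with the assumed TTS dependencies (Kokoro relies on espeak-ng).
image = (
    modal.Image.debian_slim()
    .apt_install("espeak-ng")
    .pip_install("kokoro", "soundfile", "numpy", "fastapi[standard]")
)
app = modal.App("podcast-tts", image=image)


@app.function(gpu="T4")                 # the backend uses a T4 GPU for faster inference
@modal.fastapi_endpoint(method="POST")  # exposes a FastAPI endpoint over HTTPS
def synthesize(item: dict):
    """Convert one piece of script text to WAV audio with Kokoro."""
    import io

    import numpy as np
    import soundfile as sf
    from fastapi.responses import Response
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code="a")              # American English voice set
    voice = item.get("voice", "af_heart")            # voice preference from the client
    chunks = [np.asarray(audio) for _, _, audio in pipeline(item["text"], voice=voice)]
    buf = io.BytesIO()
    sf.write(buf, np.concatenate(chunks), 24000, format="WAV")  # Kokoro outputs 24 kHz audio
    return Response(content=buf.getvalue(), media_type="audio/wav")
```

Deploying something like this with `modal deploy modal/app.py` would yield an HTTPS URL that the Gradio frontend can POST to for each script line.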
## 🚀 Features
- **Multiple Input Sources:** Provide a URL to a document (like a PDF), a link to a webpage, or just paste in raw text.
- **AI-Powered Scripting:** Uses Mistral AI to transform your input into a natural-sounding conversation between two hosts (see the sketch after this list).
- **Audio Generation:** Creates a downloadable audio file (`.wav`) of the generated podcast conversation.
- **Simple Web Interface:** An easy-to-use interface built with Gradio.
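To illustrate the scripting step, here is a minimal sketch using the official `mistralai` Python client. The model name, prompt, and function name are assumptions, not the exact logic in `app.py`:

```python
# Illustrative sketch of the script-generation step, assuming the `mistralai` client.
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def generate_script(source_text: str) -> str:
    """Ask Mistral to rewrite the input as a two-host podcast dialogue."""
    response = client.chat.complete(
        model="mistral-large-latest",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "Rewrite the user's text as a lively podcast conversation "
                        "between HOST A and HOST B. Stay faithful to the source."},
            {"role": "user", "content": source_text},
        ],
    )
    return response.choices[0].message.content
```

The returned script is then split into lines and sent, one host turn at a time, to the audio backend described above.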
## 🏃‍♂️ How to Run
1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/podcastify
   cd podcastify
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Set up your API Key:**
   This project requires an API key from Mistral AI. You need to set it as an environment variable.
   ```bash
   export MISTRAL_API_KEY='your-mistral-api-key'
   ```
   On Windows, you can use:
   ```powershell
   $env:MISTRAL_API_KEY='your-mistral-api-key'
   ```
4. **Run the application:**
   ```bash
   python app.py
   ```
   This will start a local web server, and you can access the application in your browser at the URL provided in the terminal (usually `http://127.0.0.1:7860`).
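Once the app is running, it can also be called programmatically. A small sketch with `gradio_client`, assuming the default local URL; the actual endpoint names depend on `app.py`, so use `view_api()` to discover them rather than relying on any name shown here:

```python
from gradio_client import Client

client = Client("http://127.0.0.1:7860")  # URL printed when app.py starts
client.view_api()                          # lists the callable endpoints and their parameters
```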
## 📁 Project Structure
- `app.py`: The main Gradio application. It handles the user interface, text processing with Mistral AI, and the calls to the audio generation API (see the sketch below).
- `modal/app.py`: The serverless backend function deployed on Modal, responsible for the core text-to-speech generation using `kokoro`.
- `requirements.txt`: Lists the Python dependencies for the project.
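For orientation, here is a hedged sketch of how `app.py` might call the Modal backend for each script line. The endpoint URL, JSON fields, and helper name are placeholders, not the project's actual interface:

```python
# Hypothetical client-side call to the Modal TTS endpoint.
import requests

# Placeholder URL; Modal prints the real endpoint URL on `modal deploy`.
MODAL_TTS_URL = "https://your-workspace--podcast-tts-synthesize.modal.run"

def fetch_audio(text: str, voice: str = "af_heart") -> bytes:
    """POST one script line to the TTS backend and return the WAV bytes."""
    resp = requests.post(MODAL_TTS_URL, json={"text": text, "voice": voice}, timeout=120)
    resp.raise_for_status()
    return resp.content

# Example: save a single generated clip to disk.
# with open("line_01.wav", "wb") as f:
#     f.write(fetch_audio("Welcome to the show!"))
```

In the real app, the clips for each host turn are concatenated into the final downloadable `.wav` file served by the Gradio interface.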