---
title: Podcasity
emoji: 🌍
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.33.1
app_file: app.py
pinned: false
license: mit
short_description: Generate engaging podcast conversations from documents, link
tags:
  - Agents-MCP-Hackathon
  - mcp-server-track
---



# πŸŽ™οΈ Podcast Generator

This project is a Gradio-based web application that generates a podcast-style conversation from a document, a web link, or raw text. It uses Mistral AI to write a conversational script, then synthesizes the corresponding audio with Kokoro running on a Modal backend.



## 🎬 Demo

📺 **View Demo on YouTube:**  
➡️ [https://youtu.be/0UG4-itpqZU](https://youtu.be/0UG4-itpqZU)

---

## 🔊 Sample Audio

🎧 **Listen to a sample podcast audio:**  
➡️ [demo_sample.wav](./demo_sample.wav)

## ✨ Powered by

This project is made possible by the following amazing technologies:

- **[Gradio](https://www.gradio.app/):** For creating the simple and intuitive web interface for the application.
- **[Modal](https://modal.com/):** For serverless hosting of the core audio generation API, allowing for scalable and on-demand processing.
- **[Mistral AI](https://mistral.ai/):** For the language models that generate the podcast script from the input text.
- **[Kokoro](https://huggingface.co/hexgrad/Kokoro-82M):** For high-quality text-to-speech synthesis.

## Architecture

This project has a client-server architecture:

1.  **Gradio Frontend (`app.py`):** The main application you run. It provides a user interface to input text, a document, or a link. It then calls the Mistral AI API to generate a podcast script and orchestrates the calls to the audio generation backend.

2.  **Modal Backend (`modal/app.py`):** A serverless backend deployed on Modal.
    -   It exposes a FastAPI endpoint that takes text and a voice preference.
    -   It uses the `kokoro` library to perform the text-to-speech conversion.
    -   It generates the audio files and returns them to the Gradio client.
    -   It is configured to use a T4 GPU for faster inference.
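
The request flow between the two parts can be sketched as follows. This is a minimal illustration rather than the project's actual code: the endpoint URL, the JSON field names (`text`, `voice`), and the helper names are assumptions, so check `modal/app.py` for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint URL; the real one is shown when the Modal app is deployed.
TTS_ENDPOINT = "https://example--podcast-tts.modal.run"


def build_tts_request(
    text: str, voice: str, endpoint: str = TTS_ENDPOINT
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying text and a voice preference."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumed payload shape: {"text": ..., "voice": ...}
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw response body (assumed to be audio bytes)."""
    request = build_tts_request(text, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Separating request construction from sending keeps the payload easy to inspect and test without a deployed backend.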

## 🚀 Features

- **Multiple Input Sources:** Provide a URL to a document (like a PDF), a link to a webpage, or just paste in raw text.
- **AI-Powered Scripting:** Uses Mistral AI to transform your input text into a natural-sounding conversation between two hosts.
- **Audio Generation:** Creates a downloadable audio file (`.wav`) of the generated podcast conversation.
- **Simple Web Interface:** An easy-to-use interface built with Gradio.
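
Between scripting and audio generation, a two-host script has to be split into speaker turns so that each turn can be voiced separately. The sketch below assumes a hypothetical `Speaker: line` script format and a hypothetical `split_turns` helper; the format the app actually requests from Mistral AI may differ.

```python
import re
from typing import List, Tuple

# Assumed format: "<speaker name>: <spoken text>" on each line.
TURN_PATTERN = re.compile(r"^\s*([^:\n]{1,40}):\s*(.+?)\s*$")


def split_turns(script: str) -> List[Tuple[str, str]]:
    """Split a script into (speaker, text) turns, in order of appearance."""
    turns: List[Tuple[str, str]] = []
    for line in script.splitlines():
        match = TURN_PATTERN.match(line)
        if match:
            turns.append((match.group(1).strip(), match.group(2)))
        elif turns and line.strip():
            # A non-empty line without a speaker label continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line.strip()}")
    return turns
```

Each resulting turn could then be sent to the text-to-speech backend with a voice chosen per speaker.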

## πŸƒβ€β™€οΈ How to Run

1.  **Clone the repository:**
    ```bash
    git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/podcastify
    cd podcastify
    ```

2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

3.  **Set up your API Key:**
    This project requires an API key from Mistral AI. You need to set it as an environment variable.
    ```bash
    export MISTRAL_API_KEY='your-mistral-api-key'
    ```
    On Windows (PowerShell), use:
    ```powershell
    $env:MISTRAL_API_KEY='your-mistral-api-key'
    ```

4.  **Run the application:**
    ```bash
    python app.py
    ```
    This will start a local web server, and you can access the application in your browser at the URL provided in the terminal (usually `http://127.0.0.1:7860`).
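
If the key from step 3 is missing, the failure can be hard to diagnose once the app is already running. A minimal sketch of an early check is shown below; the `get_mistral_api_key` helper is hypothetical and not part of the project, and only the `MISTRAL_API_KEY` variable name comes from the steps above.

```python
import os
from typing import Mapping, Optional


def get_mistral_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the Mistral API key from the environment, or fail with a clear message."""
    env = os.environ if environ is None else environ
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it before running app.py"
        )
    return key
```

Calling such a helper once at startup surfaces a missing key immediately instead of on the first generation request.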

## πŸ“ Project Structure

-   `app.py`: The main file containing the Gradio application. It handles the user interface, text processing with Mistral AI, and calls the audio generation API.
-   `modal/app.py`: The serverless backend function deployed on Modal, responsible for the core text-to-speech generation using `kokoro`.
-   `requirements.txt`: Lists all the Python dependencies for the project.