
eswardivi

Update README.md

5098582 verified 3 months ago

	---
	title: Podcastify
	emoji: 🌍
	colorFrom: pink
	colorTo: purple
	sdk: gradio
	sdk_version: 5.33.1
	app_file: app.py
	pinned: false
	license: mit
	short_description: Generate engaging podcast conversations from documents, link
	tags:
	- Agents-MCP-Hackathon
	- mcp-server-track
	---



	# 🎙️ Podcast Generator

	This project is a Gradio-based web application that generates a podcast-style conversation from a document, a web link, or raw text. It uses Mistral AI to write a conversational script, then synthesizes the corresponding audio.



	## 🎬 Demo

	📺 View Demo on YouTube:
	➡️ [https://youtu.be/0UG4-itpqZU](https://youtu.be/0UG4-itpqZU)

	---

	## 🔊 Sample Audio

	🎧 Listen to a sample podcast audio:
	➡️ [demo_sample.wav](./demo_sample.wav)

	## ✨ Powered by

	This project builds on the following technologies:

	- [Gradio](https://www.gradio.app/): For creating the simple and intuitive web interface for the application.
	- [Modal](https://modal.com/): For serverless hosting of the core audio generation API, allowing for scalable and on-demand processing.
	- [Mistral AI](https://mistral.ai/): For the language models that generate the podcast script from the input text.
	- [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M): For high-quality text-to-speech synthesis.

	## Architecture

	This project has a client-server architecture:

	1. Gradio Frontend (`app.py`): The main application you run. It provides a user interface to input text, a document, or a link. It then calls the Mistral AI API to generate a podcast script and orchestrates the calls to the audio generation backend.

	2. Modal Backend (`modal/app.py`): A serverless backend deployed on Modal.
	   - It exposes a FastAPI endpoint that takes text and a voice preference.
	   - It uses the `kokoro` library to perform the text-to-speech conversion.
	   - It generates the audio files, which are then sent back to the Gradio client.
	   - It is configured to use a T4 GPU for faster inference.
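
The request flow between the two components can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the endpoint URL, the JSON field names (`text`, `voice`), the voice identifiers, and every function name are hypothetical. The README only states that the endpoint takes text and a voice preference and returns audio.

```python
import json
import urllib.request
from typing import Callable, List, Tuple

# Hypothetical URL; the real Modal endpoint is not given in this README.
TTS_ENDPOINT = "https://example.invalid/tts"

# Hypothetical mapping from host name to voice identifier.
VOICES = {"Host 1": "voice_a", "Host 2": "voice_b"}


def build_tts_request(text: str, voice: str, url: str = TTS_ENDPOINT) -> urllib.request.Request:
    """Build a JSON POST request carrying the text and the voice preference."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def http_send(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body (assumed to be audio bytes)."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()


def synthesize_script(
    turns: List[Tuple[str, str]],
    send: Callable[[urllib.request.Request], bytes] = http_send,
) -> List[bytes]:
    """Call the backend once per (speaker, line) turn, preserving script order."""
    return [send(build_tts_request(line, VOICES[speaker])) for speaker, line in turns]
```

Passing the transport in as `send` keeps the orchestration logic testable without a deployed backend; a stub can stand in for `http_send` during local development.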

	## 🚀 Features

	- Multiple Input Sources: Provide a URL to a document (like a PDF), a link to a webpage, or just paste in raw text.
	- AI-Powered Scripting: Uses Mistral AI to transform your input text into a natural-sounding conversation between two hosts.
	- Audio Generation: Creates a downloadable audio file (`.wav`) of the generated podcast conversation.
	- Simple Web Interface: An easy-to-use interface built with Gradio.
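
The README does not say how the audio for individual script lines is assembled into the final `.wav` file. One plausible approach, sketched here with Python's standard `wave` module, is to join same-format WAV segments in order. The function name `concat_wav` is hypothetical, and the assumption that each segment arrives as a complete WAV byte string is ours, not the project's.

```python
import io
import wave
from typing import List


def concat_wav(segments: List[bytes]) -> bytes:
    """Join WAV byte strings that share one audio format into a single WAV file."""
    if not segments:
        raise ValueError("no audio segments to join")
    out_buf = io.BytesIO()
    with wave.open(out_buf, "wb") as out:
        expected = None
        for segment in segments:
            with wave.open(io.BytesIO(segment), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    # The first segment defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("segments must share channels, sample width, and rate")
                out.writeframes(src.readframes(src.getnframes()))
    return out_buf.getvalue()
```

The returned bytes can be written to disk as-is, for example with `pathlib.Path("podcast.wav").write_bytes(...)`.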

	## 🏃‍♀️ How to Run

	1. Clone the repository:
	```bash
	git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/podcastify
	cd podcastify
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Set up your API key:
	This project requires an API key from Mistral AI, set as the `MISTRAL_API_KEY` environment variable.
	```bash
	export MISTRAL_API_KEY='your-mistral-api-key'
	```
	On Windows (PowerShell), use:
	```powershell
	$env:MISTRAL_API_KEY='your-mistral-api-key'
	```

	4. Run the application:
	```bash
	python app.py
	```
	This will start a local web server, and you can access the application in your browser at the URL provided in the terminal (usually `http://127.0.0.1:7860`).
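
If the key from step 3 is missing, a failure deep inside an API call can be hard to diagnose. A small check along the following lines reads `MISTRAL_API_KEY` at startup and fails early with a readable message. The helper name `get_mistral_api_key` is hypothetical; `app.py` may read the variable differently.

```python
import os
from typing import Mapping


def get_mistral_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Mistral API key, raising a clear error if it is unset or blank."""
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set. Export it before running app.py."
        )
    return key
```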

	## 📁 Project Structure

	- `app.py`: The main file containing the Gradio application. It handles the user interface, text processing with Mistral AI, and calls the audio generation API.
	- `modal/app.py`: The serverless backend function deployed on Modal, responsible for the core text-to-speech generation using `kokoro`.
	- `requirements.txt`: Lists all the Python dependencies for the project.