
metadata
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false

🎤 Voice Studio & Audio Translation

A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.

🌟 Features

🎤 Voice Studio

  • 26 High-Quality Voices: One female and one male neural voice for each of 13 locales
  • Multi-Language Support: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
  • Speed Control: Adjustable speech rate from 0.5x to 2.0x
  • Instant Download: Generate and download MP3 files
  • Pure Neural Voices: Only official Edge TTS neural voices, no artificial variations
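
As an illustration of the speed control, here is a minimal sketch that maps a speed multiplier onto the signed percentage string the `edge-tts` package accepts as its `rate` argument. The function names and the example voice short name are assumptions for illustration, not taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5 to 2.0) into an Edge TTS rate string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # always signed, e.g. "+0%", "-50%", "+100%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> str:
    """Request speech from Edge TTS and save the result as an MP3 file."""
    import edge_tts  # third-party: pip install edge-tts (needs network access)

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)
    return out_path


# Example call (hypothetical voice short name):
#   import asyncio
#   asyncio.run(synthesize("Xin chào!", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))
```

Keeping the conversion in a small pure function makes the 0.5x to 2.0x bounds easy to check without calling the speech service.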

🎙️ Audio Translation

  • Audio Transcription: Powered by Google Gemini 2.0 Flash
  • Language Detection: Automatic source language identification
  • Cultural Translation: Context-aware translation preserving cultural nuances
  • Voice Synthesis: Integrated with Voice Studio's 26 voices
  • Multiple Formats: Download as TXT or Word documents
  • Side-by-Side Comparison: Compare original and translated content
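
The transcription, language detection, and translation steps above could be combined into a single Gemini request. The sketch below assumes the `google-generativeai` SDK, a JSON reply format, and hypothetical helper names; the actual prompt and SDK used by `app.py` may differ.

```python
import json
import os


def build_prompt(target_language: str) -> str:
    """Build one instruction covering transcription, detection, and translation."""
    return (
        "Transcribe the attached audio, identify its language, and translate "
        f"the transcript into {target_language}, preserving cultural nuance. "
        'Reply with JSON only, using the keys "source_language", '
        '"transcript", and "translation".'
    )


def parse_reply(reply_text: str) -> dict:
    """Parse the model's JSON reply, tolerating a surrounding Markdown fence."""
    fence = "`" * 3
    cleaned = reply_text.strip()
    if cleaned.startswith(fence):
        cleaned = cleaned.strip("`").strip()
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[4:]
    return json.loads(cleaned)


def translate_audio(audio_bytes: bytes, mime_type: str, target_language: str) -> dict:
    """Send audio plus the prompt to Gemini and return the parsed reply."""
    import google.generativeai as genai  # third-party: pip install google-generativeai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content(
        [build_prompt(target_language), {"mime_type": mime_type, "data": audio_bytes}]
    )
    return parse_reply(response.text)
```

The `translation` field of the result can then be passed to a Voice Studio voice for synthesis.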

🚀 Supported Languages

Voice Studio (26 voices):

  • 🇻🇳 Vietnamese: HoaiMy (Female), NamMinh (Male)
  • 🇺🇸 American English: Aria (Female), Guy (Male)
  • 🇬🇧 British English: Sonia (Female), Ryan (Male)
  • 🇩🇪 German: Katja (Female), Conrad (Male)
  • 🇫🇷 French: Denise (Female), Henri (Male)
  • 🇪🇸 Spanish: Elvira (Female), Alvaro (Male)
  • 🇮🇹 Italian: Elsa (Female), Diego (Male)
  • 🇯🇵 Japanese: Nanami (Female), Keita (Male)
  • 🇰🇷 Korean: SunHi (Female), BongJin (Male)
  • 🇨🇳 Chinese: Xiaoxiao (Female), Yunxi (Male)
  • 🇷🇺 Russian: Svetlana (Female), Dmitry (Male)
  • 🇵🇹 Portuguese: Francisca (Female), Antonio (Male)
  • 🇸🇦 Arabic: Zariyah (Female), Hamed (Male)
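
A voice table like the one above can be kept as a small lookup structure. This excerpt covers four of the 13 locales. The locale codes and the `<locale>-<Name>Neural` naming pattern are assumptions based on common Edge TTS conventions, and `voice_id` is a hypothetical helper.

```python
# (language, gender) -> (locale code, voice name)
# Locale codes are assumed; only the voice names come from the list above.
VOICES = {
    ("Vietnamese", "Female"): ("vi-VN", "HoaiMy"),
    ("Vietnamese", "Male"): ("vi-VN", "NamMinh"),
    ("American English", "Female"): ("en-US", "Aria"),
    ("American English", "Male"): ("en-US", "Guy"),
    ("British English", "Female"): ("en-GB", "Sonia"),
    ("British English", "Male"): ("en-GB", "Ryan"),
    ("Japanese", "Female"): ("ja-JP", "Nanami"),
    ("Japanese", "Male"): ("ja-JP", "Keita"),
}


def voice_id(language: str, gender: str) -> str:
    """Return the assumed Edge TTS short name for a language and gender."""
    locale, name = VOICES[(language, gender)]
    return f"{locale}-{name}Neural"
```

For example, `voice_id("Vietnamese", "Female")` yields `vi-VN-HoaiMyNeural` under the assumed pattern.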

Audio Translation:

  • All Voice Studio languages plus additional Google TTS supported languages

🔧 Technology Stack

  • Frontend: Gradio 4.0+ (this Space pins SDK 5.43.1) with responsive mobile design
  • TTS Engine: Microsoft Edge TTS Neural Voices
  • AI Translation: Google Gemini 2.0 Flash
  • Audio Processing: Google Text-to-Speech, advanced audio libraries
  • File Handling: SoundFile, Librosa, python-docx
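
The python-docx entry relates to the TXT and Word downloads offered by Audio Translation. A minimal sketch of such an export, assuming a simple two-section layout (the function names and section titles are hypothetical):

```python
def render_txt(original: str, translated: str) -> str:
    """Lay out the transcript and its translation as plain text."""
    return f"Original\n{original}\n\nTranslation\n{translated}\n"


def save_docx(original: str, translated: str, path: str) -> str:
    """Write the same two sections to a Word document."""
    from docx import Document  # third-party: pip install python-docx

    doc = Document()
    doc.add_heading("Original", level=1)
    doc.add_paragraph(original)
    doc.add_heading("Translation", level=1)
    doc.add_paragraph(translated)
    doc.save(path)
    return path
```

Both functions take the same inputs, so the interface can offer either format from one result.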

⚙️ Setup

Prerequisites

  • Python 3.8+
  • Google Gemini API Key

Environment Variables

export GEMINI_API_KEY="your_gemini_api_key_here"
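
On the Python side, the key can be read and validated at startup so that a missing variable fails early with a clear message. A minimal sketch, assuming the app reads the variable through `os.environ` (the helper name is hypothetical):

```python
import os


def get_gemini_api_key() -> str:
    """Return the Gemini API key from the environment, or fail with a clear error."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before running app.py")
    return key
```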

Installation

pip install -r requirements.txt

Run the Application

python app.py

The application will be available at http://localhost:7860

📱 Mobile Optimized

The interface is fully responsive and optimized for mobile devices with:

  • Touch-friendly buttons
  • Vertical stacking on small screens
  • Optimized font sizes and spacing
  • Mobile-first design approach

🔒 Privacy & Security

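  • No Data Storage: The app does not retain your inputs or outputs after processing
  • Temporary Files: Audio and text files are automatically cleaned up
  • Secure API: The Gemini API key is read from an environment variable, never hard-coded
  • Remote Services: Edge TTS sends text to Microsoft's online speech service, and audio for translation is sent to Google Gemini; neither step runs fully on your machine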
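
One way to implement the temporary-file cleanup is a context manager that removes the file when the work is done. This is a sketch using only the standard library; the helper name is hypothetical and the app's actual cleanup strategy may differ.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio(suffix: str = ".mp3"):
    """Yield a temporary file path and delete the file on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; writers reopen the file themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

When a file is offered for download, it has to outlive the request that created it, so cleanup would need to run after the download has been served rather than at the end of the generating function.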

🎯 Use Cases

  • Language Learning: Practice pronunciation in multiple languages
  • Content Creation: Generate multilingual audio content
  • Accessibility: Convert text to speech for visually impaired users
  • Translation Services: Translate audio content and voice the result with one of the Voice Studio voices
  • Podcast Localization: Create multilingual versions of audio content

🛠️ Advanced Features

  • Automatic Language Detection: Intelligently detects source language
  • Cultural Context Preservation: Maintains meaning across cultural boundaries
  • High-Quality Audio: WAV format output for best quality
  • Batch Processing Ready: Designed for scalability
  • Error Handling: Comprehensive error management and user feedback

📦 Deployment

Hugging Face Spaces

This application is ready for deployment on Hugging Face Spaces:

  1. Upload all files to your Hugging Face Space
  2. Set GEMINI_API_KEY in Space secrets
  3. The app will automatically start on port 7860

Docker Support

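# Slim Python base image
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Gradio binds to 127.0.0.1 by default; listen on all interfaces inside a container
ENV GRADIO_SERVER_NAME="0.0.0.0"
EXPOSE 7860

# Pass the key at run time: docker run -e GEMINI_API_KEY="your_gemini_api_key_here" -p 7860:7860 <image>
CMD ["python", "app.py"]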

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

  • Microsoft Edge TTS for high-quality neural voices
  • Google Gemini for advanced AI capabilities
  • Librosa for advanced audio processing
  • Gradio team for the excellent UI framework

Developed by Digitized Brains 🧠