metadata

title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false

🎤 Voice Studio & Audio Translation

A Gradio application that combines text-to-speech synthesis (Microsoft Edge TTS neural voices) with audio transcription and translation (Google Gemini 2.0 Flash) in a single interface.

🌟 Features

🎤 Voice Studio

26 High-Quality Voices: Standard neural voices in 13 locales, one female and one male voice each
Multi-Language Support: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
Speed Control: Adjustable speech rate from 0.5x to 2.0x
Instant Download: Generate and download MP3 files
Pure Neural Voices: Only official Edge TTS neural voices, no artificial variations
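
As a rough illustration of the speed control and MP3 generation listed above, here is a minimal sketch built on the `edge-tts` Python package. The helper names `speed_to_rate` and `synthesize`, the default voice, and the output path are assumptions made for this example, not code taken from `app.py`. The sketch assumes the 0.5x to 2.0x multiplier maps linearly onto the signed percentage string that `edge_tts.Communicate` accepts as `rate`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) to an edge-tts rate string such as '+25%'."""
    clamped = min(max(speed, 0.5), 2.0)  # keep within the range offered in the UI
    return f"{round((clamped - 1.0) * 100):+d}%"


async def synthesize(text: str, voice: str = "vi-VN-HoaiMyNeural",
                     speed: float = 1.0, out_path: str = "speech.mp3") -> str:
    """Synthesize `text` to an MP3 file and return its path (hypothetical helper)."""
    import edge_tts  # third-party: pip install edge-tts (imported lazily)

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)  # edge-tts writes MP3 audio
    return out_path
```

To try it, run `asyncio.run(synthesize("Xin chào", speed=1.25))` after `import asyncio`; the call needs network access because synthesis happens on Microsoft's service.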

🎙️ Audio Translation

Audio Transcription: Powered by Google Gemini 2.0 Flash
Language Detection: Automatic source language identification
Cultural Translation: Context-aware translation preserving cultural nuances
Voice Synthesis: Integrated with Voice Studio's 26 voices
Multiple Formats: Download as TXT or Word documents
Side-by-Side Comparison: Compare original and translated content
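
The transcription, language detection, and translation steps above could be combined into a single Gemini request, sketched below. This is an assumption about how such a pipeline might look, not the app's actual code: the prompt wording, the JSON reply shape, and the helpers `build_prompt`, `parse_reply`, and `translate_audio` are all hypothetical. It uses the `google-generativeai` package and the model name `gemini-2.0-flash`.

```python
import json
import os

FENCE = "`" * 3  # markdown code fence the model may wrap its JSON reply in


def build_prompt(target_language: str) -> str:
    """Build the instruction sent alongside the audio (wording is an assumption)."""
    return (
        "Transcribe the attached audio, identify its language, and translate the "
        f"transcript into {target_language}, preserving cultural nuance. "
        'Reply with JSON only, using the keys "source_language", "transcript" and "translation".'
    )


def parse_reply(reply_text: str) -> dict:
    """Extract the three expected fields from the model's reply."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and the closing fence.
        text = text.split("\n", 1)[-1].rsplit(FENCE, 1)[0]
    data = json.loads(text)
    return {key: data.get(key, "") for key in ("source_language", "transcript", "translation")}


def translate_audio(audio_path: str, target_language: str, mime_type: str = "audio/mpeg") -> dict:
    """Send the audio file to Gemini and return the parsed result (needs network access)."""
    import google.generativeai as genai  # third-party: pip install google-generativeai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    with open(audio_path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    response = model.generate_content(
        [build_prompt(target_language), {"mime_type": mime_type, "data": audio_bytes}]
    )
    return parse_reply(response.text)
```

The `translation` field can then be passed to a Voice Studio voice for synthesis, and `transcript` and `translation` shown side by side.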

🚀 Supported Languages

Voice Studio (26 voices):

🇻🇳 Vietnamese: HoaiMy (Female), NamMinh (Male)
🇺🇸 American English: Aria (Female), Guy (Male)
🇬🇧 British English: Sonia (Female), Ryan (Male)
🇩🇪 German: Katja (Female), Conrad (Male)
🇫🇷 French: Denise (Female), Henri (Male)
🇪🇸 Spanish: Elvira (Female), Alvaro (Male)
🇮🇹 Italian: Elsa (Female), Diego (Male)
🇯🇵 Japanese: Nanami (Female), Keita (Male)
🇰🇷 Korean: SunHi (Female), BongJin (Male)
🇨🇳 Chinese: Xiaoxiao (Female), Yunxi (Male)
🇷🇺 Russian: Svetlana (Female), Dmitry (Male)
🇵🇹 Portuguese: Francisca (Female), Antonio (Male)
🇸🇦 Arabic: Zariyah (Female), Hamed (Male)
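
For reference, the list above can be expressed as a lookup table that produces Edge TTS voice identifiers. The locale codes (for example `es-ES`, `pt-BR`, `ar-SA`) and the `<locale>-<Name>Neural` naming pattern are assumptions inferred from the voice names, and `VOICES`, `voice_id`, and `all_voice_ids` are hypothetical names. Check the identifiers against `edge_tts.list_voices()` before relying on them.

```python
# Locale -> (female voice, male voice); locale codes are assumptions, not taken from app.py.
VOICES = {
    "vi-VN": ("HoaiMy", "NamMinh"),
    "en-US": ("Aria", "Guy"),
    "en-GB": ("Sonia", "Ryan"),
    "de-DE": ("Katja", "Conrad"),
    "fr-FR": ("Denise", "Henri"),
    "es-ES": ("Elvira", "Alvaro"),
    "it-IT": ("Elsa", "Diego"),
    "ja-JP": ("Nanami", "Keita"),
    "ko-KR": ("SunHi", "BongJin"),
    "zh-CN": ("Xiaoxiao", "Yunxi"),
    "ru-RU": ("Svetlana", "Dmitry"),
    "pt-BR": ("Francisca", "Antonio"),
    "ar-SA": ("Zariyah", "Hamed"),
}


def voice_id(locale: str, gender: str) -> str:
    """Return an identifier such as 'vi-VN-HoaiMyNeural' for the given locale and gender."""
    female, male = VOICES[locale]
    name = female if gender == "female" else male
    return f"{locale}-{name}Neural"


def all_voice_ids() -> list:
    """List every identifier in the table, female voice first for each locale."""
    return [voice_id(locale, gender) for locale in VOICES for gender in ("female", "male")]
```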

Audio Translation:

All Voice Studio languages, plus additional languages supported by Google Text-to-Speech

🔧 Technology Stack

Frontend: Gradio 4.0+ (this Space pins SDK version 5.43.1) with a responsive, mobile-friendly layout
TTS Engine: Microsoft Edge TTS Neural Voices
AI Translation: Google Gemini 2.0 Flash
Audio Processing: Google Text-to-Speech, SoundFile, Librosa
File Handling: python-docx for Word export, plain-text export
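
To show how the TXT and Word downloads might be produced with this stack, here is a small sketch. The function names, headings, and two-column layout are assumptions for illustration only. The Word export uses `python-docx`.

```python
from pathlib import Path


def render_txt(original: str, translated: str) -> str:
    """Format the transcript and its translation as plain text."""
    return f"Original:\n{original}\n\nTranslation:\n{translated}\n"


def export_txt(original: str, translated: str, path: str) -> str:
    """Write the plain-text export and return the file path."""
    Path(path).write_text(render_txt(original, translated), encoding="utf-8")
    return path


def export_docx(original: str, translated: str, path: str) -> str:
    """Write a Word document with the two texts side by side in a table."""
    from docx import Document  # third-party: pip install python-docx

    doc = Document()
    doc.add_heading("Audio Translation", level=1)
    table = doc.add_table(rows=2, cols=2)
    table.rows[0].cells[0].text = "Original"
    table.rows[0].cells[1].text = "Translation"
    table.rows[1].cells[0].text = original
    table.rows[1].cells[1].text = translated
    doc.save(path)
    return path
```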

⚙️ Setup

Prerequisites

Python 3.8+
Google Gemini API Key

Environment Variables

export GEMINI_API_KEY="your_gemini_api_key_here"
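
On the Python side, the key can be read from the environment at startup so that a missing value fails early with a clear message. The helper below is a hypothetical sketch; `get_gemini_api_key` is not a name from `app.py`.

```python
import os


def get_gemini_api_key(env=None) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before starting the app.")
    return key
```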

Installation

pip install -r requirements.txt

Run the Application

python app.py

The application will be available at http://localhost:7860

📱 Mobile Optimized

The interface is fully responsive and optimized for mobile devices with:

Touch-friendly buttons
Vertical stacking on small screens
Optimized font sizes and spacing
Mobile-first design approach

🔒 Privacy & Security

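No Persistent Storage: Inputs and outputs are processed in memory and are not retained by the app
Temporary Files: Audio and text files created for download are automatically cleaned up
Secure API: The Gemini API key is read from an environment variable rather than hard-coded
External Services: Edge TTS synthesis is performed by Microsoft's online service, and transcription and translation are performed by the Google Gemini API, so text and audio are sent to those services for processing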
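
One way to get the automatic cleanup of temporary files described above is a context manager that deletes the file when the caller is done with it. This is a sketch under that assumption; `scratch_file` is a hypothetical helper, and an app that serves files for download would need to delay cleanup until the download has completed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_file(suffix: str = ".mp3"):
    """Yield the path of a fresh temporary file and remove it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; callers reopen the file themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```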

🎯 Use Cases

Language Learning: Practice pronunciation in multiple languages
Content Creation: Generate multilingual audio content
Accessibility: Convert text to speech for visually impaired users
Translation Services: Translate spoken audio and synthesize the result with one of the Voice Studio voices
Podcast Localization: Create multilingual versions of audio content

🛠️ Advanced Features

Automatic Language Detection: Intelligently detects source language
Cultural Context Preservation: Maintains meaning across cultural boundaries
High-Quality Audio: WAV format output for best quality
Batch Processing Ready: Designed for scalability
Error Handling: Comprehensive error management and user feedback

📦 Deployment

Hugging Face Spaces

This application is ready for deployment on Hugging Face Spaces:

Upload all files to your Hugging Face Space
Set GEMINI_API_KEY in Space secrets
The app will automatically start on port 7860

Docker Support

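FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Gradio binds to 127.0.0.1 by default; listen on all interfaces so the published port is reachable
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

# Supply GEMINI_API_KEY at runtime (docker run -e GEMINI_API_KEY=...); do not bake it into the image
CMD ["python", "app.py"]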

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Microsoft Edge TTS for high-quality neural voices
Google Gemini for advanced AI capabilities
Librosa for advanced audio processing
Gradio team for the excellent UI framework

Developed by Digitized Brains 🧠