metadata
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
🎤 Voice Studio & Audio Translation
A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.
Features
🎤 Voice Studio
- 26 High-Quality Voices: Standard neural voices across 13 countries
- Multi-Language Support: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- Speed Control: Adjustable speech rate from 0.5x to 2.0x
- Instant Download: Generate and download MP3 files
- Pure Neural Voices: Only official Edge TTS neural voices, no artificial variations
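As a rough illustration of the speed control above, the sketch below maps a 0.5x to 2.0x multiplier onto the signed percentage string that the `edge-tts` package accepts as its `rate` argument. The helper names `speed_to_rate` and `synthesize` are hypothetical and not taken from `app.py`; the mapping (1.0x equals "+0%") is an assumption about how the app might translate its slider value.

```python
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # range stated in the feature list


def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (1.0 = normal) to a signed percentage string."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # e.g. 1.25 becomes "+25%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Write an MP3 of `text` to `out_path`. Requires network access."""
    # Imported lazily so speed_to_rate can be used without the package installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)
```

A call such as `asyncio.run(synthesize("Hello", "en-US-AriaNeural", 1.25, "hello.mp3"))` would then produce a file at 1.25x speed.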
🎙️ Audio Translation
- Audio Transcription: Powered by Google Gemini 2.0 Flash
- Language Detection: Automatic source language identification
- Cultural Translation: Context-aware translation preserving cultural nuances
- Voice Synthesis: Integrated with Voice Studio's 26 voices
- Multiple Formats: Download as TXT or Word documents
- Side-by-Side Comparison: Compare original and translated content
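The TXT and Word downloads listed above could be produced along the following lines. This is a minimal sketch, not the app's actual export code: the function names and the "Original" / "Translation" layout are assumptions, and the Word export relies on the `python-docx` package named in the technology stack.

```python
from pathlib import Path


def format_side_by_side(original: str, translated: str) -> str:
    """Lay out the transcript and its translation as labelled plain-text sections."""
    return f"Original:\n{original}\n\nTranslation:\n{translated}\n"


def save_txt(path: str, original: str, translated: str) -> Path:
    """Write both texts to a UTF-8 TXT file and return its path."""
    out = Path(path)
    out.write_text(format_side_by_side(original, translated), encoding="utf-8")
    return out


def save_docx(path: str, original: str, translated: str) -> Path:
    """Write both texts to a Word document with one heading per section."""
    # Imported lazily; provided by the python-docx package.
    from docx import Document

    doc = Document()
    doc.add_heading("Original", level=1)
    doc.add_paragraph(original)
    doc.add_heading("Translation", level=1)
    doc.add_paragraph(translated)
    doc.save(str(path))
    return Path(path)
```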
Supported Languages
Voice Studio (26 voices):
- 🇻🇳 Vietnamese: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 American English: Aria (Female), Guy (Male)
- 🇬🇧 British English: Sonia (Female), Ryan (Male)
- 🇩🇪 German: Katja (Female), Conrad (Male)
- 🇫🇷 French: Denise (Female), Henri (Male)
- 🇪🇸 Spanish: Elvira (Female), Alvaro (Male)
- 🇮🇹 Italian: Elsa (Female), Diego (Male)
- 🇯🇵 Japanese: Nanami (Female), Keita (Male)
- 🇰🇷 Korean: SunHi (Female), BongJin (Male)
- 🇨🇳 Chinese: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 Russian: Svetlana (Female), Dmitry (Male)
- 🇵🇹 Portuguese: Francisca (Female), Antonio (Male)
- 🇸🇦 Arabic: Zariyah (Female), Hamed (Male)
Audio Translation:
- All Voice Studio languages plus additional Google TTS supported languages
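The Voice Studio list above can be held in a small lookup table. The sketch below is illustrative only: `VOICES`, `total_voices`, and `short_name` are hypothetical names, and the locale passed to `short_name` is the caller's assumption, since this README does not list locale codes. Edge neural voice identifiers generally follow a `<locale>-<Name>Neural` pattern; running `edge-tts --list-voices` shows the identifiers actually available.

```python
# Language -> (female voice, male voice), copied from the list above.
VOICES = {
    "Vietnamese": ("HoaiMy", "NamMinh"),
    "American English": ("Aria", "Guy"),
    "British English": ("Sonia", "Ryan"),
    "German": ("Katja", "Conrad"),
    "French": ("Denise", "Henri"),
    "Spanish": ("Elvira", "Alvaro"),
    "Italian": ("Elsa", "Diego"),
    "Japanese": ("Nanami", "Keita"),
    "Korean": ("SunHi", "BongJin"),
    "Chinese": ("Xiaoxiao", "Yunxi"),
    "Russian": ("Svetlana", "Dmitry"),
    "Portuguese": ("Francisca", "Antonio"),
    "Arabic": ("Zariyah", "Hamed"),
}


def total_voices() -> int:
    """Count every voice across all languages."""
    return sum(len(pair) for pair in VOICES.values())


def short_name(locale: str, name: str) -> str:
    """Build a voice identifier from a caller-supplied locale (an assumption)."""
    return f"{locale}-{name}Neural"
```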
🔧 Technology Stack
- Frontend: Gradio 4.0+ with responsive mobile design
- TTS Engine: Microsoft Edge TTS Neural Voices
- AI Translation: Google Gemini 2.0 Flash
- Audio Processing: Google Text-to-Speech, advanced audio libraries
- File Handling: SoundFile, Librosa, python-docx
⚙️ Setup
Prerequisites
- Python 3.8+
- Google Gemini API Key
Environment Variables
export GEMINI_API_KEY="your_gemini_api_key_here"
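On the Python side, the key can be read from the environment and passed to the Gemini client. The sketch below is an assumption about how this might look, not the code in `app.py`: the function names and the prompt wording are hypothetical, and it uses the `google-generativeai` package, which may differ from the SDK the app actually depends on.

```python
import os


def get_gemini_api_key() -> str:
    """Return the key from GEMINI_API_KEY, failing early with a clear message."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before starting the app")
    return key


def transcribe_and_translate(audio_path: str, target_language: str) -> str:
    """Ask Gemini to transcribe an audio file and translate the transcript."""
    # Imported lazily; provided by the google-generativeai package (assumed SDK).
    import google.generativeai as genai

    genai.configure(api_key=get_gemini_api_key())
    model = genai.GenerativeModel("gemini-2.0-flash")
    audio = genai.upload_file(path=audio_path)  # sends the audio to Google
    prompt = (
        "Transcribe this audio, identify its language, "
        f"then translate the transcript into {target_language}."
    )
    response = model.generate_content([prompt, audio])
    return response.text
```

Failing at startup when the variable is missing tends to be easier to diagnose than an authentication error on the first request.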
Installation
pip install -r requirements.txt
Run the Application
python app.py
The application will be available at http://localhost:7860
📱 Mobile Optimized
The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach
Privacy & Security
- No Data Storage: All processing is done in memory
- Temporary Files: Audio and text files are automatically cleaned up
- Secure API: Uses environment variables for API keys
- Remote Synthesis: Text-to-speech is generated by Microsoft's online Edge TTS service, so the input text is sent over the network rather than processed locally
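One common way to get the automatic cleanup of temporary files described above is a context manager that deletes the file when the block exits. This is a sketch under that assumption; `temporary_audio_file` is a hypothetical helper, not a function from `app.py`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(suffix: str = ".mp3"):
    """Yield a path to a temporary file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the caller reopens the file
    try:
        yield path
    finally:
        # Runs even if the block raises, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Inside the `with` block the path can be handed to the synthesis step and then to the download handler; once the block ends, the file is gone.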
🎯 Use Cases
- Language Learning: Practice pronunciation in multiple languages
- Content Creation: Generate multilingual audio content
- Accessibility: Convert text to speech for visually impaired users
- Translation Services: Translate audio content while preserving voice characteristics
- Podcast Localization: Create multilingual versions of audio content
🛠️ Advanced Features
- Automatic Language Detection: Intelligently detects source language
- Cultural Context Preservation: Maintains meaning across cultural boundaries
- High-Quality Audio: WAV format output for best quality
- Batch Processing Ready: Designed for scalability
- Error Handling: Comprehensive error management and user feedback
📦 Deployment
Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:
- Upload all files to your Hugging Face Space
- Set GEMINI_API_KEY in Space secrets
- The app will automatically start on port 7860
Docker Support
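# Slim Python base image
FROM python:3.9-slim
WORKDIR /app
# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Bind Gradio to all interfaces so the app is reachable from outside the container
ENV GRADIO_SERVER_NAME="0.0.0.0"
EXPOSE 7860
CMD ["python", "app.py"]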
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Acknowledgments
- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework
Developed by Digitized Brains 🧠