---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---
# 🎤 Voice Studio & Audio Translation
A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.
## 🌟 Features
### 🎤 Voice Studio
- **26 High-Quality Voices**: One female and one male neural voice for each of 13 locales
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
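Edge TTS does not accept a speed multiplier directly. In the `edge-tts` Python package, speech rate is passed as a signed percentage string such as `"+25%"` or `"-50%"`. Below is a minimal sketch of how the 0.5x to 2.0x slider could map onto that parameter, assuming a linear mapping. `speed_to_rate` and `synthesize_mp3` are hypothetical names, not functions taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Map a 0.5x-2.0x speed multiplier to an Edge TTS rate string (hypothetical helper)."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    # Linear mapping: 1.0x is unchanged, each 0.01x step is one percentage point.
    percent = round((speed - 1.0) * 100)
    # Edge TTS expects an explicit sign: 1.0x -> "+0%", 0.5x -> "-50%", 2.0x -> "+100%".
    return f"{percent:+d}%"


async def synthesize_mp3(text: str, voice: str, speed: float, out_path: str) -> str:
    """Synthesize `text` with an Edge TTS neural voice and save the result as MP3."""
    import edge_tts  # third-party: pip install edge-tts (imported lazily)

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    # Needs network access: edge-tts streams audio from Microsoft's online service.
    await communicate.save(out_path)
    return out_path
```

For example, `asyncio.run(synthesize_mp3("Xin chào", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))` would write a Vietnamese clip at 1.25x speed, assuming `vi-VN-HoaiMyNeural` is the Edge TTS ID of the HoaiMy voice.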
### 🎙️ Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download results as TXT or Word (.docx) documents
- **Side-by-Side Comparison**: Compare original and translated content
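The Audio Translation features form a three-stage pipeline: transcription with language detection, translation, then speech synthesis. Below is a minimal sketch of that flow in which each stage is passed in as a function. `translate_audio`, `TranslationResult`, and `to_txt` are hypothetical names. In the app, the first two stages would be backed by Gemini 2.0 Flash and the third by a Voice Studio voice, but these stubs make no API calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslationResult:
    source_lang: str  # language detected during transcription
    target_lang: str  # language requested by the user
    transcript: str  # original-language text
    translation: str  # text in the target language
    audio_path: str  # synthesized speech for the translation


def translate_audio(
    audio_path: str,
    target_lang: str,
    transcribe: Callable[[str], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], str],
) -> TranslationResult:
    """Run the transcribe -> translate -> synthesize stages in order."""
    # Stage 1: speech to text; the stage also reports the detected source language.
    source_lang, transcript = transcribe(audio_path)
    if not transcript.strip():
        raise ValueError("No speech was detected in the uploaded audio.")
    # Stage 2: translate from the detected language into the target language.
    translation = translate(transcript, source_lang, target_lang)
    # Stage 3: voice the translation; returns the path of the generated audio file.
    out_audio = synthesize(translation, target_lang)
    return TranslationResult(source_lang, target_lang, transcript, translation, out_audio)


def to_txt(result: TranslationResult) -> str:
    """Render the original and translated text, one after the other, for a TXT download."""
    return (
        f"Original ({result.source_lang}):\n{result.transcript}\n\n"
        f"Translation ({result.target_lang}):\n{result.translation}\n"
    )
```

Passing the stages in as arguments keeps the orchestration testable with stub functions, since the real transcription, translation, and TTS calls all need network access.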
## 🚀 Supported Languages
**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)
**Audio Translation:**
- All Voice Studio languages plus additional Google TTS supported languages
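The split between the two engines can be modeled as a lookup with a fallback. The sketch below assumes that languages with a Voice Studio voice go to Edge TTS and all other languages go to Google Text-to-Speech, and that the latter is the `gTTS` package, which takes a plain language code. `EDGE_VOICES` and `pick_tts_engine` are hypothetical, and only three of the 13 locales are shown. The voice IDs follow Edge TTS's `<locale>-<Name>Neural` naming convention; run `edge-tts --list-voices` to confirm exact names.

```python
from typing import Dict, Tuple

# Hypothetical subset of the Voice Studio table: language code -> (female, male) voice IDs.
EDGE_VOICES: Dict[str, Tuple[str, str]] = {
    "vi": ("vi-VN-HoaiMyNeural", "vi-VN-NamMinhNeural"),
    "en": ("en-US-AriaNeural", "en-US-GuyNeural"),
    "ja": ("ja-JP-NanamiNeural", "ja-JP-KeitaNeural"),
}


def pick_tts_engine(lang_code: str, female: bool = True) -> Tuple[str, str]:
    """Return ("edge", voice_id) if a Voice Studio voice exists, else ("gtts", lang_code)."""
    code = lang_code.lower()
    voices = EDGE_VOICES.get(code)
    if voices is not None:
        return "edge", voices[0] if female else voices[1]
    # No neural voice for this language: fall back to Google Text-to-Speech,
    # which is addressed by a plain language code such as "hi" or "th".
    return "gtts", code
```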
## 🔧 Technology Stack
- **Frontend**: Gradio 4.0+ (the Hugging Face Space runs 5.43.1, per `sdk_version`) with a responsive mobile layout
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech for languages outside the Voice Studio set
- **File Handling**: SoundFile, Librosa, python-docx
## ⚙️ Setup
### Prerequisites
- Python 3.10+ (Gradio 5.x, which the Space runs, does not support older versions)
- Google Gemini API Key
### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```
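The app reads the key from the environment rather than from source code. Below is a minimal sketch of a fail-fast check at startup, assuming the variable name above. `load_gemini_api_key` is a hypothetical helper, not necessarily how `app.py` loads the key.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from the environment and fail fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Surfacing this at startup beats a confusing authentication error mid-request.
        raise RuntimeError(
            f"{env_var} is not set. Export it locally or add it as a Space secret."
        )
    return key
```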
### Installation
```bash
pip install -r requirements.txt
```
### Run the Application
```bash
python app.py
```
The application will be available at `http://localhost:7860`
## 📱 Mobile Optimized
The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach
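## 🔒 Privacy & Security
- **No Persistent Storage**: Inputs are processed in memory, and temporary audio and text files are cleaned up automatically
- **Secure API**: API keys are read from environment variables, never hard-coded
- **Third-Party Services**: Text-to-speech uses Microsoft's online Edge TTS service, and transcription and translation use the Google Gemini API, so submitted text and audio are sent to those providers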
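Automatic cleanup can be guaranteed with a context manager that deletes a generated file even when processing fails. The sketch below targets intermediate files, such as a converted copy of an upload before transcription. `temporary_output` is a hypothetical helper, not code taken from `app.py`.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_output(suffix: str = ".mp3") -> Iterator[str]:
    """Yield a path for a generated file and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # callers reopen the path themselves (e.g. an audio writer)
    try:
        yield path
    finally:
        # Runs even if synthesis or export raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

Gradio reads a file path returned from an event handler after the handler finishes. A path from this helper is therefore better suited to intermediate files than to files handed back to the UI for download.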
## 🎯 Use Cases
- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate spoken content and re-voice it with a neural voice in the target language
- **Podcast Localization**: Create multilingual versions of audio content
## 🛠️ Advanced Features
- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback
## 📦 Deployment
### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:
1. Upload all files to your Hugging Face Space
2. Add `GEMINI_API_KEY` as a secret in the Space settings (Spaces exposes secrets to the app as environment variables)
3. The app will automatically start on port 7860
### Docker Support
```dockerfile
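FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Gradio binds to 127.0.0.1 by default; listen on all interfaces so the
# published container port is reachable from the host.
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]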
```
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is licensed under the MIT License.
## 🙏 Acknowledgments
- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework
---
**Developed by Digitized Brains** 🧠