---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---
# Voice Studio & Audio Translation
A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.
## Features
### Voice Studio
- **26 High-Quality Voices**: Standard neural voices across 13 countries
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
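
The speed control can be sketched as follows, assuming the app uses the `edge-tts` Python package (the README does not name the package). `edge_tts.Communicate` takes its `rate` as a signed percentage string, so a 0.5x to 2.0x multiplier has to be converted first. `speed_to_rate` and `synthesize` are hypothetical helper names, not functions taken from `app.py`.

```python
import asyncio


def speed_to_rate(speed: float) -> str:
    """Convert a 0.5x-2.0x speed multiplier to an Edge TTS rate string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # always signed, e.g. "-50%" or "+0%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Synthesize `text` to an MP3 file (needs network access and edge-tts)."""
    import edge_tts  # imported lazily so speed_to_rate works without the package

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)


if __name__ == "__main__":
    print(speed_to_rate(1.5))
    # Uncomment to call the online service (assumed voice name):
    # asyncio.run(synthesize("Xin chào", "vi-VN-HoaiMyNeural", 1.0, "hello.mp3"))
```

Keeping the conversion in a pure function makes the slider range easy to validate and test without touching the network.
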
### Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content
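
One way the transcription, language detection, and translation steps above could be combined into a single Gemini request is sketched below. It assumes the `google-generativeai` SDK and a JSON reply format. Neither is confirmed by this README, and `build_prompt`, `parse_reply`, and `transcribe_and_translate` are hypothetical names.

```python
import json
import os

FENCE = "`" * 3  # Markdown code fence that models often wrap around JSON
REQUIRED_KEYS = {"source_language", "transcript", "translation"}


def build_prompt(target_language: str) -> str:
    """Ask for transcription, language detection and translation as JSON."""
    return (
        "Transcribe the attached audio, detect its language, and translate it "
        f"into {target_language}, preserving cultural nuances. "
        'Reply with JSON only, using the keys "source_language", '
        '"transcript" and "translation".'
    )


def parse_reply(reply_text: str) -> dict:
    """Parse the model reply, tolerating a code fence around the JSON."""
    cleaned = reply_text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on
        _, _, rest = cleaned.partition("\n")
        cleaned = rest.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


def transcribe_and_translate(audio: bytes, mime_type: str, target_language: str) -> dict:
    """Send audio to Gemini (assumed SDK: google-generativeai; needs network)."""
    import google.generativeai as genai  # lazy import: optional dependency

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content(
        [build_prompt(target_language), {"mime_type": mime_type, "data": audio}]
    )
    return parse_reply(response.text)
```

The parsed `transcript` and `translation` fields can then feed the side-by-side view, and `translation` can be passed to the selected Voice Studio voice.
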
## Supported Languages
**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)

**Audio Translation:**
- All Voice Studio languages plus additional Google TTS supported languages
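
The Voice Studio list above could be held in a small lookup table that builds Edge TTS voice identifiers. This is a sketch only: the locale codes are assumptions inferred from the voice names (for example, Francisca and Antonio are assumed to be the `pt-BR` voices), and `VOICES` and `edge_voice_name` are hypothetical names.

```python
# Language label -> (assumed locale, female voice, male voice)
VOICES = {
    "Vietnamese": ("vi-VN", "HoaiMy", "NamMinh"),
    "American English": ("en-US", "Aria", "Guy"),
    "British English": ("en-GB", "Sonia", "Ryan"),
    "German": ("de-DE", "Katja", "Conrad"),
    "French": ("fr-FR", "Denise", "Henri"),
    "Spanish": ("es-ES", "Elvira", "Alvaro"),
    "Italian": ("it-IT", "Elsa", "Diego"),
    "Japanese": ("ja-JP", "Nanami", "Keita"),
    "Korean": ("ko-KR", "SunHi", "BongJin"),
    "Chinese": ("zh-CN", "Xiaoxiao", "Yunxi"),
    "Russian": ("ru-RU", "Svetlana", "Dmitry"),
    "Portuguese": ("pt-BR", "Francisca", "Antonio"),  # assumed Brazilian locale
    "Arabic": ("ar-SA", "Zariyah", "Hamed"),
}


def edge_voice_name(language: str, gender: str) -> str:
    """Build an identifier such as 'vi-VN-HoaiMyNeural'."""
    locale, female, male = VOICES[language]
    gender = gender.strip().lower()
    if gender not in ("female", "male"):
        raise ValueError("gender must be 'female' or 'male'")
    name = female if gender == "female" else male
    return f"{locale}-{name}Neural"
```

A table like this keeps the dropdown labels and the identifiers passed to the TTS engine in one place, so adding a language is a one-line change.
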
## Technology Stack
- **Frontend**: Gradio 5 (the Space metadata pins `sdk_version: 5.43.1`) with responsive mobile design
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech, advanced audio libraries
- **File Handling**: SoundFile, Librosa, python-docx
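
As an illustration of the file-handling layer, the sketch below writes a transcript and its translation to TXT and, when `python-docx` is installed, to a Word document. The layout and the function names (`side_by_side_text`, `export_txt`, `export_docx`) are assumptions for illustration, not the app's actual export code.

```python
from pathlib import Path


def side_by_side_text(original: str, translated: str) -> str:
    """Lay out the original and translated text as labeled blocks."""
    return f"ORIGINAL\n{original}\n\nTRANSLATION\n{translated}\n"


def export_txt(original: str, translated: str, path: str) -> Path:
    """Write the comparison as a UTF-8 text file and return its path."""
    target = Path(path)
    target.write_text(side_by_side_text(original, translated), encoding="utf-8")
    return target


def export_docx(original: str, translated: str, path: str) -> Path:
    """Write the comparison as a Word document (requires python-docx)."""
    from docx import Document  # lazy import: only needed for Word export

    document = Document()
    document.add_heading("Original", level=1)
    document.add_paragraph(original)
    document.add_heading("Translation", level=1)
    document.add_paragraph(translated)
    document.save(str(path))
    return Path(path)
```

Writing UTF-8 explicitly matters here because the supported languages include Vietnamese, Japanese, Korean, Chinese, Russian, and Arabic text.
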
## Setup
### Prerequisites
- Python 3.10+ (required by Gradio 5)
- Google Gemini API Key
### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```
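
On the Python side, the key can be read once at startup and rejected early if it is missing. This is a minimal sketch; `load_gemini_api_key` is a hypothetical helper and may not match how `app.py` reads the variable.

```python
import os


def load_gemini_api_key(env=os.environ) -> str:
    """Return the Gemini API key or fail with a clear message."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before running app.py"
        )
    return key
```

Failing at startup gives a clearer error than a rejected API call during the first translation request.
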
### Installation
```bash
pip install -r requirements.txt
```
### Run the Application
```bash
python app.py
```
The application will be available at `http://localhost:7860`.
## Mobile Optimized
The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach
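## Privacy & Security
- **No Persistent Storage**: Uploaded audio and generated text are not kept after processing
- **Temporary Files**: Audio and text files are automatically cleaned up
- **Secure API**: API keys are read from environment variables, never hard-coded
- **Cloud Processing**: Speech synthesis uses Microsoft's online Edge TTS service, and transcription and translation use the Google Gemini API, so input text and audio are sent to those services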
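
The temporary-file cleanup described above could follow a pattern like this one. It is a sketch built on the standard library; `temporary_audio_file` is a hypothetical helper, not the app's actual cleanup code.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(suffix: str = ".mp3"):
    """Yield a temp file path and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; writers reopen it themselves
    try:
        yield path
    finally:
        # Runs on success and on error, so failed requests leave nothing behind
        if os.path.exists(path):
            os.remove(path)
```

This pattern fits intermediate files. A file returned to the UI for download has to outlive the handler that created it, so its cleanup needs to happen later.
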
## Use Cases
- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate spoken audio and re-voice the result with a Voice Studio voice
- **Podcast Localization**: Create multilingual versions of audio content
## Advanced Features
- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback
## Deployment
### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:
1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860
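### Docker Support
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Bind to all interfaces so the app is reachable from outside the container
ENV GRADIO_SERVER_NAME="0.0.0.0"
EXPOSE 7860
CMD ["python", "app.py"]
```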
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License.
## Acknowledgments
- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework
---
**Developed by Digitized Brains**