---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---
# Voice Studio & Audio Translation
An application that combines neural text-to-speech synthesis and AI-powered audio translation in a single Gradio interface.
## Features
### Voice Studio
- **26 High-Quality Voices**: Standard neural voices across 13 countries
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
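The speed control above can be sketched as follows. This is a minimal example, assuming the app uses the `edge-tts` Python package, whose `Communicate` class takes the rate as a signed percentage string. The helper names `speed_to_rate` and `synthesize` are hypothetical and not taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5 to 2.0) into a rate string such as '+25%'."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    # The sign is always written, so 1.0x becomes '+0%'.
    return f"{percent:+d}%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Generate an MP3 file with an Edge TTS neural voice (needs network access)."""
    # Imported lazily so speed_to_rate() works without the package installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)


# Example call (assumed voice ID):
# asyncio.run(synthesize("Xin chào", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))
```

Keeping the conversion in a separate pure function makes the 0.5x to 2.0x range easy to validate before any request is sent.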
### Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content
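A rough sketch of the transcription and translation step, assuming the `google-generativeai` SDK and the `gemini-2.0-flash` model name. The prompt wording and the function names `build_translation_prompt` and `translate_audio` are illustrative assumptions, not the prompts or functions used in `app.py`.

```python
import os


def build_translation_prompt(target_language: str) -> str:
    """Build a prompt asking for a transcript, the detected language, and a translation."""
    return (
        "Transcribe the attached audio, identify the language it is spoken in, "
        f"and translate the transcript into {target_language}. "
        "Translate idioms and cultural references by meaning, not word for word. "
        "Label the three parts TRANSCRIPT, LANGUAGE and TRANSLATION."
    )


def translate_audio(audio_path: str, target_language: str,
                    mime_type: str = "audio/wav") -> str:
    """Send the audio file and the prompt to Gemini and return the raw text reply."""
    # Imported lazily; requires the google-generativeai package and network access.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    with open(audio_path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    response = model.generate_content(
        [build_translation_prompt(target_language),
         {"mime_type": mime_type, "data": audio_bytes}]
    )
    return response.text
```

Asking for labeled parts in a single request is one possible design; the transcript and translation could equally be requested in two separate calls.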
## Supported Languages
**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)
**Audio Translation:**
- All Voice Studio languages plus additional Google TTS supported languages
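The Voice Studio list above could be represented as a small lookup table. The sketch below covers only a few entries, and it assumes the usual Edge TTS naming pattern `<locale>-<Name>Neural`. The locale codes are inferred from the language labels, and `VOICES` and `voice_id` are hypothetical names.

```python
# Partial catalog: language label -> (assumed locale code, voice names by gender).
VOICES = {
    "Vietnamese": ("vi-VN", {"Female": "HoaiMy", "Male": "NamMinh"}),
    "American English": ("en-US", {"Female": "Aria", "Male": "Guy"}),
    "British English": ("en-GB", {"Female": "Sonia", "Male": "Ryan"}),
    "German": ("de-DE", {"Female": "Katja", "Male": "Conrad"}),
    "Japanese": ("ja-JP", {"Female": "Nanami", "Male": "Keita"}),
}


def voice_id(language: str, gender: str) -> str:
    """Return the voice ID for a language label and gender, e.g. 'en-US-AriaNeural'."""
    try:
        locale, names = VOICES[language]
        return f"{locale}-{names[gender]}Neural"
    except KeyError as exc:
        raise ValueError(f"no voice for {language!r} / {gender!r}") from exc
```

A table like this keeps the dropdown labels shown to users separate from the identifiers passed to the TTS engine.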
## Technology Stack
- **Frontend**: Gradio (version 5.43.1, per `sdk_version` in the Space metadata) with a responsive, mobile-friendly layout
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech, advanced audio libraries
- **File Handling**: SoundFile, Librosa, python-docx
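As an illustration of the file-handling layer, the sketch below writes an original transcript and its translation to TXT and Word files. It assumes `python-docx` for the Word export. The function names and the document layout are hypothetical, not taken from `app.py`.

```python
from pathlib import Path


def format_comparison(original: str, translated: str) -> str:
    """Lay out the original and translated text under plain headings."""
    return f"ORIGINAL\n{original}\n\nTRANSLATION\n{translated}\n"


def save_txt(path: str, original: str, translated: str) -> None:
    """Write the comparison as a UTF-8 text file."""
    Path(path).write_text(format_comparison(original, translated), encoding="utf-8")


def save_docx(path: str, original: str, translated: str) -> None:
    """Write the comparison as a Word document with one heading per part."""
    # Imported lazily; requires the python-docx package.
    from docx import Document

    document = Document()
    document.add_heading("Original", level=1)
    document.add_paragraph(original)
    document.add_heading("Translation", level=1)
    document.add_paragraph(translated)
    document.save(path)
```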
## Setup
### Prerequisites
- Python 3.10+ (Gradio 5.x, which the Space pins via `sdk_version: 5.43.1`, does not support older versions)
- Google Gemini API Key
### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```
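On the Python side, the key can be read from the environment at startup so that a missing value fails early with a clear message. This is a minimal sketch; `load_gemini_api_key` is a hypothetical helper name.

```python
import os


def load_gemini_api_key() -> str:
    """Return GEMINI_API_KEY from the environment, or raise if it is missing."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the app."
        )
    return key
```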
### Installation
```bash
pip install -r requirements.txt
```
### Run the Application
```bash
python app.py
```
The application will be available at `http://localhost:7860`.
## Mobile Optimized
The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach
## Privacy & Security
- **No Data Storage**: All processing is done in memory
- **Temporary Files**: Audio and text files are automatically cleaned up
- **Secure API**: Uses environment variables for API keys
- **TTS Processing**: Text-to-speech uses Microsoft Edge TTS neural voices, which are synthesized by Microsoft's online service rather than on the local machine
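One way to get the automatic cleanup described above is to keep each upload inside a `tempfile.TemporaryDirectory`, which is deleted when its `with` block exits. The helper below is a sketch under that assumption; `with_temp_audio` and its `handler` parameter are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_audio(data: bytes, handler: Callable[[Path], T],
                    suffix: str = ".wav") -> T:
    """Write audio bytes to a temporary file, run handler on it, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        audio_path = Path(tmp_dir) / f"input{suffix}"
        audio_path.write_bytes(data)
        # The directory and the file are removed when this block exits,
        # whether handler returns normally or raises.
        return handler(audio_path)
```

The handler must return its result as a value, because the file no longer exists once the call returns.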
## Use Cases
- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate spoken content and re-voice the result with one of the Voice Studio neural voices
- **Podcast Localization**: Create multilingual versions of audio content
## Advanced Features
- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback
## Deployment
### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:
1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860
### Docker Support
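```dockerfile
# Gradio 5.x needs Python 3.10 or newer
FROM python:3.10-slim
WORKDIR /app
# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Bind to all interfaces so the app is reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
# Pass GEMINI_API_KEY at runtime, e.g. `docker run -e GEMINI_API_KEY=...`
CMD ["python", "app.py"]
```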
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License.
## Acknowledgments
- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework
---
**Developed by Digitized Brains**