---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---

# 🎤 Voice Studio & Audio Translation

A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.

## 🌟 Features

### 🎤 Voice Studio
- **26 High-Quality Voices**: Standard neural voices across 13 countries
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
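The 0.5x to 2.0x speed control maps naturally onto the Edge TTS `rate` setting, which is a signed percentage relative to the voice's normal pace. The sketch below assumes the open-source `edge-tts` Python package; `speed_to_rate` and `synthesize` are hypothetical helper names, not functions taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x-2.0x) into an Edge TTS rate string.

    Edge TTS expresses speaking rate as a signed percentage relative to the
    voice's default pace, e.g. "+0%" for normal speed or "-50%" for half speed.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # the "+" format flag keeps the sign, e.g. "+25%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> str:
    """Render `text` to an MP3 file with the edge-tts package (pip install edge-tts)."""
    import edge_tts  # imported lazily so speed_to_rate works without the package

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)  # edge-tts produces MP3 audio by default
    return out_path
```

For example, `asyncio.run(synthesize("Xin chào!", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))` would request a reading 25% faster than normal. It needs network access, because `edge-tts` calls Microsoft's online speech service.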

### 🎙️ Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content
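Transcription, language detection, and translation can be folded into a single multimodal request. This is a minimal sketch assuming the `google-genai` SDK and a JSON-formatted reply; the prompt wording, the JSON field names, and the helper functions are illustrative assumptions rather than the app's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class TranslationResult:
    source_language: str
    transcript: str
    translation: str


def build_prompt(target_language: str) -> str:
    # Hypothetical prompt; the app's real wording is not documented here.
    return (
        "Transcribe the attached audio, identify its language, and translate the "
        f"transcript into {target_language}, preserving cultural nuances. "
        'Reply with JSON only: {"source_language": ..., "transcript": ..., "translation": ...}'
    )


def parse_result(raw: str) -> TranslationResult:
    """Extract the JSON object from the model reply, ignoring any surrounding text."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("model reply did not contain a JSON object")
    data = json.loads(raw[start : end + 1])
    return TranslationResult(
        source_language=data["source_language"],
        transcript=data["transcript"],
        translation=data["translation"],
    )


def transcribe_and_translate(audio_path: str, target_language: str, api_key: str,
                             mime_type: str = "audio/mp3") -> TranslationResult:
    """One Gemini call that transcribes, detects the language, and translates."""
    from google import genai  # pip install google-genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    with open(audio_path, "rb") as f:
        audio = types.Part.from_bytes(data=f.read(), mime_type=mime_type)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[audio, build_prompt(target_language)],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_result(response.text)
```

Asking for JSON keeps the three outputs separable, so the transcript and translation can be shown side by side and passed on to speech synthesis without extra text parsing.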

## 🚀 Supported Languages

**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)
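Each voice above corresponds to an Edge TTS short name of the form `<locale>-<Name>Neural` (for example `en-US-AriaNeural`). The catalog below is a sketch that builds all 26 IDs; the locale codes are assumptions to check against `edge-tts --list-voices`. In particular, Francisca and Antonio are, as far as we know, published as Brazilian Portuguese (`pt-BR`) voices.

```python
# Assumed locale codes for the voices listed above; verify with `edge-tts --list-voices`.
# Each tuple is (female, male), matching the order in the list.
VOICES = {
    "vi-VN": ("HoaiMy", "NamMinh"),
    "en-US": ("Aria", "Guy"),
    "en-GB": ("Sonia", "Ryan"),
    "de-DE": ("Katja", "Conrad"),
    "fr-FR": ("Denise", "Henri"),
    "es-ES": ("Elvira", "Alvaro"),
    "it-IT": ("Elsa", "Diego"),
    "ja-JP": ("Nanami", "Keita"),
    "ko-KR": ("SunHi", "BongJin"),
    "zh-CN": ("Xiaoxiao", "Yunxi"),
    "ru-RU": ("Svetlana", "Dmitry"),
    "pt-BR": ("Francisca", "Antonio"),  # assumption: these two are pt-BR voices
    "ar-SA": ("Zariyah", "Hamed"),
}


def voice_id(locale: str, name: str) -> str:
    """Build an Edge TTS short name such as "en-US-AriaNeural"."""
    return f"{locale}-{name}Neural"


def all_voice_ids() -> list[str]:
    """Female voice first, then male, for each locale in insertion order."""
    return [voice_id(locale, name) for locale, pair in VOICES.items() for name in pair]
```

Thirteen locales with two voices each gives the 26 voices offered in the Voice Studio.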

**Audio Translation:**
- All Voice Studio languages plus additional Google TTS supported languages

## 🔧 Technology Stack

- **Frontend**: Gradio 5 (Space SDK version 5.43.1) with responsive mobile design
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Audio Processing**: Google Text-to-Speech, covering translation languages beyond the Edge TTS voice set
- **File Handling**: SoundFile, Librosa, python-docx
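The TXT and Word downloads can be produced from the original and translated text with the standard library and python-docx. The layout and the function names below are hypothetical; only `Document`, `add_heading`, `add_paragraph`, and `save` come from python-docx.

```python
def comparison_text(original: str, translated: str, source_lang: str, target_lang: str) -> str:
    """Plain-text layout for the TXT download: original first, then the translation."""
    return (
        f"Original ({source_lang})\n{original.strip()}\n\n"
        f"Translation ({target_lang})\n{translated.strip()}\n"
    )


def save_txt(path: str, content: str) -> str:
    # UTF-8 keeps Vietnamese, CJK, Cyrillic, and Arabic text intact.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


def save_docx(path: str, original: str, translated: str,
              source_lang: str, target_lang: str) -> str:
    from docx import Document  # pip install python-docx

    doc = Document()
    doc.add_heading("Audio Translation", level=1)
    doc.add_heading(f"Original ({source_lang})", level=2)
    doc.add_paragraph(original)
    doc.add_heading(f"Translation ({target_lang})", level=2)
    doc.add_paragraph(translated)
    doc.save(path)
    return path
```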

## ⚙️ Setup

### Prerequisites
- Python 3.10+ (required by Gradio 5)
- Google Gemini API Key

### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```
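Inside the app, the key is then read from the process environment; Hugging Face Spaces expose secrets as environment variables in the same way. A minimal sketch, with a hypothetical helper name, that fails early with a readable message when the variable is missing:

```python
import os


def get_gemini_api_key() -> str:
    """Read the Gemini key from the environment (local shell or Space secrets)."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Export it in your shell or add it as a "
            "secret in your Hugging Face Space settings."
        )
    return key
```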

### Installation
```bash
pip install -r requirements.txt
```

### Run the Application
```bash
python app.py
```

The application will be available at `http://localhost:7860`.
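Gradio serves on port 7860 by default and, unless told otherwise, binds to `127.0.0.1`, which is why the local URL works out of the box while a container needs an explicit host. The sketch below makes both values visible; Gradio also reads `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` on its own, so `server_settings` (a hypothetical helper) mainly documents the defaults, and the one-line UI is a placeholder.

```python
import os


def server_settings() -> dict:
    """Host and port for launch(); set GRADIO_SERVER_NAME=0.0.0.0 inside containers."""
    return {
        "server_name": os.environ.get("GRADIO_SERVER_NAME", "127.0.0.1"),
        "server_port": int(os.environ.get("GRADIO_SERVER_PORT", "7860")),
    }


def main() -> None:
    import gradio as gr  # pip install gradio

    with gr.Blocks(title="Voice Studio & Audio Translation") as demo:
        gr.Markdown("# Voice Studio & Audio Translation")  # placeholder UI
    demo.launch(**server_settings())
```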

## 📱 Mobile Optimized

The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach
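One way to get the vertical stacking described above is a CSS media query passed to `gr.Blocks(css=...)`, targeting rows tagged through `elem_classes`. The class name, breakpoint, and sizes are illustrative assumptions; the app's own stylesheet is not shown in this README.

```python
# Hypothetical stylesheet: rows tagged "responsive-row" stack vertically on narrow screens.
MOBILE_CSS = """
@media (max-width: 768px) {
  .responsive-row { flex-direction: column !important; }
  .responsive-row > * { min-width: 100% !important; }
  button { min-height: 48px; font-size: 16px; }
}
"""


def build_ui():
    import gradio as gr  # pip install gradio

    with gr.Blocks(css=MOBILE_CSS) as demo:
        # Side by side on desktop, stacked below 768px.
        with gr.Row(elem_classes=["responsive-row"]):
            gr.Textbox(label="Text")
            gr.Dropdown(choices=["en-US-AriaNeural", "vi-VN-HoaiMyNeural"], label="Voice")
        gr.Button("Generate")
    return demo
```

The 48px minimum button height follows the common guideline for comfortable touch targets.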

## 🔒 Privacy & Security

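- **No Data Storage**: Inputs and results are not persisted by the app
- **Temporary Files**: Intermediate audio and text files are automatically cleaned up
- **Secure API**: Uses environment variables for API keys
- **External Services**: Speech synthesis goes through Microsoft's online Edge TTS service, and transcription and translation go through the Google Gemini API, so submitted text and audio are sent to those providers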
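Automatic cleanup of intermediate files can be sketched with a context manager around `tempfile.mkstemp`. `temporary_output` is a hypothetical helper, not taken from `app.py`.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def temporary_output(suffix: str = ".mp3") -> Iterator[str]:
    """Yield a path for a generated file and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # callers reopen the path themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

In a Gradio handler, a returned file has to outlive the function that produces it, so cleanup there is typically deferred instead, for example with the `delete_cache` option of `gr.Blocks`, which periodically removes cached files older than a given age.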

## 🎯 Use Cases

- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate audio content and re-voice it with a neural voice of your choice
- **Podcast Localization**: Create multilingual versions of audio content

## 🛠️ Advanced Features

- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback

## 📦 Deployment

### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:

1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860

### Docker Support
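```dockerfile
# Gradio 5 requires Python 3.10 or newer
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Bind to all interfaces so the published port is reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD ["python", "app.py"]
```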

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License.

## 🙏 Acknowledgments

- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework

---

**Developed by Digitized Brains** 🧠