---
title: Voice Studio & Audio Translation
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
---

# 🎤 Voice Studio & Audio Translation

A Gradio application that combines text-to-speech synthesis and AI-powered audio translation in a single interface.

## 🌟 Features

### 🎤 Voice Studio
- **26 High-Quality Voices**: One female and one male neural voice for each of 13 language variants
- **Multi-Language Support**: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
- **Speed Control**: Adjustable speech rate from 0.5x to 2.0x
- **Instant Download**: Generate and download MP3 files
- **Pure Neural Voices**: Only official Edge TTS neural voices, no artificial variations
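
As a rough sketch of how the speed control could map onto Edge TTS: the `edge-tts` package takes the speaking rate as a signed percentage string (for example `+50%`), so a 0.5x to 2.0x multiplier has to be converted first. The helper names `speed_to_rate` and `synthesize` are assumptions for illustration and are not taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) into an Edge TTS rate string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    # The sign is always included, so 1.0x becomes "+0%".
    return f"{percent:+d}%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> str:
    """Write an MP3 file using edge-tts (hypothetical helper, needs network access)."""
    import edge_tts  # imported lazily so speed_to_rate works without the package

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)
    return out_path


# Example call (requires `pip install edge-tts` and an internet connection):
#   import asyncio
#   asyncio.run(synthesize("Xin chào!", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))
```

Keeping the conversion in a separate pure function makes the slider range easy to validate without touching the network.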

### 🎙️ Audio Translation
- **Audio Transcription**: Powered by Google Gemini 2.0 Flash
- **Language Detection**: Automatic source language identification
- **Cultural Translation**: Context-aware translation preserving cultural nuances
- **Voice Synthesis**: Integrated with Voice Studio's 26 voices
- **Multiple Formats**: Download as TXT or Word documents
- **Side-by-Side Comparison**: Compare original and translated content
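
The features above suggest a three-stage flow: transcribe (with language detection), translate, then synthesize. The sketch below shows only that ordering, under the assumption that each stage is supplied as a callable; `translate_audio`, `TranslationResult`, and the three callables are hypothetical names, and the real Gemini and Edge TTS calls are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslationResult:
    source_language: str
    transcript: str
    translation: str
    audio_path: str


def translate_audio(
    audio_bytes: bytes,
    target_language: str,
    voice: str,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], str],
) -> TranslationResult:
    """Run transcription, translation, and synthesis in order."""
    # Stage 1: transcription also reports the detected source language.
    source_language, transcript = transcribe(audio_bytes)
    # Stage 2: translate the transcript into the requested language.
    translation = translate(transcript, source_language, target_language)
    # Stage 3: voice the translated text and keep the path of the audio file.
    audio_path = synthesize(translation, voice)
    return TranslationResult(source_language, transcript, translation, audio_path)


def side_by_side(result: TranslationResult) -> str:
    """Format original and translated text for a plain-text export."""
    return (
        f"Original ({result.source_language}):\n{result.transcript}\n\n"
        f"Translation:\n{result.translation}\n"
    )
```

Passing the stages in as callables keeps the flow testable with stubs and makes it straightforward to swap a provider later.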

## 🚀 Supported Languages

**Voice Studio (26 voices):**
- 🇻🇳 **Vietnamese**: HoaiMy (Female), NamMinh (Male)
- 🇺🇸 **American English**: Aria (Female), Guy (Male)
- 🇬🇧 **British English**: Sonia (Female), Ryan (Male)
- 🇩🇪 **German**: Katja (Female), Conrad (Male)
- 🇫🇷 **French**: Denise (Female), Henri (Male)
- 🇪🇸 **Spanish**: Elvira (Female), Alvaro (Male)
- 🇮🇹 **Italian**: Elsa (Female), Diego (Male)
- 🇯🇵 **Japanese**: Nanami (Female), Keita (Male)
- 🇰🇷 **Korean**: SunHi (Female), BongJin (Male)
- 🇨🇳 **Chinese**: Xiaoxiao (Female), Yunxi (Male)
- 🇷🇺 **Russian**: Svetlana (Female), Dmitry (Male)
- 🇵🇹 **Portuguese**: Francisca (Female), Antonio (Male)
- 🇸🇦 **Arabic**: Zariyah (Female), Hamed (Male)

**Audio Translation:**
- All Voice Studio languages, plus additional languages supported by Google Text-to-Speech
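
For reference, the Voice Studio display names correspond to Edge TTS voice identifiers of the form `<locale>-<Name>Neural`. The table below is a sketch: the locale codes (in particular `pt-BR` for Portuguese and `ar-SA` for Arabic) are assumptions inferred from the public Edge TTS voice naming scheme rather than taken from `app.py`, and `VOICES` and `all_voice_names` are hypothetical names.

```python
from typing import Dict, List, Tuple

# language -> (assumed locale, female voice, male voice)
VOICES: Dict[str, Tuple[str, str, str]] = {
    "Vietnamese": ("vi-VN", "HoaiMy", "NamMinh"),
    "American English": ("en-US", "Aria", "Guy"),
    "British English": ("en-GB", "Sonia", "Ryan"),
    "German": ("de-DE", "Katja", "Conrad"),
    "French": ("fr-FR", "Denise", "Henri"),
    "Spanish": ("es-ES", "Elvira", "Alvaro"),
    "Italian": ("it-IT", "Elsa", "Diego"),
    "Japanese": ("ja-JP", "Nanami", "Keita"),
    "Korean": ("ko-KR", "SunHi", "BongJin"),
    "Chinese": ("zh-CN", "Xiaoxiao", "Yunxi"),
    "Russian": ("ru-RU", "Svetlana", "Dmitry"),
    "Portuguese": ("pt-BR", "Francisca", "Antonio"),  # assumed Brazilian locale
    "Arabic": ("ar-SA", "Zariyah", "Hamed"),  # assumed Saudi locale
}


def all_voice_names() -> List[str]:
    """Return the identifiers in table order, female voice first."""
    names = []
    for locale, female, male in VOICES.values():
        names.append(f"{locale}-{female}Neural")
        names.append(f"{locale}-{male}Neural")
    return names
```

The identifiers actually available can be checked with `edge-tts --list-voices` before relying on this table.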

## 🔧 Technology Stack

- **Frontend**: Gradio 5 (the Space pins `sdk_version: 5.43.1`) with a responsive mobile layout
- **TTS Engine**: Microsoft Edge TTS Neural Voices
- **AI Translation**: Google Gemini 2.0 Flash
- **Additional TTS**: Google Text-to-Speech for languages outside the Voice Studio set
- **Audio and File Handling**: SoundFile, Librosa, python-docx

## ⚙️ Setup

### Prerequisites
- Python 3.10+ (required by Gradio 5)
- Google Gemini API Key

### Environment Variables
```bash
export GEMINI_API_KEY="your_gemini_api_key_here"
```
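
On the Python side, the key can then be read from the environment at startup. This is a minimal sketch that fails early with a readable message when the variable is missing; `load_gemini_api_key` is a hypothetical helper name, not necessarily what `app.py` uses.

```python
import os
from typing import Mapping, Optional


def load_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key, or raise if it is missing or blank."""
    # Accept an explicit mapping so the function can be tested without
    # modifying the real process environment.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the app."
        )
    return key
```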

### Installation
```bash
pip install -r requirements.txt
```

### Run the Application
```bash
python app.py
```

The application will be available at `http://localhost:7860`.

## 📱 Mobile Optimized

The interface is fully responsive and optimized for mobile devices with:
- Touch-friendly buttons
- Vertical stacking on small screens
- Optimized font sizes and spacing
- Mobile-first design approach

## 🔒 Privacy & Security

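- **No Persistent Storage**: Uploaded audio and generated output are not kept after processing
- **Temporary Files**: Audio and text files are written to temporary storage and cleaned up automatically
- **Secure API**: The Gemini API key is read from an environment variable
- **Network Services**: Edge TTS sends text to Microsoft's online speech service, and transcription and translation send audio to Google Gemini; no speech model runs locally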
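
One way to get automatic cleanup of temporary files is to do all file work inside a `tempfile.TemporaryDirectory` block, so the directory and its contents are deleted as soon as the block exits. The sketch below assumes the synthesis step is a callable that writes to a given path; `render_to_bytes` and `write_audio` are hypothetical names.

```python
import os
import tempfile
from typing import Callable


def render_to_bytes(write_audio: Callable[[str], None], suffix: str = ".mp3") -> bytes:
    """Let write_audio create a file in a scratch directory and return its bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "output" + suffix)
        write_audio(path)
        with open(path, "rb") as handle:
            data = handle.read()
    # The scratch directory and the file inside it no longer exist here.
    return data
```

Returning bytes rather than a path means nothing on disk outlives the call, at the cost of holding the audio in memory.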

## 🎯 Use Cases

- **Language Learning**: Practice pronunciation in multiple languages
- **Content Creation**: Generate multilingual audio content
- **Accessibility**: Convert text to speech for visually impaired users
- **Translation Services**: Translate audio content and voice the result with any Voice Studio voice
- **Podcast Localization**: Create multilingual versions of audio content

## 🛠️ Advanced Features

- **Automatic Language Detection**: Intelligently detects source language
- **Cultural Context Preservation**: Maintains meaning across cultural boundaries
- **High-Quality Audio**: WAV format output for best quality
- **Batch Processing Ready**: Designed for scalability
- **Error Handling**: Comprehensive error management and user feedback

## 📦 Deployment

### Hugging Face Spaces
This application is ready for deployment on Hugging Face Spaces:

1. Upload all files to your Hugging Face Space
2. Set `GEMINI_API_KEY` in Space secrets
3. The app will automatically start on port 7860

### Docker Support
```dockerfile
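# Gradio 5 requires Python 3.10 or newer
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Listen on all interfaces so the published port is reachable from the host
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

# Supply the key at runtime: docker run -p 7860:7860 -e GEMINI_API_KEY=your_gemini_api_key_here <image>
CMD ["python", "app.py"]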
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License.

## 🙏 Acknowledgments

- Microsoft Edge TTS for high-quality neural voices
- Google Gemini for advanced AI capabilities
- Librosa for advanced audio processing
- Gradio team for the excellent UI framework

---

**Developed by Digitized Brains** 🧠