	---
	title: Voice Studio & Audio Translation
	emoji: 🎤
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.43.1
	app_file: app.py
	pinned: false
	---

	# 🎤 Voice Studio & Audio Translation

	A comprehensive AI-powered application that combines text-to-speech synthesis and audio translation capabilities in one unified interface.

	## 🌟 Features

	### 🎤 Voice Studio
	- 26 High-Quality Voices: Standard neural voices across 13 countries
	- Multi-Language Support: Vietnamese, English (US/UK), German, French, Spanish, Italian, Japanese, Korean, Chinese, Russian, Portuguese, Arabic
	- Speed Control: Adjustable speech rate from 0.5x to 2.0x
	- Instant Download: Generate and download MP3 files
	- Pure Neural Voices: Only official Edge TTS neural voices, no artificial variations
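
The speed control above could map onto the `rate` argument of the `edge-tts` Python package, which expects a signed percentage string such as `+25%`. The following is a minimal sketch under that assumption; `speed_to_rate` and `synthesize` are hypothetical helper names and are not taken from `app.py`.

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (0.5x to 2.0x) into a signed percentage string."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


async def synthesize(text: str, voice: str, speed: float, out_path: str) -> None:
    """Synthesize text to an MP3 file. Requires network access."""
    # Imported lazily so the pure helper above works without the package.
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(out_path)
```

A call might look like `asyncio.run(synthesize("Xin chào!", "vi-VN-HoaiMyNeural", 1.25, "hello.mp3"))`.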

	### 🎙️ Audio Translation
	- Audio Transcription: Powered by Google Gemini 2.0 Flash
	- Language Detection: Automatic source language identification
	- Cultural Translation: Context-aware translation preserving cultural nuances
	- Voice Synthesis: Integrated with Voice Studio's 26 voices
	- Multiple Formats: Download as TXT or Word documents
	- Side-by-Side Comparison: Compare original and translated content
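
The transcription, language detection, and translation steps listed above could be handled in a single Gemini request. This sketch assumes the `google-genai` SDK and the `gemini-2.0-flash` model name; the actual SDK, prompt, and function names used in `app.py` are not shown in this README, so `build_prompt` and `translate_audio` are hypothetical.

```python
import os


def build_prompt(target_language: str) -> str:
    """Ask for the transcript, detected language, and translation in one request."""
    return (
        "Transcribe this audio, state the language it is spoken in, and "
        f"translate the transcript into {target_language}. Preserve idioms "
        "and cultural nuance rather than translating word for word."
    )


def translate_audio(audio_path: str, target_language: str) -> str:
    """Send an audio file to Gemini and return the raw text response."""
    # Imported lazily; requires network access and a valid API key.
    from google import genai  # third-party: pip install google-genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    uploaded = client.files.upload(file=audio_path)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[build_prompt(target_language), uploaded],
    )
    return response.text
```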

	## 🚀 Supported Languages

	Voice Studio (26 voices):
	- 🇻🇳 Vietnamese: HoaiMy (Female), NamMinh (Male)
	- 🇺🇸 American English: Aria (Female), Guy (Male)
	- 🇬🇧 British English: Sonia (Female), Ryan (Male)
	- 🇩🇪 German: Katja (Female), Conrad (Male)
	- 🇫🇷 French: Denise (Female), Henri (Male)
	- 🇪🇸 Spanish: Elvira (Female), Alvaro (Male)
	- 🇮🇹 Italian: Elsa (Female), Diego (Male)
	- 🇯🇵 Japanese: Nanami (Female), Keita (Male)
	- 🇰🇷 Korean: SunHi (Female), BongJin (Male)
	- 🇨🇳 Chinese: Xiaoxiao (Female), Yunxi (Male)
	- 🇷🇺 Russian: Svetlana (Female), Dmitry (Male)
	- 🇵🇹 Portuguese: Francisca (Female), Antonio (Male)
	- 🇸🇦 Arabic: Zariyah (Female), Hamed (Male)
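
Edge TTS addresses voices by short names of the form `<locale>-<Name>Neural`. The lookup below is a sketch covering a few of the voices listed above; the locale codes are assumptions inferred from the flags, and `VOICES` and `voice_id` are hypothetical names.

```python
# Locale codes are assumptions inferred from the flags in the voice list.
VOICES = {
    "Vietnamese": ("vi-VN", {"Female": "HoaiMy", "Male": "NamMinh"}),
    "American English": ("en-US", {"Female": "Aria", "Male": "Guy"}),
    "British English": ("en-GB", {"Female": "Sonia", "Male": "Ryan"}),
    "German": ("de-DE", {"Female": "Katja", "Male": "Conrad"}),
    "Japanese": ("ja-JP", {"Female": "Nanami", "Male": "Keita"}),
}


def voice_id(language: str, gender: str) -> str:
    """Build the Edge TTS short name for a language and gender."""
    locale, names = VOICES[language]
    return f"{locale}-{names[gender]}Neural"
```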

	Audio Translation:
	- All Voice Studio languages plus additional Google TTS supported languages

	## 🔧 Technology Stack

	- Frontend: Gradio 4.0+ with responsive mobile design (this Space pins 5.43.1 through `sdk_version`)
	- TTS Engine: Microsoft Edge TTS Neural Voices
	- AI Translation: Google Gemini 2.0 Flash
	- Audio Processing: Google Text-to-Speech, advanced audio libraries
	- File Handling: SoundFile, Librosa, python-docx
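
As an illustration of the file-handling layer, the TXT and Word exports could be produced as follows. This is a sketch only: `export_txt` and `export_docx` are hypothetical helpers, and the two-column table layout is an assumption about how original and translated text might be laid out side by side.

```python
from pathlib import Path


def export_txt(path: str, original: str, translated: str) -> None:
    """Write the original and translated text to a UTF-8 text file."""
    body = f"Original\n{original}\n\nTranslation\n{translated}\n"
    Path(path).write_text(body, encoding="utf-8")


def export_docx(path: str, original: str, translated: str) -> None:
    """Write both texts side by side in a two-column Word table."""
    # Imported lazily so the TXT export works without python-docx installed.
    from docx import Document  # third-party: pip install python-docx

    doc = Document()
    table = doc.add_table(rows=2, cols=2)
    table.rows[0].cells[0].text = "Original"
    table.rows[0].cells[1].text = "Translation"
    table.rows[1].cells[0].text = original
    table.rows[1].cells[1].text = translated
    doc.save(path)
```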

	## ⚙️ Setup

	### Prerequisites
	- Python 3.8+ (Python 3.10+ if you install Gradio 5.x, the version this Space pins)
	- Google Gemini API Key

	### Environment Variables
	```bash
	export GEMINI_API_KEY="your_gemini_api_key_here"
	```
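
On the Python side, the key can be read at startup and rejected early if it is missing. A minimal sketch; `load_gemini_key` is a hypothetical helper, not a function from `app.py`.

```python
import os


def load_gemini_key() -> str:
    """Return the Gemini API key from the environment, or fail with a clear message."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the app."
        )
    return key
```

Failing at startup gives a clearer message than an authentication error on the first translation request.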

	### Installation
	```bash
	pip install -r requirements.txt
	```

	### Run the Application
	```bash
	python app.py
	```

	The application will be available at `http://localhost:7860`.

	## 📱 Mobile Optimized

	The interface is fully responsive and optimized for mobile devices with:
	- Touch-friendly buttons
	- Vertical stacking on small screens
	- Optimized font sizes and spacing
	- Mobile-first design approach

	## 🔒 Privacy & Security

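	- No Persistent Storage: Inputs and results are processed per request and are not retained afterwards
	- Temporary Files: Audio and text files are automatically cleaned up
	- Secure API: The Gemini API key is read from an environment variable rather than hard-coded
	- Remote Services: Edge TTS synthesis and Gemini transcription run on Microsoft's and Google's online services, so submitted text and audio leave the machine for processing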
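
One way to implement the temporary-file cleanup described above is a context manager that removes the file once processing finishes, even if an error occurs. This is a sketch using only the standard library; `temp_audio_path` is a hypothetical name, and the app's actual cleanup mechanism is not shown in this README.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_path(suffix: str = ".mp3"):
    """Yield a path to a temporary file and delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; writers reopen it themselves
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A file that must remain downloadable from the UI would need to outlive the `with` block, so cleanup for those files has to happen later.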

	## 🎯 Use Cases

	- Language Learning: Practice pronunciation in multiple languages
	- Content Creation: Generate multilingual audio content
	- Accessibility: Convert text to speech for visually impaired users
	- Translation Services: Translate audio content and voice the result with one of the 26 neural voices
	- Podcast Localization: Create multilingual versions of audio content

	## 🛠️ Advanced Features

	- Automatic Language Detection: Intelligently detects source language
	- Cultural Context Preservation: Maintains meaning across cultural boundaries
	- High-Quality Audio: WAV format output for best quality
	- Batch Processing Ready: Designed for scalability
	- Error Handling: Comprehensive error management and user feedback

	## 📦 Deployment

	### Hugging Face Spaces
	This application is ready for deployment on Hugging Face Spaces:

	1. Upload all files to your Hugging Face Space
	2. Set `GEMINI_API_KEY` in Space secrets
	3. The app will automatically start on port 7860

	### Docker Support
	```dockerfile
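	FROM python:3.9-slim

	WORKDIR /app

	# Install dependencies first so this layer is cached between code changes
	COPY requirements.txt .
	RUN pip install --no-cache-dir -r requirements.txt

	COPY app.py .

	# Bind Gradio to all interfaces so the published port is reachable from the host
	ENV GRADIO_SERVER_NAME=0.0.0.0
	EXPOSE 7860

	# Supply the key at runtime: docker run -e GEMINI_API_KEY -p 7860:7860 <image>
	CMD ["python", "app.py"]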
	```

	## 🤝 Contributing

	Contributions are welcome! Please feel free to submit a Pull Request.

	## 📄 License

	This project is licensed under the MIT License.

	## 🙏 Acknowledgments

	- Microsoft Edge TTS for high-quality neural voices
	- Google Gemini for advanced AI capabilities
	- Librosa for advanced audio processing
	- Gradio team for the excellent UI framework

	---

	Developed by Digitized Brains 🧠