Michael Hu
# Developer Guide
This guide provides comprehensive instructions for extending the Audio Translation System with new providers and contributing to the codebase.
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Adding New TTS Providers](#adding-new-tts-providers)
- [Adding New STT Providers](#adding-new-stt-providers)
- [Adding New Translation Providers](#adding-new-translation-providers)
- [Testing Guidelines](#testing-guidelines)
- [Code Style and Standards](#code-style-and-standards)
- [Debugging and Troubleshooting](#debugging-and-troubleshooting)
- [Performance Considerations](#performance-considerations)
- [Contributing Guidelines](#contributing-guidelines)
## Architecture Overview
The system follows Domain-Driven Design (DDD) principles with clear separation of concerns:
```
src/
├── domain/                 # Core business logic
│   ├── interfaces/         # Service contracts (ports)
│   ├── models/             # Domain entities and value objects
│   ├── services/           # Domain services
│   └── exceptions.py       # Domain-specific exceptions
├── application/            # Use case orchestration
│   ├── services/           # Application services
│   ├── dtos/               # Data transfer objects
│   └── error_handling/     # Application error handling
├── infrastructure/         # External service implementations
│   ├── tts/                # TTS provider implementations
│   ├── stt/                # STT provider implementations
│   ├── translation/        # Translation service implementations
│   ├── base/               # Provider base classes
│   └── config/             # Configuration and DI container
└── presentation/           # UI layer (app.py)
```
### Key Design Patterns
1. **Provider Pattern**: Pluggable implementations for different services
2. **Factory Pattern**: Provider creation with fallback logic
3. **Dependency Injection**: Loose coupling between components
4. **Repository Pattern**: Data access abstraction
5. **Strategy Pattern**: Runtime algorithm selection
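To make the Provider and Factory patterns concrete, here is a minimal sketch of a registry that tries providers in order of preference and falls back to the next one when a provider fails to initialize or reports itself unavailable. It is a simplified illustration under assumed names (`ProviderFactory`, `register`, `create_with_fallback`), not the project's actual `TTSProviderFactory` API.

```python
from typing import Callable, Dict, List, Protocol


class Provider(Protocol):
    """Minimal provider contract (a stand-in for the domain interfaces)."""

    def is_available(self) -> bool: ...


class ProviderFactory:
    """Hypothetical factory mapping provider names to constructors."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], Provider]] = {}

    def register(self, name: str, constructor: Callable[[], Provider]) -> None:
        self._providers[name] = constructor

    def create_with_fallback(self, preferred: List[str]) -> Provider:
        """Return the first registered provider in `preferred` that is available."""
        errors = []
        for name in preferred:
            constructor = self._providers.get(name)
            if constructor is None:
                errors.append(f"{name}: not registered")
                continue
            try:
                provider = constructor()
            except Exception as exc:  # initialization failed, try the next one
                errors.append(f"{name}: {exc}")
                continue
            if provider.is_available():
                return provider
            errors.append(f"{name}: not available")
        raise RuntimeError("No provider available: " + "; ".join(errors))


class DummyProvider:
    def is_available(self) -> bool:
        return True


class BrokenProvider:
    def __init__(self) -> None:
        raise ImportError("missing optional dependency")

    def is_available(self) -> bool:
        return False


factory = ProviderFactory()
factory.register("broken", BrokenProvider)
factory.register("dummy", DummyProvider)
provider = factory.create_with_fallback(["broken", "missing", "dummy"])
print(type(provider).__name__)  # DummyProvider
```

Collecting the reason for each skipped provider makes the final error message useful when every option fails.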
## Adding New TTS Providers
### Step 1: Implement the Provider Class
Create a new provider class that inherits from `TTSProviderBase`:
```python
# src/infrastructure/tts/my_tts_provider.py
import logging
from typing import Iterator, List, Optional

from ..base.tts_provider_base import TTSProviderBase
from ...domain.models.speech_synthesis_request import SpeechSynthesisRequest
from ...domain.exceptions import SpeechSynthesisException

logger = logging.getLogger(__name__)


class MyTTSProvider(TTSProviderBase):
    """Custom TTS provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the TTS provider.

        Args:
            api_key: Optional API key for cloud-based services
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_tts",
            supported_languages=["en", "zh", "es", "fr"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your TTS engine/model here
            # Example: self.engine = MyTTSEngine(api_key=self.api_key)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechSynthesisException(f"Provider initialization failed: {e}") from e

    def is_available(self) -> bool:
        """Check if the provider is available and ready to use."""
        try:
            # Check if dependencies are installed
            # Check if models are loaded
            # Check if API is accessible (for cloud services)
            return True  # Replace with actual availability check
        except Exception:
            return False

    def get_available_voices(self) -> List[str]:
        """Get list of available voices for this provider."""
        # Return actual voice IDs supported by your provider
        return ["voice1", "voice2", "voice3"]

    def _generate_audio(self, request: SpeechSynthesisRequest) -> tuple[bytes, int]:
        """Generate audio data from synthesis request.

        Args:
            request: The speech synthesis request

        Returns:
            tuple: (audio_data_bytes, sample_rate)
        """
        try:
            text = request.text_content.text
            voice_id = request.voice_settings.voice_id
            speed = request.voice_settings.speed

            # Implement your TTS synthesis logic here
            # Example:
            # audio_data = self.engine.synthesize(
            #     text=text,
            #     voice=voice_id,
            #     speed=speed
            # )

            # Return audio data and sample rate
            audio_data = b"dummy_audio_data"  # Replace with actual synthesis
            sample_rate = 22050  # Replace with actual sample rate
            return audio_data, sample_rate
        except Exception as e:
            self._handle_provider_error(e, "audio generation")

    def _generate_audio_stream(self, request: SpeechSynthesisRequest) -> Iterator[tuple[bytes, int, bool]]:
        """Generate audio data stream from synthesis request.

        Args:
            request: The speech synthesis request

        Yields:
            tuple: (audio_data_bytes, sample_rate, is_final)
        """
        try:
            # Implement streaming synthesis if supported
            # For non-streaming providers, yield the complete audio as a single chunk
            audio_data, sample_rate = self._generate_audio(request)
            yield audio_data, sample_rate, True
        except Exception as e:
            self._handle_provider_error(e, "streaming audio generation")
```
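`_generate_audio` returns raw bytes plus a sample rate. If your engine produces signed 16-bit PCM samples and the base class expects WAV-encoded bytes (an assumption, suggested by the Step 4 test that checks `result.format == "wav"`), the standard-library `wave` module can package them. `pcm16_to_wav_bytes` is a hypothetical helper name.

```python
import io
import math
import struct
import wave
from typing import Sequence


def pcm16_to_wav_bytes(samples: Sequence[int], sample_rate: int, channels: int = 1) -> bytes:
    """Encode signed 16-bit PCM samples as an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        # '<h' = little-endian signed short, the PCM layout WAV uses
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


# Example: 0.1 s of a 440 Hz sine tone at 22050 Hz
sample_rate = 22050
tone = [
    int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
    for n in range(sample_rate // 10)
]
audio_data = pcm16_to_wav_bytes(tone, sample_rate)
print(audio_data[:4])  # b'RIFF'
```

Inside `_generate_audio`, the return statement would then be `return pcm16_to_wav_bytes(samples, sample_rate), sample_rate`.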
### Step 2: Register the Provider
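Register the provider in the factory's `_register_default_providers` method. Importing inside `try`/`except ImportError` lets the factory start even when the provider's optional dependencies are not installed: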
```python
# src/infrastructure/tts/provider_factory.py
def _register_default_providers(self):
    """Register all available TTS providers."""
    # ... existing providers ...

    # Try to register your custom provider
    try:
        from .my_tts_provider import MyTTSProvider
        self._providers['my_tts'] = MyTTSProvider
        logger.info("Registered MyTTS provider")
    except ImportError as e:
        logger.info(f"MyTTS provider not available: {e}")
```
### Step 3: Add Configuration Support
Add the provider name to the default `TTS_PROVIDERS` list and expose its settings through environment variables:
```python
# src/infrastructure/config/app_config.py
import os


class AppConfig:
    # ... existing configuration ...

    # TTS Provider Configuration
    TTS_PROVIDERS = os.getenv('TTS_PROVIDERS', 'kokoro,dia,cosyvoice2,my_tts,dummy').split(',')

    # Provider-specific settings
    MY_TTS_API_KEY = os.getenv('MY_TTS_API_KEY')
    MY_TTS_MODEL = os.getenv('MY_TTS_MODEL', 'default')
```
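`os.getenv(...).split(',')` keeps surrounding whitespace and empty entries, so a value such as `"kokoro, my_tts,"` yields `' my_tts'` and `''`. A small normalizing helper could sit in front of the factory; `parse_provider_list` and `DEFAULT_TTS_PROVIDERS` are hypothetical names used only for this sketch.

```python
import os
from typing import List

DEFAULT_TTS_PROVIDERS = "kokoro,dia,cosyvoice2,my_tts,dummy"


def parse_provider_list(raw: str) -> List[str]:
    """Split a comma-separated provider list, dropping blanks and duplicates."""
    providers: List[str] = []
    for name in raw.split(","):
        name = name.strip()
        if name and name not in providers:  # keep the first occurrence, preserve order
            providers.append(name)
    return providers


print(parse_provider_list(" kokoro, my_tts,,dummy ,my_tts"))
# ['kokoro', 'my_tts', 'dummy']

tts_providers = parse_provider_list(os.getenv("TTS_PROVIDERS", DEFAULT_TTS_PROVIDERS))
```

Preserving order matters here because the list doubles as the fallback priority.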
### Step 4: Add Tests
Create comprehensive tests for your provider:
```python
# tests/unit/infrastructure/tts/test_my_tts_provider.py
import pytest
from unittest.mock import Mock, patch

from src.infrastructure.tts.my_tts_provider import MyTTSProvider
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings
from src.domain.exceptions import SpeechSynthesisException


class TestMyTTSProvider:
    """Test suite for MyTTS provider."""

    @pytest.fixture
    def provider(self):
        """Create a test provider instance."""
        return MyTTSProvider(api_key="test_key")

    @pytest.fixture
    def synthesis_request(self):
        """Create a test synthesis request."""
        text_content = TextContent(text="Hello world", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        return SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

    def test_provider_initialization(self, provider):
        """Test provider initializes correctly."""
        assert provider.provider_name == "my_tts"
        assert "en" in provider.supported_languages
        assert provider.is_available()

    def test_get_available_voices(self, provider):
        """Test voice listing."""
        voices = provider.get_available_voices()
        assert isinstance(voices, list)
        assert len(voices) > 0
        assert "voice1" in voices

    def test_synthesize_success(self, provider, synthesis_request):
        """Test successful synthesis."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.return_value = (b"audio_data", 22050)
            result = provider.synthesize(synthesis_request)
            assert result.data == b"audio_data"
            assert result.format == "wav"
            assert result.sample_rate == 22050
            mock_generate.assert_called_once_with(synthesis_request)

    def test_synthesize_failure(self, provider, synthesis_request):
        """Test synthesis failure handling."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.side_effect = Exception("Synthesis failed")
            with pytest.raises(SpeechSynthesisException):
                provider.synthesize(synthesis_request)

    def test_synthesize_stream(self, provider, synthesis_request):
        """Test streaming synthesis."""
        chunks = list(provider.synthesize_stream(synthesis_request))
        assert len(chunks) > 0
        assert chunks[-1].is_final  # Last chunk should be marked as final
        # Verify chunk structure
        for chunk in chunks:
            assert hasattr(chunk, 'data')
            assert hasattr(chunk, 'sample_rate')
            assert hasattr(chunk, 'is_final')
```
### Step 5: Add Integration Tests
```python
# tests/integration/test_my_tts_integration.py
import pytest

from src.infrastructure.config.container_setup import initialize_global_container
from src.infrastructure.tts.provider_factory import TTSProviderFactory
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings


@pytest.mark.integration
class TestMyTTSIntegration:
    """Integration tests for MyTTS provider."""

    def test_provider_factory_integration(self):
        """Test provider works with factory."""
        factory = TTSProviderFactory()
        # Report a skip instead of passing silently when the provider is missing
        if 'my_tts' not in factory.get_available_providers():
            pytest.skip("my_tts provider is not available")
        provider = factory.create_provider('my_tts')
        assert provider.is_available()
        assert len(provider.get_available_voices()) > 0

    def test_end_to_end_synthesis(self):
        """Test complete synthesis workflow."""
        container = initialize_global_container()
        factory = container.resolve(TTSProviderFactory)
        if 'my_tts' not in factory.get_available_providers():
            pytest.skip("my_tts provider is not available")
        provider = factory.create_provider('my_tts')

        # Create synthesis request
        text_content = TextContent(text="Integration test", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        request = SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

        # Synthesize audio
        result = provider.synthesize(request)
        assert result.data is not None
        assert result.duration > 0
        assert result.sample_rate > 0
```
## Adding New STT Providers
### Step 1: Implement the Provider Class
```python
# src/infrastructure/stt/my_stt_provider.py
import logging
from typing import List, Optional

from ..base.stt_provider_base import STTProviderBase
from ...domain.models.audio_content import AudioContent
from ...domain.models.text_content import TextContent
from ...domain.exceptions import SpeechRecognitionException

logger = logging.getLogger(__name__)


class MySTTProvider(STTProviderBase):
    """Custom STT provider implementation."""

    def __init__(self, model_path: Optional[str] = None, **kwargs):
        """Initialize the STT provider.

        Args:
            model_path: Path to the STT model
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_stt",
            supported_languages=["en", "zh", "es", "fr"],
            supported_models=["my_stt_small", "my_stt_large"]
        )
        self.model_path = model_path
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your STT engine/model here
            # Example: self.model = MySTTModel.load(self.model_path)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechRecognitionException(f"Provider initialization failed: {e}") from e

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, model availability, etc.
            return True  # Replace with actual check
        except Exception:
            return False

    def get_supported_models(self) -> List[str]:
        """Get list of supported models."""
        return self.supported_models

    def _transcribe_audio(self, audio: AudioContent, model: str) -> tuple[str, float, dict]:
        """Transcribe audio using the specified model.

        Args:
            audio: Audio content to transcribe
            model: Model identifier to use

        Returns:
            tuple: (transcribed_text, confidence_score, metadata)
        """
        try:
            # Implement your STT logic here
            # Example:
            # result = self.model.transcribe(
            #     audio_data=audio.data,
            #     sample_rate=audio.sample_rate,
            #     model=model
            # )

            # Return transcription results
            text = "Transcribed text"  # Replace with actual transcription
            confidence = 0.95  # Replace with actual confidence
            metadata = {
                "model_used": model,
                "processing_time": 1.5,
                "language_detected": "en"
            }
            return text, confidence, metadata
        except Exception as e:
            self._handle_provider_error(e, "transcription")
```
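The template hardcodes `processing_time` in its metadata. One way to report the real value is to time the engine call with `time.perf_counter()`, as in this sketch; `transcribe_with_timing` is a hypothetical helper and `fake_transcribe` stands in for your engine.

```python
import time
from typing import Callable, Dict, Tuple


def transcribe_with_timing(
    transcribe: Callable[[bytes], Tuple[str, float]],
    audio_data: bytes,
    model: str,
) -> Tuple[str, float, Dict[str, object]]:
    """Run a transcription callable and record its wall-clock duration in metadata."""
    start = time.perf_counter()
    text, confidence = transcribe(audio_data)
    elapsed = time.perf_counter() - start
    metadata: Dict[str, object] = {
        "model_used": model,
        "processing_time": round(elapsed, 3),  # seconds
    }
    return text, confidence, metadata


def fake_transcribe(audio_data: bytes) -> Tuple[str, float]:
    time.sleep(0.01)  # simulate model latency
    return "hello world", 0.9


text, confidence, metadata = transcribe_with_timing(fake_transcribe, b"\x00\x01", "my_stt_small")
print(text, confidence, metadata["model_used"])  # hello world 0.9 my_stt_small
```

`perf_counter` is monotonic, so the measurement is unaffected by system clock adjustments during long transcriptions.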
### Step 2: Register and Test
Follow the same steps as for TTS providers: register the provider, add its configuration settings, and write unit and integration tests (Steps 2 to 5 above).
## Adding New Translation Providers
### Step 1: Implement the Provider Class
```python
# src/infrastructure/translation/my_translation_provider.py
import logging
from typing import List, Dict, Optional

from ..base.translation_provider_base import TranslationProviderBase
from ...domain.models.translation_request import TranslationRequest
from ...domain.models.text_content import TextContent
from ...domain.exceptions import TranslationFailedException

logger = logging.getLogger(__name__)


class MyTranslationProvider(TranslationProviderBase):
    """Custom translation provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the translation provider."""
        super().__init__(
            provider_name="my_translation",
            supported_languages=["en", "zh", "es", "fr", "de", "ja"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your translation engine/model here
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise TranslationFailedException(f"Provider initialization failed: {e}") from e

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, API connectivity, etc.
            return True  # Replace with actual check
        except Exception:
            return False

    def get_supported_language_pairs(self) -> List[tuple[str, str]]:
        """Get supported language pairs."""
        # Return list of (source_lang, target_lang) tuples
        pairs = []
        for source in self.supported_languages:
            for target in self.supported_languages:
                if source != target:
                    pairs.append((source, target))
        return pairs

    def _translate_text(self, request: TranslationRequest) -> tuple[str, float, dict]:
        """Translate text using the provider.

        Args:
            request: Translation request

        Returns:
            tuple: (translated_text, confidence_score, metadata)
        """
        try:
            source_text = request.text_content.text
            source_lang = request.source_language or request.text_content.language
            target_lang = request.target_language

            # Implement your translation logic here
            # Example:
            # result = self.translator.translate(
            #     text=source_text,
            #     source_lang=source_lang,
            #     target_lang=target_lang
            # )

            # Return translation results
            translated_text = f"Translated: {source_text}"  # Replace with actual translation
            confidence = 0.92  # Replace with actual confidence
            metadata = {
                "source_language_detected": source_lang,
                "target_language": target_lang,
                "processing_time": 0.5,
                "model_used": "my_translation_model"
            }
            return translated_text, confidence, metadata
        except Exception as e:
            self._handle_provider_error(e, "translation")
```
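The nested loops in `get_supported_language_pairs` enumerate every ordered pair of distinct languages, which is what `itertools.permutations(languages, 2)` produces in the same order. An optional simplification, assuming the language list contains no duplicates (`permutations` works by position, so a repeated code would yield pairs like `("en", "en")`):

```python
from itertools import permutations
from typing import List, Tuple


def supported_language_pairs(languages: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


pairs = supported_language_pairs(["en", "zh", "es"])
print(pairs)
# [('en', 'zh'), ('en', 'es'), ('zh', 'en'), ('zh', 'es'), ('es', 'en'), ('es', 'zh')]
print(len(supported_language_pairs(["en", "zh", "es", "fr", "de", "ja"])))  # 30
```

For n languages this gives n × (n − 1) pairs, so the six-language template supports 30 directions.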
## Testing Guidelines
### Unit Testing
- Test each provider in isolation using mocks
- Cover success and failure scenarios
- Test edge cases (empty input, invalid parameters)
- Verify error handling and exception propagation
### Integration Testing
- Test provider integration with factories
- Test complete pipeline workflows
- Test fallback mechanisms
- Test with real external services (when available)
### Performance Testing
- Measure processing times for different input sizes
- Test memory usage and resource cleanup
- Test concurrent processing capabilities
- Benchmark against existing providers
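A minimal way to measure processing time and peak memory across input sizes uses only the standard library (`time.perf_counter` and `tracemalloc`). `measure` is a hypothetical helper and `process` is a placeholder for the provider call you want to benchmark.

```python
import time
import tracemalloc
from typing import Callable, Dict


def measure(func: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time (s) and peak traced memory (bytes) over repeats."""
    best_time = float("inf")
    peak_memory = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_time = min(best_time, elapsed)
        peak_memory = max(peak_memory, peak)
    return {"seconds": best_time, "peak_bytes": float(peak_memory)}


def process(size: int) -> bytes:
    # Placeholder workload: allocate a buffer proportional to the input size
    return bytes(size)


for size in (1_000, 100_000, 1_000_000):
    stats = measure(lambda: process(size))
    print(f"size={size:>9,} time={stats['seconds']:.6f}s peak={stats['peak_bytes'] / 1024:.1f} KiB")
```

`tracemalloc` slows down allocations, so for precise timings you may prefer to time and trace in separate runs; taking the best of several repeats reduces noise from other processes.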
### Test Structure
```
tests/
├── unit/
│   ├── domain/
│   ├── application/
│   └── infrastructure/
│       ├── tts/
│       ├── stt/
│       └── translation/
├── integration/
│   ├── test_complete_pipeline.py
│   ├── test_provider_fallback.py
│   └── test_error_recovery.py
└── performance/
    ├── test_processing_speed.py
    ├── test_memory_usage.py
    └── test_concurrent_processing.py
```
## Code Style and Standards
### Python Style Guide
- Follow PEP 8 for code formatting
- Use type hints for all public methods
- Write comprehensive docstrings (Google style)
- Use meaningful variable and function names
- Keep functions focused and small (< 50 lines)
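For reference, here is a small function written to these conventions: type hints, a Google-style docstring with `Args`, `Returns`, and `Raises` sections, and a usage example. `normalize_language_code` is a hypothetical illustration, not part of the codebase.

```python
from typing import Iterable


def normalize_language_code(code: str, supported: Iterable[str]) -> str:
    """Normalize a language code and check that it is supported.

    Args:
        code: A language code such as "EN" or "zh-CN".
        supported: Primary language codes the provider accepts, e.g. ["en", "zh"].

    Returns:
        The lower-case primary language subtag, e.g. "zh" for "zh-CN".

    Raises:
        ValueError: If the normalized code is not in ``supported``.

    Example:
        >>> normalize_language_code("zh-CN", ["en", "zh"])
        'zh'
    """
    primary = code.strip().lower().split("-")[0]
    if primary not in set(supported):
        raise ValueError(f"Unsupported language: {code!r}")
    return primary
```

Writing the example in doctest format lets `python -m doctest` check that the documentation stays accurate.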
### Documentation Standards
- Document all public interfaces
- Include usage examples in docstrings
- Explain complex algorithms and business logic
- Keep documentation up-to-date with code changes
### Error Handling
- Use domain-specific exceptions
- Provide detailed error messages
- Log errors with appropriate levels
- Implement graceful degradation where possible
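A sketch combining these rules: validate input with a specific message, catch low-level errors, log them with the traceback, and re-raise a domain exception using `raise ... from` so the original cause is preserved. The `SpeechSynthesisException` below is a local stand-in for the one in `src/domain/exceptions.py`; `synthesize_with_engine` and `FlakyEngine` are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class SpeechSynthesisException(Exception):
    """Local stand-in for the domain exception in src/domain/exceptions.py."""


def synthesize_with_engine(engine, text: str) -> bytes:
    """Call a third-party engine and translate its failures into a domain error."""
    if not text.strip():
        # Validate input up front and say exactly what was wrong
        raise SpeechSynthesisException("Cannot synthesize empty text")
    try:
        return engine.synthesize(text)
    except (OSError, RuntimeError) as exc:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Engine failed for text of length %d", len(text))
        raise SpeechSynthesisException(f"Synthesis failed: {exc}") from exc


class FlakyEngine:
    def synthesize(self, text: str) -> bytes:
        raise RuntimeError("GPU out of memory")


try:
    synthesize_with_engine(FlakyEngine(), "Hello")
except SpeechSynthesisException as err:
    print(err, "| caused by:", repr(err.__cause__))
    # Synthesis failed: GPU out of memory | caused by: RuntimeError('GPU out of memory')
```

Catching specific exception types rather than bare `Exception` keeps programming errors (such as `AttributeError`) visible instead of disguising them as synthesis failures.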
### Logging
```python
import logging

logger = logging.getLogger(__name__)

# Use appropriate log levels
logger.debug("Detailed debugging information")
logger.info("General information about program execution")
logger.warning("Something unexpected happened")
logger.error("A serious error occurred")
logger.critical("A very serious error occurred")
```
## Debugging and Troubleshooting
### Common Issues
1. **Provider Not Available**
   - Check that dependencies are installed
   - Verify configuration settings
   - Check logs for initialization errors
2. **Poor Quality Output**
   - Verify input audio quality
   - Check model parameters
   - Review provider-specific settings
3. **Performance Issues**
   - Profile code execution
   - Check memory usage
   - Optimize the audio processing pipeline
### Debugging Tools
- Use Python debugger (pdb) for step-through debugging
- Enable detailed logging for troubleshooting
- Use profiling tools (cProfile, memory_profiler)
- Monitor system resources during processing
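For the profiling bullet, `cProfile` and `pstats` from the standard library are enough to locate hot spots in a pipeline stage. `slow_stage` is a placeholder for the code you want to profile.

```python
import cProfile
import io
import pstats


def slow_stage(n: int) -> int:
    # Placeholder workload standing in for an audio-processing step
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_stage(200_000)
profiler.disable()

# Sort by cumulative time and print the 5 most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

To profile a whole run without code changes, `python -m cProfile -s cumulative app.py` prints the same kind of report on exit.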
### Logging Configuration
```python
# Enable debug logging for development
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler()
    ]
)
```
## Performance Considerations
### Optimization Strategies
1. **Audio Processing**
   - Use appropriate sample rates
   - Implement streaming where possible
   - Cache processed results
   - Optimize memory usage
2. **Model Loading**
   - Load models once and reuse them
   - Use lazy loading for optional providers
   - Implement model caching strategies
3. **Concurrent Processing**
   - Use async/await for I/O operations
   - Implement thread-safe providers
   - Consider multiprocessing for CPU-intensive tasks
### Memory Management
- Clean up temporary files
- Release model resources when not needed
- Monitor memory usage in long-running processes
- Implement resource pooling for expensive operations
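For temporary-file cleanup, `tempfile.TemporaryDirectory` used as a context manager deletes intermediate files even when processing raises. In this sketch, `process_with_temp_file` is a hypothetical helper that writes audio to disk (for tools that need a real file path) and returns only the file size.

```python
import os
import tempfile


def process_with_temp_file(audio_data: bytes) -> int:
    """Write audio to a temporary file, process it, and always clean up."""
    with tempfile.TemporaryDirectory(prefix="audio_") as tmp_dir:
        path = os.path.join(tmp_dir, "input.wav")
        with open(path, "wb") as f:
            f.write(audio_data)
        # ... hand `path` to a tool that needs a file on disk ...
        size = os.path.getsize(path)
    # The directory and its contents are removed here, even after an exception
    return size


print(process_with_temp_file(b"\x00" * 1024))  # 1024
```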
### Monitoring and Metrics
- Track processing times
- Monitor error rates
- Measure resource utilization
- Implement health checks
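A lightweight way to track processing times and error rates per operation is a decorator that records every call. `MetricsRecorder` and `track` are hypothetical names; in production you would more likely export these numbers to a monitoring system than keep them in memory.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class MetricsRecorder:
    """In-memory recorder of call durations and failures per operation."""

    def __init__(self) -> None:
        self.durations: Dict[str, List[float]] = defaultdict(list)
        self.errors: Dict[str, int] = defaultdict(int)

    def track(self, operation: str) -> Callable[[F], F]:
        def decorator(func: F) -> F:
            @functools.wraps(func)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    self.errors[operation] += 1
                    raise
                finally:
                    # Record the duration of successful and failed calls alike
                    self.durations[operation].append(time.perf_counter() - start)
            return wrapper  # type: ignore[return-value]
        return decorator

    def error_rate(self, operation: str) -> float:
        calls = len(self.durations[operation])
        return self.errors[operation] / calls if calls else 0.0


metrics = MetricsRecorder()


@metrics.track("translation")
def translate(text: str) -> str:
    if not text:
        raise ValueError("empty text")
    return text.upper()


translate("hello")
try:
    translate("")
except ValueError:
    pass
print(metrics.error_rate("translation"))  # 0.5
```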
## Contributing Guidelines
### Development Workflow
1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Run the full test suite
5. Submit a pull request
### Code Review Process
- All changes require code review
- Tests must pass before merging
- Documentation must be updated
- Performance impact should be assessed
### Release Process
- Follow semantic versioning
- Update changelog
- Tag releases appropriately
- Deploy to staging before production
---
For questions or support, please refer to the project documentation or open an issue in the repository.