# Developer Guide

This guide explains how to extend the Audio Translation System with new TTS, STT, and translation providers, and how to contribute to the codebase.

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Adding New TTS Providers](#adding-new-tts-providers)
- [Adding New STT Providers](#adding-new-stt-providers)
- [Adding New Translation Providers](#adding-new-translation-providers)
- [Testing Guidelines](#testing-guidelines)
- [Code Style and Standards](#code-style-and-standards)
- [Debugging and Troubleshooting](#debugging-and-troubleshooting)
- [Performance Considerations](#performance-considerations)
- [Contributing Guidelines](#contributing-guidelines)

## Architecture Overview

The system follows Domain-Driven Design (DDD) principles with clear separation of concerns:

```
src/
├── domain/                    # Core business logic
│   ├── interfaces/           # Service contracts (ports)
│   ├── models/              # Domain entities and value objects
│   ├── services/            # Domain services
│   └── exceptions.py        # Domain-specific exceptions
├── application/             # Use case orchestration
│   ├── services/            # Application services
│   ├── dtos/               # Data transfer objects
│   └── error_handling/     # Application error handling
├── infrastructure/         # External service implementations
│   ├── tts/               # TTS provider implementations
│   ├── stt/               # STT provider implementations
│   ├── translation/       # Translation service implementations
│   ├── base/              # Provider base classes
│   └── config/            # Configuration and DI container
└── presentation/          # UI layer (app.py)
```

### Key Design Patterns

1. **Provider Pattern**: Pluggable implementations for different services
2. **Factory Pattern**: Provider creation with fallback logic
3. **Dependency Injection**: Loose coupling between components
4. **Repository Pattern**: Data access abstraction
5. **Strategy Pattern**: Runtime algorithm selection
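
As a rough illustration of how the Provider and Factory patterns fit together, the sketch below registers provider classes by name and returns the first one that reports itself available. `SketchProviderFactory`, `create_with_fallback`, and the two stub providers are hypothetical names used only for this example; they are not the project's actual `TTSProviderFactory` API.

```python
from typing import Dict, List, Type


class StubProvider:
    """Minimal stand-in for a provider; real providers extend the base classes."""

    name = "stub"

    def is_available(self) -> bool:
        return True


class UnavailableProvider(StubProvider):
    """Simulates a provider whose dependencies or API are missing."""

    name = "unavailable"

    def is_available(self) -> bool:
        return False


class SketchProviderFactory:
    """Hypothetical factory: maps names to classes and applies a fallback order."""

    def __init__(self) -> None:
        self._providers: Dict[str, Type[StubProvider]] = {}

    def register(self, name: str, provider_cls: Type[StubProvider]) -> None:
        self._providers[name] = provider_cls

    def create_with_fallback(self, preferred: List[str]) -> StubProvider:
        """Return the first registered provider in `preferred` that is available."""
        for name in preferred:
            provider_cls = self._providers.get(name)
            if provider_cls is None:
                continue  # Not registered, for example an optional dependency is missing
            provider = provider_cls()
            if provider.is_available():
                return provider
        raise RuntimeError(f"No available provider among: {preferred}")


factory = SketchProviderFactory()
factory.register("unavailable", UnavailableProvider)
factory.register("stub", StubProvider)
chosen = factory.create_with_fallback(["missing", "unavailable", "stub"])
print(chosen.name)
```

Because callers only depend on the provider interface, the choice of implementation can change at runtime without touching application code.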

## Adding New TTS Providers

### Step 1: Implement the Provider Class

Create a new provider class that inherits from `TTSProviderBase`:

```python
# src/infrastructure/tts/my_tts_provider.py

import logging
from typing import Iterator, List, Optional
from ..base.tts_provider_base import TTSProviderBase
from ...domain.models.speech_synthesis_request import SpeechSynthesisRequest
from ...domain.exceptions import SpeechSynthesisException

logger = logging.getLogger(__name__)


class MyTTSProvider(TTSProviderBase):
    """Custom TTS provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the TTS provider.

        Args:
            api_key: Optional API key for cloud-based services
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_tts",
            supported_languages=["en", "zh", "es", "fr"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your TTS engine/model here
            # Example: self.engine = MyTTSEngine(api_key=self.api_key)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechSynthesisException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available and ready to use."""
        try:
            # Check if dependencies are installed
            # Check if models are loaded
            # Check if API is accessible (for cloud services)
            return True  # Replace with actual availability check
        except Exception:
            return False

    def get_available_voices(self) -> List[str]:
        """Get list of available voices for this provider."""
        # Return actual voice IDs supported by your provider
        return ["voice1", "voice2", "voice3"]

    def _generate_audio(self, request: SpeechSynthesisRequest) -> tuple[bytes, int]:
        """Generate audio data from synthesis request.

        Args:
            request: The speech synthesis request

        Returns:
            tuple: (audio_data_bytes, sample_rate)
        """
        try:
            text = request.text_content.text
            voice_id = request.voice_settings.voice_id
            speed = request.voice_settings.speed

            # Implement your TTS synthesis logic here
            # Example:
            # audio_data = self.engine.synthesize(
            #     text=text,
            #     voice=voice_id,
            #     speed=speed
            # )

            # Return audio data and sample rate
            audio_data = b"dummy_audio_data"  # Replace with actual synthesis
            sample_rate = 22050  # Replace with actual sample rate

            return audio_data, sample_rate

        except Exception as e:
            self._handle_provider_error(e, "audio generation")

    def _generate_audio_stream(self, request: SpeechSynthesisRequest) -> Iterator[tuple[bytes, int, bool]]:
        """Generate audio data stream from synthesis request.

        Args:
            request: The speech synthesis request

        Yields:
            tuple: (audio_data_bytes, sample_rate, is_final)
        """
        try:
            # Implement streaming synthesis if supported
            # For non-streaming providers, you can yield the complete audio as a single chunk

            audio_data, sample_rate = self._generate_audio(request)
            yield audio_data, sample_rate, True

        except Exception as e:
            self._handle_provider_error(e, "streaming audio generation")
```

### Step 2: Register the Provider

Add your provider to the factory registration:

```python
# src/infrastructure/tts/provider_factory.py

def _register_default_providers(self):
    """Register all available TTS providers."""
    # ... existing providers ...

    # Try to register your custom provider
    try:
        from .my_tts_provider import MyTTSProvider
        self._providers['my_tts'] = MyTTSProvider
        logger.info("Registered MyTTS provider")
    except ImportError as e:
        logger.info(f"MyTTS provider not available: {e}")
```

### Step 3: Add Configuration Support

Update the configuration to include your provider:

```python
# src/infrastructure/config/app_config.py

import os


class AppConfig:
    # ... existing configuration ...

    # TTS Provider Configuration
    TTS_PROVIDERS = os.getenv('TTS_PROVIDERS', 'kokoro,dia,cosyvoice2,my_tts,dummy').split(',')

    # Provider-specific settings
    MY_TTS_API_KEY = os.getenv('MY_TTS_API_KEY')
    MY_TTS_MODEL = os.getenv('MY_TTS_MODEL', 'default')
```
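
An environment variable holding a comma-separated list often ends up with stray spaces or empty entries. The following is a minimal sketch of a more forgiving parser; `parse_provider_list` is a hypothetical helper, not part of the existing `AppConfig`, and lowercasing names is an assumption about how providers are keyed.

```python
import os
from typing import List


def parse_provider_list(raw: str) -> List[str]:
    """Split a comma-separated list, dropping blanks and duplicates but keeping order."""
    seen = set()
    result = []
    for item in raw.split(","):
        name = item.strip().lower()  # Assumption: provider names are case-insensitive
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Falls back to a default list when TTS_PROVIDERS is unset
providers = parse_provider_list(os.getenv("TTS_PROVIDERS", "kokoro,my_tts,dummy"))
print(providers)
```

Preserving order matters here if the list also defines the fallback priority.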

### Step 4: Add Tests

Create comprehensive tests for your provider:

```python
# tests/unit/infrastructure/tts/test_my_tts_provider.py

import pytest
from unittest.mock import patch
from src.infrastructure.tts.my_tts_provider import MyTTSProvider
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings
from src.domain.exceptions import SpeechSynthesisException


class TestMyTTSProvider:
    """Test suite for MyTTS provider."""

    @pytest.fixture
    def provider(self):
        """Create a test provider instance."""
        return MyTTSProvider(api_key="test_key")

    @pytest.fixture
    def synthesis_request(self):
        """Create a test synthesis request."""
        text_content = TextContent(text="Hello world", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        return SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

    def test_provider_initialization(self, provider):
        """Test provider initializes correctly."""
        assert provider.provider_name == "my_tts"
        assert "en" in provider.supported_languages
        assert provider.is_available()

    def test_get_available_voices(self, provider):
        """Test voice listing."""
        voices = provider.get_available_voices()
        assert isinstance(voices, list)
        assert len(voices) > 0
        assert "voice1" in voices

    def test_synthesize_success(self, provider, synthesis_request):
        """Test successful synthesis."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.return_value = (b"audio_data", 22050)

            result = provider.synthesize(synthesis_request)

            assert result.data == b"audio_data"
            assert result.format == "wav"
            assert result.sample_rate == 22050
            mock_generate.assert_called_once_with(synthesis_request)

    def test_synthesize_failure(self, provider, synthesis_request):
        """Test synthesis failure handling."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.side_effect = Exception("Synthesis failed")

            with pytest.raises(SpeechSynthesisException):
                provider.synthesize(synthesis_request)

    def test_synthesize_stream(self, provider, synthesis_request):
        """Test streaming synthesis."""
        chunks = list(provider.synthesize_stream(synthesis_request))

        assert len(chunks) > 0
        assert chunks[-1].is_final  # Last chunk should be marked as final

        # Verify chunk structure
        for chunk in chunks:
            assert hasattr(chunk, 'data')
            assert hasattr(chunk, 'sample_rate')
            assert hasattr(chunk, 'is_final')
```

### Step 5: Add Integration Tests

```python
# tests/integration/test_my_tts_integration.py

import pytest
from src.infrastructure.config.container_setup import initialize_global_container
from src.infrastructure.tts.provider_factory import TTSProviderFactory
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings


@pytest.mark.integration
class TestMyTTSIntegration:
    """Integration tests for MyTTS provider."""

    def test_provider_factory_integration(self):
        """Test provider works with factory."""
        factory = TTSProviderFactory()

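        # Skip explicitly so a missing provider is reported instead of silently passing
        if 'my_tts' not in factory.get_available_providers():
            pytest.skip("my_tts provider is not registered")

        provider = factory.create_provider('my_tts')
        assert provider.is_available()
        assert len(provider.get_available_voices()) > 0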

    def test_end_to_end_synthesis(self):
        """Test complete synthesis workflow."""
        container = initialize_global_container()
        factory = container.resolve(TTSProviderFactory)

        # Skip explicitly so a missing provider is reported instead of silently passing
        if 'my_tts' not in factory.get_available_providers():
            pytest.skip("my_tts provider is not registered")

        provider = factory.create_provider('my_tts')

        # Create synthesis request
        text_content = TextContent(text="Integration test", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        request = SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

        # Synthesize audio
        result = provider.synthesize(request)

        assert result.data is not None
        assert result.duration > 0
        assert result.sample_rate > 0
```

## Adding New STT Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/stt/my_stt_provider.py

import logging
from typing import List, Optional
from ..base.stt_provider_base import STTProviderBase
from ...domain.models.audio_content import AudioContent
from ...domain.models.text_content import TextContent
from ...domain.exceptions import SpeechRecognitionException

logger = logging.getLogger(__name__)


class MySTTProvider(STTProviderBase):
    """Custom STT provider implementation."""

    def __init__(self, model_path: Optional[str] = None, **kwargs):
        """Initialize the STT provider.

        Args:
            model_path: Path to the STT model
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_stt",
            supported_languages=["en", "zh", "es", "fr"],
            supported_models=["my_stt_small", "my_stt_large"]
        )
        self.model_path = model_path
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your STT engine/model here
            # Example: self.model = MySTTModel.load(self.model_path)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechRecognitionException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, model availability, etc.
            return True  # Replace with actual check
        except Exception:
            return False

    def get_supported_models(self) -> List[str]:
        """Get list of supported models."""
        return self.supported_models

    def _transcribe_audio(self, audio: AudioContent, model: str) -> tuple[str, float, dict]:
        """Transcribe audio using the specified model.

        Args:
            audio: Audio content to transcribe
            model: Model identifier to use

        Returns:
            tuple: (transcribed_text, confidence_score, metadata)
        """
        try:
            # Implement your STT logic here
            # Example:
            # result = self.model.transcribe(
            #     audio_data=audio.data,
            #     sample_rate=audio.sample_rate,
            #     model=model
            # )

            # Return transcription results
            text = "Transcribed text"  # Replace with actual transcription
            confidence = 0.95  # Replace with actual confidence
            metadata = {
                "model_used": model,
                "processing_time": 1.5,
                "language_detected": "en"
            }

            return text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "transcription")
```

### Step 2: Register and Test

Follow the same steps as for TTS providers: register the provider with its factory, add configuration support, and write unit and integration tests.

## Adding New Translation Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/translation/my_translation_provider.py

import logging
from typing import List, Optional
from ..base.translation_provider_base import TranslationProviderBase
from ...domain.models.translation_request import TranslationRequest
from ...domain.models.text_content import TextContent
from ...domain.exceptions import TranslationFailedException

logger = logging.getLogger(__name__)


class MyTranslationProvider(TranslationProviderBase):
    """Custom translation provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the translation provider."""
        super().__init__(
            provider_name="my_translation",
            supported_languages=["en", "zh", "es", "fr", "de", "ja"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your translation engine/model here
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise TranslationFailedException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, API connectivity, etc.
            return True  # Replace with actual check
        except Exception:
            return False

    def get_supported_language_pairs(self) -> List[tuple[str, str]]:
        """Get supported language pairs."""
        # Return list of (source_lang, target_lang) tuples
        pairs = []
        for source in self.supported_languages:
            for target in self.supported_languages:
                if source != target:
                    pairs.append((source, target))
        return pairs

    def _translate_text(self, request: TranslationRequest) -> tuple[str, float, dict]:
        """Translate text using the provider.

        Args:
            request: Translation request

        Returns:
            tuple: (translated_text, confidence_score, metadata)
        """
        try:
            source_text = request.text_content.text
            source_lang = request.source_language or request.text_content.language
            target_lang = request.target_language

            # Implement your translation logic here
            # Example:
            # result = self.translator.translate(
            #     text=source_text,
            #     source_lang=source_lang,
            #     target_lang=target_lang
            # )

            # Return translation results
            translated_text = f"Translated: {source_text}"  # Replace with actual translation
            confidence = 0.92  # Replace with actual confidence
            metadata = {
                "source_language_detected": source_lang,
                "target_language": target_lang,
                "processing_time": 0.5,
                "model_used": "my_translation_model"
            }

            return translated_text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "translation")
```

## Testing Guidelines

### Unit Testing

- Test each provider in isolation using mocks
- Cover success and failure scenarios
- Test edge cases (empty input, invalid parameters)
- Verify error handling and exception propagation
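
The points above can be sketched with only the standard library. In this example, `synthesize` and `SynthesisError` are hypothetical stand-ins for a provider method and a domain exception; the engine is replaced by a `Mock` so the test covers a success case, an empty-input edge case, and error propagation in isolation.

```python
from unittest.mock import Mock


class SynthesisError(Exception):
    """Stand-in for a domain exception such as SpeechSynthesisException."""


def synthesize(engine, text: str) -> bytes:
    """Hypothetical function under test: validates input and wraps engine errors."""
    if not text.strip():
        raise SynthesisError("Text must not be empty")
    try:
        return engine.synthesize(text)
    except Exception as exc:
        raise SynthesisError(f"Synthesis failed: {exc}") from exc


def test_success():
    engine = Mock()
    engine.synthesize.return_value = b"audio"
    assert synthesize(engine, "hello") == b"audio"
    engine.synthesize.assert_called_once_with("hello")


def test_empty_input_is_rejected():
    engine = Mock()
    try:
        synthesize(engine, "   ")
    except SynthesisError:
        pass
    else:
        raise AssertionError("expected SynthesisError")
    engine.synthesize.assert_not_called()  # The engine must not see invalid input


def test_engine_failure_is_wrapped():
    engine = Mock()
    engine.synthesize.side_effect = RuntimeError("engine offline")
    try:
        synthesize(engine, "hello")
    except SynthesisError as exc:
        assert isinstance(exc.__cause__, RuntimeError)  # Original cause is preserved
    else:
        raise AssertionError("expected SynthesisError")


test_success()
test_empty_input_is_rejected()
test_engine_failure_is_wrapped()
```

Under pytest, the `try`/`except`/`else` blocks would normally be replaced with `pytest.raises`, as in the provider tests earlier in this guide.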

### Integration Testing

- Test provider integration with factories
- Test complete pipeline workflows
- Test fallback mechanisms
- Test with real external services (when available)

### Performance Testing

- Measure processing times for different input sizes
- Test memory usage and resource cleanup
- Test concurrent processing capabilities
- Benchmark against existing providers
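
A minimal sketch of measuring processing time and peak memory across input sizes, using only `time.perf_counter` and `tracemalloc` from the standard library. `fake_synthesize` and `benchmark` are hypothetical names; the placeholder workload would be replaced with a real provider call.

```python
import time
import tracemalloc
from typing import Callable, Dict, List


def fake_synthesize(text: str) -> bytes:
    """Placeholder workload; swap in a real provider call when benchmarking."""
    return text.encode("utf-8") * 10


def benchmark(
    func: Callable[[str], bytes], inputs: List[str], repeats: int = 3
) -> List[Dict[str, float]]:
    """Record the best wall-clock time and peak traced memory for each input."""
    results = []
    for text in inputs:
        tracemalloc.start()
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(text)
            timings.append(time.perf_counter() - start)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({
            "chars": len(text),
            "best_seconds": min(timings),
            "peak_bytes": peak_bytes,
        })
    return results


results = benchmark(fake_synthesize, ["a" * 10, "a" * 1_000, "a" * 100_000])
for row in results:
    print(row)
```

Note that `tracemalloc` adds overhead, so timings taken while tracing are inflated; for precise speed numbers, measure time and memory in separate runs.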

### Test Structure

```
tests/
├── unit/
│   ├── domain/
│   ├── application/
│   └── infrastructure/
│       ├── tts/
│       ├── stt/
│       └── translation/
├── integration/
│   ├── test_complete_pipeline.py
│   ├── test_provider_fallback.py
│   └── test_error_recovery.py
└── performance/
    ├── test_processing_speed.py
    ├── test_memory_usage.py
    └── test_concurrent_processing.py
```

## Code Style and Standards

### Python Style Guide

- Follow PEP 8 for code formatting
- Use type hints for all public methods
- Write comprehensive docstrings (Google style)
- Use meaningful variable and function names
- Keep functions focused and small (< 50 lines)

### Documentation Standards

- Document all public interfaces
- Include usage examples in docstrings
- Explain complex algorithms and business logic
- Keep documentation up-to-date with code changes

### Error Handling

- Use domain-specific exceptions
- Provide detailed error messages
- Log errors with appropriate levels
- Implement graceful degradation where possible
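
One way these guidelines might look in practice is sketched below: low-level failures are wrapped in a domain exception with the original cause attached, failures are logged at warning level, and the caller degrades to the next provider. `ProviderError`, `call_provider`, and `synthesize_with_degradation` are hypothetical names for illustration only.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Stand-in for a domain exception such as SpeechSynthesisException."""


def call_provider(name: str, func: Callable[[str], bytes], text: str) -> bytes:
    """Wrap low-level failures in a domain exception, preserving the cause."""
    try:
        return func(text)
    except Exception as exc:
        raise ProviderError(
            f"{name} failed for {len(text)}-character input: {exc}"
        ) from exc


def synthesize_with_degradation(
    providers: Sequence[Tuple[str, Callable[[str], bytes]]], text: str
) -> bytes:
    """Try each (name, func) pair in order; log failures and move on."""
    errors: List[str] = []
    for name, func in providers:
        try:
            return call_provider(name, func, text)
        except ProviderError as exc:
            logger.warning("Provider %s failed, trying next: %s", name, exc)
            errors.append(str(exc))
    raise ProviderError("All providers failed: " + "; ".join(errors))


def broken(text: str) -> bytes:
    raise ValueError("engine offline")


def working(text: str) -> bytes:
    return text.encode("utf-8")


audio = synthesize_with_degradation([("primary", broken), ("backup", working)], "hi")
print(audio)
```

Only when every provider fails does the caller see an exception, and its message lists each individual failure.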

### Logging

```python
import logging

logger = logging.getLogger(__name__)

# Use appropriate log levels
logger.debug("Detailed debugging information")
logger.info("General information about program execution")
logger.warning("Something unexpected happened")
logger.error("A serious error occurred")
logger.critical("A very serious error occurred")
```

## Debugging and Troubleshooting

### Common Issues

1. **Provider Not Available**
   - Check dependencies are installed
   - Verify configuration settings
   - Check logs for initialization errors

2. **Poor Quality Output**
   - Verify input audio quality
   - Check model parameters
   - Review provider-specific settings

3. **Performance Issues**
   - Profile code execution
   - Check memory usage
   - Optimize audio processing pipeline

### Debugging Tools

- Use Python debugger (pdb) for step-through debugging
- Enable detailed logging for troubleshooting
- Use profiling tools (cProfile, memory_profiler)
- Monitor system resources during processing
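
For example, `cProfile` and `pstats` from the standard library can show where time is spent. In this sketch, `slow_pipeline` is a hypothetical placeholder for a real processing step.

```python
import cProfile
import io
import pstats


def slow_pipeline(n: int) -> int:
    """Placeholder for an expensive processing step."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_pipeline(100_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string buffer
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)
```

A whole script can also be profiled without code changes by running `python -m cProfile -s cumulative script.py`.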

### Logging Configuration

```python
# Enable debug logging for development
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler()
    ]
)
```

## Performance Considerations

### Optimization Strategies

1. **Audio Processing**
   - Use appropriate sample rates
   - Implement streaming where possible
   - Cache processed results
   - Optimize memory usage

2. **Model Loading**
   - Load models once and reuse
   - Use lazy loading for optional providers
   - Implement model caching strategies

3. **Concurrent Processing**
   - Use async/await for I/O operations
   - Implement thread-safe providers
   - Consider multiprocessing for CPU-intensive tasks
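
The "load once and reuse" and "thread-safe" points can be combined in a small lazy holder. This is a minimal sketch, assuming the model loader is a plain callable; `LazyResource` is a hypothetical helper, not an existing class in the codebase.

```python
import threading
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Load an expensive resource on first use and reuse it afterwards."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        if not self._loaded:  # Fast path once the resource is loaded
            with self._lock:
                if not self._loaded:  # Re-check under the lock
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]


load_calls: List[int] = []


def load_model() -> dict:
    """Placeholder loader; a real one would read model weights from disk."""
    load_calls.append(1)
    return {"model": "loaded"}


model = LazyResource(load_model)
threads = [threading.Thread(target=model.get) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(load_calls))
```

Nothing is loaded until the first `get()` call, so optional providers that are never used cost no startup time or memory.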

### Memory Management

- Clean up temporary files
- Release model resources when not needed
- Monitor memory usage in long-running processes
- Implement resource pooling for expensive operations
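
Temporary files are easiest to clean up reliably with a context manager. The sketch below assumes audio is passed around as raw bytes; `temporary_audio_file` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temporary_audio_file(data: bytes, suffix: str = ".wav") -> Iterator[str]:
    """Write data to a temp file, yield its path, and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs even if the caller raises, so no file is left behind
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


with temporary_audio_file(b"RIFF") as audio_path:
    print(os.path.exists(audio_path))

print(os.path.exists(audio_path))
```

The same pattern works for releasing model handles or other resources that need explicit cleanup.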

### Monitoring and Metrics

- Track processing times
- Monitor error rates
- Measure resource utilization
- Implement health checks
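
A minimal in-memory sketch of these metrics follows. The `Metrics` class and its `max_error_rate` threshold are hypothetical; a production setup would more likely export the values to an external monitoring system.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


class Metrics:
    """In-memory call counts, error counts, and timings per operation name."""

    def __init__(self) -> None:
        self.durations: Dict[str, List[float]] = defaultdict(list)
        self.calls: Dict[str, int] = defaultdict(int)
        self.errors: Dict[str, int] = defaultdict(int)

    def track(self, name: str, func: Callable, *args, **kwargs):
        """Run func, recording its duration and whether it raised."""
        self.calls[name] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors[name] += 1
            raise
        finally:
            self.durations[name].append(time.perf_counter() - start)

    def error_rate(self, name: str) -> float:
        calls = self.calls[name]
        return self.errors[name] / calls if calls else 0.0

    def is_healthy(self, name: str, max_error_rate: float = 0.5) -> bool:
        """Health check: healthy while the error rate does not exceed the threshold."""
        return self.error_rate(name) <= max_error_rate


metrics = Metrics()
metrics.track("tts", lambda text: text.upper(), "hello")
try:
    metrics.track("tts", lambda text: 1 / 0, "boom")
except ZeroDivisionError:
    pass
print(metrics.error_rate("tts"), metrics.is_healthy("tts"))
```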

## Contributing Guidelines

### Development Workflow

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Run the full test suite
5. Submit a pull request

### Code Review Process

- All changes require code review
- Tests must pass before merging
- Documentation must be updated
- Performance impact should be assessed

### Release Process

- Follow semantic versioning
- Update changelog
- Tag releases appropriately
- Deploy to staging before production

---

For questions or support, please refer to the project documentation or open an issue in the repository.