"""
Audio processing service interface.

This module defines the core interface for audio processing pipeline orchestration.
The interface follows Domain-Driven Design principles, providing a clean contract
for the complete audio translation workflow.
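
As a rough illustration of that workflow, the sketch below chains three stages
(speech-to-text, translation, text-to-speech) and falls back to the next provider
when one fails. It is not the implementation used by this project: `first_success`,
`run_pipeline`, and the stub providers are hypothetical names, and plain strings
stand in for real audio and result objects.

```python
from typing import Callable, Sequence

# A stage provider is modelled as a plain callable from str to str (an assumption
# made for brevity; real providers would take and return richer objects).
Stage = Callable[[str], str]


def first_success(providers: Sequence[Stage], payload: str) -> str:
    # Try each provider in order and return the first result that succeeds.
    errors = []
    for provider in providers:
        try:
            return provider(payload)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def run_pipeline(
    payload: str,
    stt: Sequence[Stage],
    translate: Sequence[Stage],
    tts: Sequence[Stage],
) -> str:
    # STT -> Translation -> TTS, each stage with its own fallback chain.
    text = first_success(stt, payload)
    translated = first_success(translate, text)
    return first_success(tts, translated)


# Hypothetical stub providers standing in for real backends.
def flaky_stt(payload: str) -> str:
    raise ConnectionError("primary STT unavailable")


def backup_stt(payload: str) -> str:
    return "hello"


def stub_translate(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def stub_tts(text: str) -> str:
    return f"<audio:{text}>"


output = run_pipeline("raw-audio", [flaky_stt, backup_stt], [stub_translate], [stub_tts])
print(output)
```

Collecting the per-provider errors before raising keeps the final exception
informative when every fallback has been exhausted.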

Example:
    ```python
    from src.domain.interfaces.audio_processing import IAudioProcessingService
    from src.domain.models.audio_content import AudioContent
    from src.domain.models.voice_settings import VoiceSettings

    # Get service implementation from DI container
    audio_service = container.resolve(IAudioProcessingService)

    # Process audio through complete pipeline
    result = audio_service.process_audio_pipeline(
        audio=audio_content,
        target_language="zh",
        voice_settings=voice_settings
    )
    ```
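
Because callers depend only on the interface, it can be replaced with a test
double in unit tests. The sketch below is one possible way to do that with
`unittest.mock`. To stay self-contained it declares simplified stand-ins for
`IAudioProcessingService` and `ProcessingResult` rather than importing the real
ones, and `translate_clip` is a hypothetical caller invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class ProcessingResult:
    # Simplified stand-in carrying only two of the documented result fields.
    success: bool
    translated_text: str = ""


class IAudioProcessingService(ABC):
    # Stand-in mirroring the single abstract method of the real interface.
    @abstractmethod
    def process_audio_pipeline(self, audio, target_language, voice_settings):
        ...


def translate_clip(service: IAudioProcessingService, audio: bytes) -> str:
    # Hypothetical code under test: returns the translation or an empty string.
    result = service.process_audio_pipeline(
        audio=audio, target_language="zh", voice_settings=None
    )
    return result.translated_text if result.success else ""


# Mock(spec=...) rejects attribute names that the interface does not define.
fake_service = Mock(spec=IAudioProcessingService)
fake_service.process_audio_pipeline.return_value = ProcessingResult(True, "ni hao")

translated = translate_clip(fake_service, b"raw-bytes")

# Verify the caller passed the expected arguments to the pipeline.
fake_service.process_audio_pipeline.assert_called_once_with(
    audio=b"raw-bytes", target_language="zh", voice_settings=None
)
print(translated)
```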
"""

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ..models.audio_content import AudioContent
    from ..models.voice_settings import VoiceSettings
    from ..models.processing_result import ProcessingResult


class IAudioProcessingService(ABC):
    """
    Interface for audio processing pipeline orchestration.

    This interface defines the contract for the complete audio translation pipeline,
    coordinating Speech-to-Text, Translation, and Text-to-Speech services to provide
    end-to-end audio translation functionality.

    The interface is designed to be:
    - Provider-agnostic: Works with any STT/Translation/TTS implementation
    - Error-resilient: Handles failures gracefully with appropriate exceptions
    - Observable: Provides detailed processing results and metadata
    - Testable: Easy to mock for unit testing

    Implementations should handle:
    - Provider selection and fallback logic
    - Error handling and recovery
    - Performance monitoring and logging
    - Resource cleanup and management
    """

    @abstractmethod
    def process_audio_pipeline(
        self,
        audio: 'AudioContent',
        target_language: str,
        voice_settings: 'VoiceSettings'
    ) -> 'ProcessingResult':
        """
        Process audio through the complete pipeline: STT -> Translation -> TTS.

        This method orchestrates the complete audio translation workflow:
        1. Speech Recognition: Convert audio to text
        2. Translation: Translate text to target language (if needed)
        3. Speech Synthesis: Convert translated text back to audio

        The implementation should:
        - Validate input parameters
        - Handle provider failures with fallback mechanisms
        - Provide detailed error information on failure
        - Clean up temporary resources
        - Log processing steps for observability

        Args:
            audio: The input audio content to process. Must be a valid AudioContent
                instance with supported format and reasonable duration.
            target_language: The target language code for translation (e.g., 'zh', 'es', 'fr').
                Must be supported by the translation provider.
            voice_settings: Voice configuration for TTS synthesis including voice ID,
                speed, and language preferences.

        Returns:
            ProcessingResult: Comprehensive result containing:
                - success: Boolean indicating overall success
                - original_text: Transcribed text from STT (if successful)
                - translated_text: Translated text (if translation was performed)
                - audio_output: Generated audio content (if TTS was successful)
                - processing_time: Total processing duration in seconds
                - error_message: Detailed error description (if failed)
                - metadata: Additional processing information and metrics

        Raises:
            AudioProcessingException: If any step in the pipeline fails and cannot
                be recovered through fallback mechanisms.
            ValueError: If input parameters are invalid or unsupported.

        Example:
            ```python
            # Create audio content from file
            with open("input.wav", "rb") as f:
                audio = AudioContent(
                    data=f.read(),
                    format="wav",
                    sample_rate=16000,
                    duration=10.5
                )

            # Configure voice settings
            voice_settings = VoiceSettings(
                voice_id="kokoro",
                speed=1.0,
                language="zh"
            )

            # Process through pipeline
            result = service.process_audio_pipeline(
                audio=audio,
                target_language="zh",
                voice_settings=voice_settings
            )

            if result.success:
                print(f"Original: {result.original_text}")
                print(f"Translated: {result.translated_text}")
                # Save output audio
                with open("output.wav", "wb") as f:
                    f.write(result.audio_output.data)
            else:
                print(f"Processing failed: {result.error_message}")
            ```
        """
        pass