Spaces:
Build error
Build error
"""Speech recognition service interface. | |
This module defines the interface for speech-to-text (STT) services that convert | |
audio content into textual representation. The interface supports multiple STT | |
models and providers with consistent error handling. | |
The interface is designed to be: | |
- Model-agnostic: Works with any STT implementation (Whisper, Parakeet, etc.) | |
- Language-aware: Handles multiple languages and dialects | |
- Error-resilient: Provides detailed error information for debugging | |
- Performance-conscious: Supports both batch and streaming transcription | |
""" | |
from abc import ABC, abstractmethod | |
from typing import TYPE_CHECKING | |
if TYPE_CHECKING: | |
from ..models.audio_content import AudioContent | |
from ..models.text_content import TextContent | |
class ISpeechRecognitionService(ABC): | |
"""Interface for speech recognition services. | |
This interface defines the contract for converting audio content to text | |
using various STT models and providers. Implementations should handle | |
different audio formats, languages, and quality levels. | |
Example: | |
```python | |
# Use through dependency injection | |
stt_service = container.resolve(ISpeechRecognitionService) | |
# Transcribe audio | |
text_result = stt_service.transcribe( | |
audio=audio_content, | |
model="whisper-large" | |
) | |
print(f"Transcribed: {text_result.text}") | |
print(f"Language: {text_result.language}") | |
print(f"Confidence: {text_result.confidence}") | |
``` | |
""" | |
def transcribe(self, audio: 'AudioContent', model: str) -> 'TextContent': | |
"""Transcribe audio content to text using specified STT model. | |
Converts audio data into textual representation with language detection | |
and confidence scoring. The method should handle various audio formats | |
and quality levels gracefully. | |
Implementation considerations: | |
- Audio preprocessing (noise reduction, normalization) | |
- Language detection and handling | |
- Confidence scoring and quality assessment | |
- Memory management for large audio files | |
- Timeout handling for long audio content | |
Args: | |
audio: The audio content to transcribe. Must contain valid audio data | |
in a supported format (WAV, MP3, FLAC, etc.) with appropriate | |
sample rate and duration. | |
model: The STT model identifier to use for transcription. Examples: | |
- "whisper-small": Fast, lower accuracy | |
- "whisper-large": Slower, higher accuracy | |
- "parakeet": Real-time optimized | |
Must be supported by the implementation. | |
Returns: | |
TextContent: The transcription result containing: | |
- text: The transcribed text content | |
- language: Detected or specified language code | |
- confidence: Overall transcription confidence (0.0-1.0) | |
- metadata: Additional information like word-level timestamps, | |
alternative transcriptions, processing time | |
Raises: | |
SpeechRecognitionException: If transcription fails due to: | |
- Unsupported audio format or quality | |
- Model loading or inference errors | |
- Network issues (for cloud-based models) | |
- Insufficient system resources | |
ValueError: If input parameters are invalid: | |
- Empty or corrupted audio data | |
- Unsupported model identifier | |
- Invalid audio format specifications | |
Example: | |
```python | |
# Load audio file | |
with open("speech.wav", "rb") as f: | |
audio = AudioContent( | |
data=f.read(), | |
format="wav", | |
sample_rate=16000, | |
duration=30.0 | |
) | |
# Transcribe with high-accuracy model | |
try: | |
result = service.transcribe(audio, "whisper-large") | |
if result.confidence > 0.8: | |
print(f"High confidence: {result.text}") | |
else: | |
print(f"Low confidence: {result.text} ({result.confidence:.2f})") | |
except SpeechRecognitionException as e: | |
print(f"Transcription failed: {e}") | |
``` | |
""" | |
pass |