Commit 4e4961e by Michael Hu
Parent: 19fd91c
Message: Add documentation and final validation

Files changed:

- DEVELOPER_GUIDE.md +701 -0
- README.md +246 -54
- pyproject.toml +57 -54
- src/domain/interfaces/audio_processing.py +114 -14
- src/domain/interfaces/speech_recognition.py +97 -12
- src/domain/interfaces/speech_synthesis.py +171 -21
- src/domain/interfaces/translation.py +118 -11
- src/infrastructure/base/tts_provider_base.py +79 -4

---

DEVELOPER_GUIDE.md (ADDED)
| 1 |
+
# Developer Guide
|
| 2 |
+
|
| 3 |
+
This guide provides comprehensive instructions for extending the Audio Translation System with new providers and contributing to the codebase.
|
| 4 |
+
|
| 5 |
+
## Table of Contents
|
| 6 |
+
|
| 7 |
+
- [Architecture Overview](#architecture-overview)
|
| 8 |
+
- [Adding New TTS Providers](#adding-new-tts-providers)
|
| 9 |
+
- [Adding New STT Providers](#adding-new-stt-providers)
|
| 10 |
+
- [Adding New Translation Providers](#adding-new-translation-providers)
|
| 11 |
+
- [Testing Guidelines](#testing-guidelines)
|
| 12 |
+
- [Code Style and Standards](#code-style-and-standards)
|
| 13 |
+
- [Debugging and Troubleshooting](#debugging-and-troubleshooting)
|
| 14 |
+
- [Performance Considerations](#performance-considerations)
|
| 15 |
+
|
| 16 |
+
## Architecture Overview
|
| 17 |
+
|
| 18 |
+
The system follows Domain-Driven Design (DDD) principles with clear separation of concerns:
|
| 19 |
+
|
| 20 |
+
```
|
| 21 |
+
src/
|
| 22 |
+
├── domain/ # Core business logic
|
| 23 |
+
│ ├── interfaces/ # Service contracts (ports)
|
| 24 |
+
│ ├── models/ # Domain entities and value objects
|
| 25 |
+
│ ├── services/ # Domain services
|
| 26 |
+
│ └── exceptions.py # Domain-specific exceptions
|
| 27 |
+
├── application/ # Use case orchestration
|
| 28 |
+
│ ├── services/ # Application services
|
| 29 |
+
│ ├── dtos/ # Data transfer objects
|
| 30 |
+
│ └── error_handling/ # Application error handling
|
| 31 |
+
├── infrastructure/ # External service implementations
|
| 32 |
+
│ ├── tts/ # TTS provider implementations
|
| 33 |
+
│ ├── stt/ # STT provider implementations
|
| 34 |
+
│ ├── translation/ # Translation service implementations
|
| 35 |
+
│ ├── base/ # Provider base classes
|
| 36 |
+
│ └── config/ # Configuration and DI container
|
| 37 |
+
└── presentation/ # UI layer (app.py)
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
### Key Design Patterns
|
| 41 |
+
|
| 42 |
+
1. **Provider Pattern**: Pluggable implementations for different services
|
| 43 |
+
2. **Factory Pattern**: Provider creation with fallback logic
|
| 44 |
+
3. **Dependency Injection**: Loose coupling between components
|
| 45 |
+
4. **Repository Pattern**: Data access abstraction
|
| 46 |
+
5. **Strategy Pattern**: Runtime algorithm selection
|
| 47 |
+
|
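To make the interplay of the provider and factory patterns concrete, here is a minimal, self-contained sketch of a registry with fallback. The class and method names are illustrative, not the project's real API (the actual factory lives in `src/infrastructure/tts/provider_factory.py`):

```python
# Minimal sketch of the provider + factory pattern with fallback.
# Names are illustrative; the real interfaces live in src/domain/interfaces.
from abc import ABC, abstractmethod


class Provider(ABC):
    @abstractmethod
    def is_available(self) -> bool: ...


class ProviderFactory:
    def __init__(self):
        self._providers: dict[str, type[Provider]] = {}

    def register(self, name: str, cls: type[Provider]) -> None:
        self._providers[name] = cls

    def create_with_fallback(self, preferred: list[str]) -> Provider:
        """Return an instance of the first preferred provider that is available."""
        for name in preferred:
            cls = self._providers.get(name)
            if cls is not None:
                provider = cls()
                if provider.is_available():
                    return provider
        raise RuntimeError("No available provider")


class DummyProvider(Provider):
    def is_available(self) -> bool:
        return True


factory = ProviderFactory()
factory.register("dummy", DummyProvider)
# "my_tts" is not registered here, so the factory falls back to the dummy.
provider = factory.create_with_fallback(["my_tts", "dummy"])
```

Callers depend only on the abstract `Provider` contract, which is what lets the dependency-injection container swap implementations without touching application code.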
## Adding New TTS Providers

### Step 1: Implement the Provider Class

Create a new provider class that inherits from `TTSProviderBase`:

```python
# src/infrastructure/tts/my_tts_provider.py

import logging
from typing import Iterator, List, Optional

from ..base.tts_provider_base import TTSProviderBase
from ...domain.models.speech_synthesis_request import SpeechSynthesisRequest
from ...domain.exceptions import SpeechSynthesisException

logger = logging.getLogger(__name__)


class MyTTSProvider(TTSProviderBase):
    """Custom TTS provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the TTS provider.

        Args:
            api_key: Optional API key for cloud-based services
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_tts",
            supported_languages=["en", "zh", "es", "fr"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your TTS engine/model here
            # Example: self.engine = MyTTSEngine(api_key=self.api_key)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechSynthesisException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available and ready to use."""
        try:
            # Check that dependencies are installed
            # Check that models are loaded
            # Check that the API is accessible (for cloud services)
            return True  # Replace with an actual availability check
        except Exception:
            return False

    def get_available_voices(self) -> List[str]:
        """Get the list of available voices for this provider."""
        # Return the actual voice IDs supported by your provider
        return ["voice1", "voice2", "voice3"]

    def _generate_audio(self, request: SpeechSynthesisRequest) -> tuple[bytes, int]:
        """Generate audio data from a synthesis request.

        Args:
            request: The speech synthesis request

        Returns:
            tuple: (audio_data_bytes, sample_rate)
        """
        try:
            text = request.text_content.text
            voice_id = request.voice_settings.voice_id
            speed = request.voice_settings.speed

            # Implement your TTS synthesis logic here
            # Example:
            # audio_data = self.engine.synthesize(
            #     text=text,
            #     voice=voice_id,
            #     speed=speed
            # )

            # Return audio data and sample rate
            audio_data = b"dummy_audio_data"  # Replace with actual synthesis
            sample_rate = 22050  # Replace with the actual sample rate

            return audio_data, sample_rate

        except Exception as e:
            self._handle_provider_error(e, "audio generation")

    def _generate_audio_stream(self, request: SpeechSynthesisRequest) -> Iterator[tuple[bytes, int, bool]]:
        """Generate an audio data stream from a synthesis request.

        Args:
            request: The speech synthesis request

        Yields:
            tuple: (audio_data_bytes, sample_rate, is_final)
        """
        try:
            # Implement streaming synthesis if supported.
            # For non-streaming providers, yield the complete audio as a single chunk.
            audio_data, sample_rate = self._generate_audio(request)
            yield audio_data, sample_rate, True

        except Exception as e:
            self._handle_provider_error(e, "streaming audio generation")
```
### Step 2: Register the Provider

Add your provider to the factory registration:

```python
# src/infrastructure/tts/provider_factory.py

def _register_default_providers(self):
    """Register all available TTS providers."""
    # ... existing providers ...

    # Try to register your custom provider
    try:
        from .my_tts_provider import MyTTSProvider
        self._providers['my_tts'] = MyTTSProvider
        logger.info("Registered MyTTS provider")
    except ImportError as e:
        logger.debug(f"MyTTS provider not available: {e}")
```

### Step 3: Add Configuration Support

Update the configuration to include your provider:

```python
# src/infrastructure/config/app_config.py

import os


class AppConfig:
    # ... existing configuration ...

    # TTS provider configuration
    TTS_PROVIDERS = os.getenv('TTS_PROVIDERS', 'kokoro,dia,cosyvoice2,my_tts,dummy').split(',')

    # Provider-specific settings
    MY_TTS_API_KEY = os.getenv('MY_TTS_API_KEY')
    MY_TTS_MODEL = os.getenv('MY_TTS_MODEL', 'default')
```

### Step 4: Add Tests

Create comprehensive tests for your provider:

```python
# tests/unit/infrastructure/tts/test_my_tts_provider.py

import pytest
from unittest.mock import patch

from src.infrastructure.tts.my_tts_provider import MyTTSProvider
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings
from src.domain.exceptions import SpeechSynthesisException


class TestMyTTSProvider:
    """Test suite for the MyTTS provider."""

    @pytest.fixture
    def provider(self):
        """Create a test provider instance."""
        return MyTTSProvider(api_key="test_key")

    @pytest.fixture
    def synthesis_request(self):
        """Create a test synthesis request."""
        text_content = TextContent(text="Hello world", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        return SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

    def test_provider_initialization(self, provider):
        """Test that the provider initializes correctly."""
        assert provider.provider_name == "my_tts"
        assert "en" in provider.supported_languages
        assert provider.is_available()

    def test_get_available_voices(self, provider):
        """Test voice listing."""
        voices = provider.get_available_voices()
        assert isinstance(voices, list)
        assert len(voices) > 0
        assert "voice1" in voices

    def test_synthesize_success(self, provider, synthesis_request):
        """Test successful synthesis."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.return_value = (b"audio_data", 22050)

            result = provider.synthesize(synthesis_request)

            assert result.data == b"audio_data"
            assert result.format == "wav"
            assert result.sample_rate == 22050
            mock_generate.assert_called_once_with(synthesis_request)

    def test_synthesize_failure(self, provider, synthesis_request):
        """Test synthesis failure handling."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.side_effect = Exception("Synthesis failed")

            with pytest.raises(SpeechSynthesisException):
                provider.synthesize(synthesis_request)

    def test_synthesize_stream(self, provider, synthesis_request):
        """Test streaming synthesis."""
        chunks = list(provider.synthesize_stream(synthesis_request))

        assert len(chunks) > 0
        assert chunks[-1].is_final  # The last chunk should be marked as final

        # Verify chunk structure
        for chunk in chunks:
            assert hasattr(chunk, 'data')
            assert hasattr(chunk, 'sample_rate')
            assert hasattr(chunk, 'is_final')
```

### Step 5: Add Integration Tests

```python
# tests/integration/test_my_tts_integration.py

import pytest

from src.infrastructure.config.container_setup import initialize_global_container
from src.infrastructure.tts.provider_factory import TTSProviderFactory
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings


@pytest.mark.integration
class TestMyTTSIntegration:
    """Integration tests for the MyTTS provider."""

    def test_provider_factory_integration(self):
        """Test that the provider works with the factory."""
        factory = TTSProviderFactory()

        if 'my_tts' in factory.get_available_providers():
            provider = factory.create_provider('my_tts')
            assert provider.is_available()
            assert len(provider.get_available_voices()) > 0

    def test_end_to_end_synthesis(self):
        """Test the complete synthesis workflow."""
        container = initialize_global_container()
        factory = container.resolve(TTSProviderFactory)

        if 'my_tts' in factory.get_available_providers():
            provider = factory.create_provider('my_tts')

            # Create a synthesis request
            text_content = TextContent(text="Integration test", language="en")
            voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
            request = SpeechSynthesisRequest(
                text_content=text_content,
                voice_settings=voice_settings
            )

            # Synthesize audio
            result = provider.synthesize(request)

            assert result.data is not None
            assert result.duration > 0
            assert result.sample_rate > 0
```
## Adding New STT Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/stt/my_stt_provider.py

import logging
from typing import List, Optional

from ..base.stt_provider_base import STTProviderBase
from ...domain.models.audio_content import AudioContent
from ...domain.exceptions import SpeechRecognitionException

logger = logging.getLogger(__name__)


class MySTTProvider(STTProviderBase):
    """Custom STT provider implementation."""

    def __init__(self, model_path: Optional[str] = None, **kwargs):
        """Initialize the STT provider.

        Args:
            model_path: Path to the STT model
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_stt",
            supported_languages=["en", "zh", "es", "fr"],
            supported_models=["my_stt_small", "my_stt_large"]
        )
        self.model_path = model_path
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your STT engine/model here
            # Example: self.model = MySTTModel.load(self.model_path)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechRecognitionException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, model availability, etc.
            return True  # Replace with an actual check
        except Exception:
            return False

    def get_supported_models(self) -> List[str]:
        """Get the list of supported models."""
        return self.supported_models

    def _transcribe_audio(self, audio: AudioContent, model: str) -> tuple[str, float, dict]:
        """Transcribe audio using the specified model.

        Args:
            audio: Audio content to transcribe
            model: Model identifier to use

        Returns:
            tuple: (transcribed_text, confidence_score, metadata)
        """
        try:
            # Implement your STT logic here
            # Example:
            # result = self.model.transcribe(
            #     audio_data=audio.data,
            #     sample_rate=audio.sample_rate,
            #     model=model
            # )

            # Return transcription results
            text = "Transcribed text"  # Replace with the actual transcription
            confidence = 0.95  # Replace with the actual confidence
            metadata = {
                "model_used": model,
                "processing_time": 1.5,
                "language_detected": "en"
            }

            return text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "transcription")
```

### Step 2: Register and Test

Follow the same steps as for TTS providers: factory registration, configuration support, and unit and integration tests.
## Adding New Translation Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/translation/my_translation_provider.py

import logging
from typing import List, Optional

from ..base.translation_provider_base import TranslationProviderBase
from ...domain.models.translation_request import TranslationRequest
from ...domain.exceptions import TranslationFailedException

logger = logging.getLogger(__name__)


class MyTranslationProvider(TranslationProviderBase):
    """Custom translation provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the translation provider."""
        super().__init__(
            provider_name="my_translation",
            supported_languages=["en", "zh", "es", "fr", "de", "ja"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your translation engine/model here
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise TranslationFailedException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, API connectivity, etc.
            return True  # Replace with an actual check
        except Exception:
            return False

    def get_supported_language_pairs(self) -> List[tuple[str, str]]:
        """Get supported language pairs as (source_lang, target_lang) tuples."""
        pairs = []
        for source in self.supported_languages:
            for target in self.supported_languages:
                if source != target:
                    pairs.append((source, target))
        return pairs

    def _translate_text(self, request: TranslationRequest) -> tuple[str, float, dict]:
        """Translate text using the provider.

        Args:
            request: Translation request

        Returns:
            tuple: (translated_text, confidence_score, metadata)
        """
        try:
            source_text = request.text_content.text
            source_lang = request.source_language or request.text_content.language
            target_lang = request.target_language

            # Implement your translation logic here
            # Example:
            # result = self.translator.translate(
            #     text=source_text,
            #     source_lang=source_lang,
            #     target_lang=target_lang
            # )

            # Return translation results
            translated_text = f"Translated: {source_text}"  # Replace with the actual translation
            confidence = 0.92  # Replace with the actual confidence
            metadata = {
                "source_language_detected": source_lang,
                "target_language": target_lang,
                "processing_time": 0.5,
                "model_used": "my_translation_model"
            }

            return translated_text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "translation")
```

## Testing Guidelines

### Unit Testing

- Test each provider in isolation using mocks
- Cover success and failure scenarios
- Test edge cases (empty input, invalid parameters)
- Verify error handling and exception propagation
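As a concrete illustration of the edge-case point above, here is a tiny, provider-independent sketch. `validate_text` is a hypothetical stand-in for the input validation your provider should perform before synthesis or transcription:

```python
# Sketch: edge-case checks for input validation, independent of any provider.
# validate_text is a hypothetical helper, not part of the project's API.

def validate_text(text: str) -> str:
    """Reject empty or whitespace-only input before doing expensive work."""
    if not text or not text.strip():
        raise ValueError("text must be non-empty")
    return text.strip()


# Success scenario: valid input is normalized
assert validate_text("  hello ") == "hello"

# Failure scenario: empty input is rejected with a clear error
try:
    validate_text("   ")
except ValueError:
    pass
else:
    raise AssertionError("empty input should be rejected")
```

The same shape works for invalid parameters (unknown voice IDs, unsupported languages): validate early, raise a domain-specific exception, and assert on it in the test.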
### Integration Testing

- Test provider integration with factories
- Test complete pipeline workflows
- Test fallback mechanisms
- Test with real external services (when available)

### Performance Testing

- Measure processing times for different input sizes
- Test memory usage and resource cleanup
- Test concurrent processing capabilities
- Benchmark against existing providers

### Test Structure

```
tests/
├── unit/
│   ├── domain/
│   ├── application/
│   └── infrastructure/
│       ├── tts/
│       ├── stt/
│       └── translation/
├── integration/
│   ├── test_complete_pipeline.py
│   ├── test_provider_fallback.py
│   └── test_error_recovery.py
└── performance/
    ├── test_processing_speed.py
    ├── test_memory_usage.py
    └── test_concurrent_processing.py
```

## Code Style and Standards

### Python Style Guide

- Follow PEP 8 for code formatting
- Use type hints for all public methods
- Write comprehensive docstrings (Google style)
- Use meaningful variable and function names
- Keep functions focused and small (< 50 lines)

### Documentation Standards

- Document all public interfaces
- Include usage examples in docstrings
- Explain complex algorithms and business logic
- Keep documentation up to date with code changes

### Error Handling

- Use domain-specific exceptions
- Provide detailed error messages
- Log errors with appropriate levels
- Implement graceful degradation where possible
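A minimal sketch of how these error-handling points fit together. The exception name mirrors the ones used throughout this guide; the fallback chain and helper functions are illustrative only:

```python
# Sketch: domain-specific exceptions plus graceful degradation.
# The fallback chain and helpers are illustrative, not the project's API.
import logging

logger = logging.getLogger(__name__)


class SpeechSynthesisException(Exception):
    """Raised when a TTS provider fails to synthesize audio."""


def synthesize_with_fallback(providers, text: str) -> bytes:
    """Try providers in order, degrading gracefully instead of failing hard."""
    for name, synth in providers:
        try:
            return synth(text)
        except SpeechSynthesisException as e:
            # Detailed message, appropriate level, then move on to the next provider
            logger.warning("Provider %s failed: %s; trying next", name, e)
    raise SpeechSynthesisException("All providers failed")


def flaky(text: str) -> bytes:
    raise SpeechSynthesisException("model not loaded")


def dummy(text: str) -> bytes:
    return b"\x00" * 16  # silent placeholder audio


audio = synthesize_with_fallback([("flaky", flaky), ("dummy", dummy)], "hi")
```

Because only the domain exception is caught, programming errors (e.g. `TypeError`) still surface immediately instead of being silently swallowed by the fallback loop.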
### Logging

```python
import logging

logger = logging.getLogger(__name__)

# Use appropriate log levels
logger.debug("Detailed debugging information")
logger.info("General information about program execution")
logger.warning("Something unexpected happened")
logger.error("A serious error occurred")
logger.critical("A very serious error occurred")
```

## Debugging and Troubleshooting

### Common Issues

1. **Provider Not Available**
   - Check that dependencies are installed
   - Verify configuration settings
   - Check logs for initialization errors

2. **Poor Quality Output**
   - Verify input audio quality
   - Check model parameters
   - Review provider-specific settings

3. **Performance Issues**
   - Profile code execution
   - Check memory usage
   - Optimize the audio processing pipeline

### Debugging Tools

- Use the Python debugger (pdb) for step-through debugging
- Enable detailed logging for troubleshooting
- Use profiling tools (cProfile, memory_profiler)
- Monitor system resources during processing

### Logging Configuration

```python
# Enable debug logging for development
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler()
    ]
)
```

## Performance Considerations

### Optimization Strategies

1. **Audio Processing**
   - Use appropriate sample rates
   - Implement streaming where possible
   - Cache processed results
   - Optimize memory usage

2. **Model Loading**
   - Load models once and reuse them
   - Use lazy loading for optional providers
   - Implement model caching strategies
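The "load once and reuse" and "lazy loading" points can be sketched in a few lines. `MyModel` stands in for an expensive-to-load TTS or STT model; the lock makes the lazy load safe when requests arrive from multiple threads:

```python
# Sketch: lazy, load-once model holder (a hypothetical helper, not project code).
import threading


class MyModel:
    load_count = 0

    def __init__(self):
        MyModel.load_count += 1  # the expensive load would happen here


class LazyModel:
    """Defer loading until first use; load at most once, even across threads."""

    def __init__(self):
        self._model = None
        self._lock = threading.Lock()

    def get(self) -> MyModel:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread loads the model
                if self._model is None:
                    self._model = MyModel()
        return self._model


lazy = LazyModel()
assert MyModel.load_count == 0   # nothing loaded yet
lazy.get()
lazy.get()
assert MyModel.load_count == 1   # loaded exactly once, then reused
```

Providers that are configured but never selected then cost nothing at startup, which matters when several heavyweight models are registered.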
3. **Concurrent Processing**
   - Use async/await for I/O operations
   - Implement thread-safe providers
   - Consider multiprocessing for CPU-intensive tasks

### Memory Management

- Clean up temporary files
- Release model resources when no longer needed
- Monitor memory usage in long-running processes
- Implement resource pooling for expensive operations

### Monitoring and Metrics

- Track processing times
- Monitor error rates
- Measure resource utilization
- Implement health checks
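Processing times and error rates can be tracked with a small decorator. This is a self-contained sketch; in practice you would export the counters to whatever metrics backend the deployment uses:

```python
# Sketch: track call counts, error counts, and total processing time.
import time
from collections import defaultdict
from functools import wraps

metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_time": 0.0})


def tracked(name):
    """Decorator that records timing and error metrics for a function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[name]["errors"] += 1
                raise
            finally:
                metrics[name]["calls"] += 1
                metrics[name]["total_time"] += time.perf_counter() - start
        return wrapper
    return decorator


@tracked("synthesize")
def synthesize(text: str) -> bytes:
    return b"audio"  # stand-in for real synthesis


synthesize("hello")
```

A health check can then be as simple as asserting that the error rate (`errors / calls`) for each provider stays below a threshold.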
## Contributing Guidelines

### Development Workflow

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Run the full test suite
5. Submit a pull request

### Code Review Process

- All changes require code review
- Tests must pass before merging
- Documentation must be updated
- Performance impact should be assessed

### Release Process

- Follow semantic versioning
- Update the changelog
- Tag releases appropriately
- Deploy to staging before production

---

For questions or support, please refer to the project documentation or open an issue in the repository.
README.md
CHANGED
|
@@ -1,89 +1,281 @@
----
-title: TeachingAssistant
-emoji: 🚀
-colorFrom: gray
-colorTo: blue
-sdk: streamlit
-sdk_version: 1.44.1
-app_file: app.py
-pinned: false
----
-- NVIDIA's Parakeet-TDT-0.6B model
-- Optimized for real-time transcription
-- Requires additional installation (see below)
# Audio Translation System

A high-quality audio translation system built using Domain-Driven Design (DDD) principles. The application processes audio through a pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS), supporting multiple providers for each service with automatic fallback mechanisms.

## 🏗️ Architecture Overview

The application follows a clean DDD architecture with clear separation of concerns:

```
src/
├── domain/              # 🧠 Business logic and rules
│   ├── models/          # Domain entities and value objects
│   ├── services/        # Domain services
│   ├── interfaces/      # Domain interfaces (ports)
│   └── exceptions.py    # Domain-specific exceptions
├── application/         # 🎯 Use case orchestration
│   ├── services/        # Application services
│   ├── dtos/            # Data transfer objects
│   └── error_handling/  # Application error handling
├── infrastructure/      # 🔧 External concerns
│   ├── tts/             # TTS provider implementations
│   ├── stt/             # STT provider implementations
│   ├── translation/     # Translation service implementations
│   ├── base/            # Provider base classes
│   └── config/          # Configuration and DI container
└── presentation/        # 🖥️ UI layer
    └── (Streamlit app in app.py)
```

### 🔄 Data Flow

```mermaid
graph TD
    A[User Upload] --> B[Presentation Layer]
    B --> C[Application Service]
    C --> D[Domain Service]
    D --> E[STT Provider]
    D --> F[Translation Provider]
    D --> G[TTS Provider]
    E --> H[Infrastructure Layer]
    F --> H
    G --> H
    H --> I[External Services]
```

## 🚀 Quick Start
### Prerequisites

- Python 3.10+ (matching `requires-python` in `pyproject.toml`)
- FFmpeg (for audio processing)
- Optional: CUDA for GPU acceleration

### Installation

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd audio-translation-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the application:**
   ```bash
   streamlit run app.py
   ```

4. **Access the web interface:**
   Open your browser to `http://localhost:8501`

## 🎛️ Supported Providers

### Speech-to-Text (STT)

- **Whisper** (Default) - OpenAI's Whisper Large-v3 model
- **Parakeet** - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

### Translation

- **NLLB** - Meta's No Language Left Behind model

### Text-to-Speech (TTS)

- **Kokoro** - High-quality neural TTS
- **Dia** - Fast neural TTS
- **CosyVoice2** - Advanced voice synthesis
- **Dummy** - Test provider for development

## 📖 Usage

### Web Interface

1. **Upload Audio**: Support for WAV, MP3, FLAC, OGG formats
2. **Select Model**: Choose STT model (Whisper/Parakeet)
3. **Choose Language**: Select target translation language
4. **Pick Voice**: Select TTS voice and speed
5. **Process**: Click to start the translation pipeline

### Programmatic Usage

```python
import os

from src.infrastructure.config.container_setup import initialize_global_container
from src.application.services.audio_processing_service import AudioProcessingApplicationService
from src.application.dtos.processing_request_dto import ProcessingRequestDto
from src.application.dtos.audio_upload_dto import AudioUploadDto

# Initialize dependency container
container = initialize_global_container()
audio_service = container.resolve(AudioProcessingApplicationService)

# Create request
with open("audio.wav", "rb") as f:
    audio_upload = AudioUploadDto(
        filename="audio.wav",
        content=f.read(),
        content_type="audio/wav",
        size=os.path.getsize("audio.wav")
    )

request = ProcessingRequestDto(
    audio=audio_upload,
    asr_model="whisper-small",
    target_language="zh",
    voice="kokoro",
    speed=1.0
)

# Process audio
result = audio_service.process_audio_pipeline(request)

if result.success:
    print(f"Original: {result.original_text}")
    print(f"Translated: {result.translated_text}")
    print(f"Audio saved to: {result.audio_path}")
else:
    print(f"Error: {result.error_message}")
```

## 🧪 Testing

The project includes comprehensive test coverage:

```bash
# Run all tests
python -m pytest

# Run specific test categories
python -m pytest tests/unit/         # Unit tests
python -m pytest tests/integration/  # Integration tests

# Run with coverage
python -m pytest --cov=src --cov-report=html
```

### Test Structure

- **Unit Tests**: Test individual components in isolation
- **Integration Tests**: Test provider integrations and complete pipeline
- **Mocking**: Uses dependency injection for easy mocking

## 🔧 Configuration

### Environment Variables

Create a `.env` file or set environment variables:

```bash
# Provider preferences (comma-separated, in order of preference)
TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy
STT_PROVIDERS=whisper,parakeet
TRANSLATION_PROVIDERS=nllb

# Logging
LOG_LEVEL=INFO
LOG_FILE=app.log

# Performance
MAX_FILE_SIZE_MB=100
TEMP_FILE_CLEANUP_HOURS=24
```

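Comma-separated preference lists like `TTS_PROVIDERS` can be parsed with a few lines of Python; this is a sketch, and `provider_preference` is a hypothetical helper rather than a function in this codebase:

```python
import os

def provider_preference(var_name: str, default: str) -> list[str]:
    """Parse a comma-separated provider list from the environment, in order."""
    raw = os.environ.get(var_name, default)
    # Strip whitespace and drop empty entries so "a, b,,c" -> ["a", "b", "c"].
    return [name.strip() for name in raw.split(",") if name.strip()]

tts_order = provider_preference("TTS_PROVIDERS", "kokoro,dia,cosyvoice2,dummy")
```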
### Provider Configuration

The system automatically detects available providers and falls back gracefully:

```python
# Example: Custom provider configuration
from src.infrastructure.config.dependency_container import DependencyContainer

container = DependencyContainer()
container.configure_tts_providers(['kokoro', 'dummy'])  # Preferred order
```

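One way the "falls back gracefully" behavior can work is a simple preference-order loop; the exception type and provider callables below are stand-ins for illustration, not this project's actual classes:

```python
class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve the request (e.g., missing deps)."""

def synthesize_with_fallback(providers, text):
    """Try each provider in preference order; raise only if all of them fail."""
    errors = []
    for provider in providers:
        try:
            return provider(text)
        except ProviderUnavailable as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def broken(text):
    raise ProviderUnavailable("model not installed")

def dummy(text):
    return b"\x00" * len(text)  # silent placeholder audio

# 'broken' fails, so the dummy provider serves the request.
audio = synthesize_with_fallback([broken, dummy], "hello")
```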
## 🏗️ Architecture Benefits

### 🎯 Domain-Driven Design

- **Clear Business Logic**: Domain layer contains pure business rules
- **Separation of Concerns**: Each layer has distinct responsibilities
- **Testability**: Easy to test business logic independently

### 🔌 Dependency Injection

- **Loose Coupling**: Components depend on abstractions, not implementations
- **Easy Testing**: Mock dependencies for unit testing
- **Flexibility**: Swap implementations without changing business logic

### 🛡️ Error Handling

- **Layered Exceptions**: Domain exceptions mapped to user-friendly messages
- **Graceful Fallbacks**: Automatic provider fallback on failures
- **Structured Logging**: Correlation IDs and detailed error tracking

### 📈 Extensibility

- **Plugin Architecture**: Add new providers by implementing interfaces
- **Configuration-Driven**: Change behavior through configuration
- **Provider Factories**: Automatic provider discovery and instantiation

## 🔍 Troubleshooting

### Common Issues

**Import Errors:**
```bash
# Ensure all dependencies are installed
pip install -r requirements.txt

# For Parakeet support
pip install 'nemo_toolkit[asr]'
```

**Audio Processing Errors:**
- Verify FFmpeg is installed and in PATH
- Check audio file format is supported
- Ensure sufficient disk space for temporary files

**Provider Unavailable:**
- Check provider-specific dependencies
- Review logs for detailed error messages
- Verify provider configuration

### Debug Mode

Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🤝 Contributing

### Adding New Providers

See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on:
- Implementing new TTS providers
- Adding STT models
- Extending translation services
- Writing tests

### Development Setup

1. **Install development dependencies:**
   ```bash
   pip install -r requirements-dev.txt
   ```

2. **Run pre-commit hooks:**
   ```bash
   pre-commit install
   pre-commit run --all-files
   ```

3. **Run tests before committing:**
   ```bash
   python -m pytest tests/
   ```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md).
|
pyproject.toml
CHANGED
|
@@ -1,60 +1,63 @@
|
|
| 1 |
-
[
|
| 2 |
name = "audio-translator"
|
| 3 |
version = "0.1.0"
|
| 4 |
description = "High-quality audio translation web application"
|
| 5 |
-
authors = [
|
| 6 |
-
|
|
|
|
|
|
|
| 7 |
readme = "README.md"
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
kokoro
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
|
|
|
|
|
|
| 36 |
|
| 37 |
[build-system]
|
| 38 |
-
requires = ["
|
| 39 |
-
build-backend = "
|
| 40 |
-
|
| 41 |
-
[
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
[
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
[
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
start = "app:main"
|
|
|
|
[project]
name = "audio-translator"
version = "0.1.0"
description = "High-quality audio translation web application"
authors = [
    {name = "Your Name", email = "your.email@example.com"}
]
license = {text = "MIT"}
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "streamlit>=1.44.1",
    "gradio==5.9.1",
    "nltk>=3.8",
    "librosa>=0.10",
    "ffmpeg-python>=0.2",
    "transformers[audio]>=4.33",
    "torch>=2.1.0",
    "torchaudio>=2.1.0",
    "scipy>=1.11",
    "munch>=2.5",
    "accelerate>=1.2.0",
    "soundfile>=0.13.0",
    "kokoro>=0.7.9",
    "ordered-set>=4.1.0",
    "phonemizer-fork>=3.3.2",
    "nemo_toolkit[asr]",
    "faster-whisper>=1.1.1",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "pytest-mock>=3.10",
    "black>=23.0",
    "flake8>=6.0",
    "mypy>=1.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = "-v --tb=short"

[tool.black]
line-length = 100
target-version = ['py310']

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
|
|
|
src/domain/interfaces/audio_processing.py
CHANGED
|
@@ -1,4 +1,27 @@
|
|
| 1 |
-
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import TYPE_CHECKING
|
|
@@ -10,27 +33,104 @@ if TYPE_CHECKING:
|
|
| 10 |
|
| 11 |
|
| 12 |
class IAudioProcessingService(ABC):
|
| 13 |
-
"""
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
@abstractmethod
|
| 16 |
def process_audio_pipeline(
|
| 17 |
-
self,
|
| 18 |
-
audio: 'AudioContent',
|
| 19 |
-
target_language: str,
|
| 20 |
voice_settings: 'VoiceSettings'
|
| 21 |
) -> 'ProcessingResult':
|
| 22 |
"""
|
| 23 |
Process audio through the complete pipeline: STT -> Translation -> TTS.
|
| 24 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
Args:
|
| 26 |
-
audio: The input audio content
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
|
|
|
|
|
|
|
|
|
| 30 |
Returns:
|
| 31 |
-
ProcessingResult:
|
| 32 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
Raises:
|
| 34 |
-
AudioProcessingException: If any step in the pipeline fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 35 |
"""
|
| 36 |
pass
|
|
|
|
"""
Audio processing service interface.

This module defines the core interface for audio processing pipeline orchestration.
The interface follows Domain-Driven Design principles, providing a clean contract
for the complete audio translation workflow.

Example:
    ```python
    from src.domain.interfaces.audio_processing import IAudioProcessingService
    from src.domain.models.audio_content import AudioContent
    from src.domain.models.voice_settings import VoiceSettings

    # Get service implementation from DI container
    audio_service = container.resolve(IAudioProcessingService)

    # Process audio through complete pipeline
    result = audio_service.process_audio_pipeline(
        audio=audio_content,
        target_language="zh",
        voice_settings=voice_settings
    )
    ```
"""

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

...

class IAudioProcessingService(ABC):
    """
    Interface for audio processing pipeline orchestration.

    This interface defines the contract for the complete audio translation pipeline,
    coordinating Speech-to-Text, Translation, and Text-to-Speech services to provide
    end-to-end audio translation functionality.

    The interface is designed to be:
    - Provider-agnostic: Works with any STT/Translation/TTS implementation
    - Error-resilient: Handles failures gracefully with appropriate exceptions
    - Observable: Provides detailed processing results and metadata
    - Testable: Easy to mock for unit testing

    Implementations should handle:
    - Provider selection and fallback logic
    - Error handling and recovery
    - Performance monitoring and logging
    - Resource cleanup and management
    """

    @abstractmethod
    def process_audio_pipeline(
        self,
        audio: 'AudioContent',
        target_language: str,
        voice_settings: 'VoiceSettings'
    ) -> 'ProcessingResult':
        """
        Process audio through the complete pipeline: STT -> Translation -> TTS.

        This method orchestrates the complete audio translation workflow:
        1. Speech Recognition: Convert audio to text
        2. Translation: Translate text to target language (if needed)
        3. Speech Synthesis: Convert translated text back to audio

        The implementation should:
        - Validate input parameters
        - Handle provider failures with fallback mechanisms
        - Provide detailed error information on failure
        - Clean up temporary resources
        - Log processing steps for observability

        Args:
            audio: The input audio content to process. Must be a valid AudioContent
                instance with supported format and reasonable duration.
            target_language: The target language code for translation (e.g., 'zh', 'es', 'fr').
                Must be supported by the translation provider.
            voice_settings: Voice configuration for TTS synthesis including voice ID,
                speed, and language preferences.

        Returns:
            ProcessingResult: Comprehensive result containing:
                - success: Boolean indicating overall success
                - original_text: Transcribed text from STT (if successful)
                - translated_text: Translated text (if translation was performed)
                - audio_output: Generated audio content (if TTS was successful)
                - processing_time: Total processing duration in seconds
                - error_message: Detailed error description (if failed)
                - metadata: Additional processing information and metrics

        Raises:
            AudioProcessingException: If any step in the pipeline fails and cannot
                be recovered through fallback mechanisms.
            ValueError: If input parameters are invalid or unsupported.

        Example:
            ```python
            # Create audio content from file
            with open("input.wav", "rb") as f:
                audio = AudioContent(
                    data=f.read(),
                    format="wav",
                    sample_rate=16000,
                    duration=10.5
                )

            # Configure voice settings
            voice_settings = VoiceSettings(
                voice_id="kokoro",
                speed=1.0,
                language="zh"
            )

            # Process through pipeline
            result = service.process_audio_pipeline(
                audio=audio,
                target_language="zh",
                voice_settings=voice_settings
            )

            if result.success:
                print(f"Original: {result.original_text}")
                print(f"Translated: {result.translated_text}")
                # Save output audio
                with open("output.wav", "wb") as f:
                    f.write(result.audio_output.data)
            else:
                print(f"Processing failed: {result.error_message}")
            ```
        """
        pass
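Because the interface above is a plain ABC, callers can be unit-tested against a deterministic fake. The sketch below is self-contained for illustration: the dataclasses stand in for the real `AudioContent`/`ProcessingResult` domain models, and `FakeAudioProcessingService` is a hypothetical test double, not project code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AudioContent:
    data: bytes
    format: str

@dataclass
class ProcessingResult:
    success: bool
    original_text: str = ""
    error_message: str = ""

class IAudioProcessingService(ABC):
    @abstractmethod
    def process_audio_pipeline(self, audio, target_language, voice_settings):
        ...

class FakeAudioProcessingService(IAudioProcessingService):
    """Deterministic stub: lets callers be tested without any real models."""

    def process_audio_pipeline(self, audio, target_language, voice_settings):
        if not audio.data:
            return ProcessingResult(success=False, error_message="empty audio")
        return ProcessingResult(success=True, original_text="hello")

svc = FakeAudioProcessingService()
ok = svc.process_audio_pipeline(AudioContent(b"\x00", "wav"), "zh", None)
```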
|
src/domain/interfaces/speech_recognition.py
CHANGED
|
@@ -1,4 +1,15 @@
|
|
| 1 |
-
"""Speech recognition service interface.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import TYPE_CHECKING
|
|
@@ -9,21 +20,95 @@ if TYPE_CHECKING:
|
|
| 9 |
|
| 10 |
|
| 11 |
class ISpeechRecognitionService(ABC):
|
| 12 |
-
"""Interface for speech recognition services.
|
| 13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
@abstractmethod
|
| 15 |
def transcribe(self, audio: 'AudioContent', model: str) -> 'TextContent':
|
| 16 |
-
"""
|
| 17 |
-
|
| 18 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
Args:
|
| 20 |
-
audio: The audio content to transcribe
|
| 21 |
-
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
Returns:
|
| 24 |
-
TextContent: The
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
Raises:
|
| 27 |
-
SpeechRecognitionException: If transcription fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
"""
|
| 29 |
pass
|
|
|
|
"""Speech recognition service interface.

This module defines the interface for speech-to-text (STT) services that convert
audio content into textual representation. The interface supports multiple STT
models and providers with consistent error handling.

The interface is designed to be:
- Model-agnostic: Works with any STT implementation (Whisper, Parakeet, etc.)
- Language-aware: Handles multiple languages and dialects
- Error-resilient: Provides detailed error information for debugging
- Performance-conscious: Supports both batch and streaming transcription
"""

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

...

class ISpeechRecognitionService(ABC):
    """Interface for speech recognition services.

    This interface defines the contract for converting audio content to text
    using various STT models and providers. Implementations should handle
    different audio formats, languages, and quality levels.

    Example:
        ```python
        # Use through dependency injection
        stt_service = container.resolve(ISpeechRecognitionService)

        # Transcribe audio
        text_result = stt_service.transcribe(
            audio=audio_content,
            model="whisper-large"
        )

        print(f"Transcribed: {text_result.text}")
        print(f"Language: {text_result.language}")
        print(f"Confidence: {text_result.confidence}")
        ```
    """

    @abstractmethod
    def transcribe(self, audio: 'AudioContent', model: str) -> 'TextContent':
        """Transcribe audio content to text using specified STT model.

        Converts audio data into textual representation with language detection
        and confidence scoring. The method should handle various audio formats
        and quality levels gracefully.

        Implementation considerations:
        - Audio preprocessing (noise reduction, normalization)
        - Language detection and handling
        - Confidence scoring and quality assessment
        - Memory management for large audio files
        - Timeout handling for long audio content

        Args:
            audio: The audio content to transcribe. Must contain valid audio data
                in a supported format (WAV, MP3, FLAC, etc.) with appropriate
                sample rate and duration.
            model: The STT model identifier to use for transcription. Examples:
                - "whisper-small": Fast, lower accuracy
                - "whisper-large": Slower, higher accuracy
                - "parakeet": Real-time optimized
                Must be supported by the implementation.

        Returns:
            TextContent: The transcription result containing:
                - text: The transcribed text content
                - language: Detected or specified language code
                - confidence: Overall transcription confidence (0.0-1.0)
                - metadata: Additional information like word-level timestamps,
                  alternative transcriptions, processing time

        Raises:
            SpeechRecognitionException: If transcription fails due to:
                - Unsupported audio format or quality
                - Model loading or inference errors
                - Network issues (for cloud-based models)
                - Insufficient system resources
            ValueError: If input parameters are invalid:
                - Empty or corrupted audio data
                - Unsupported model identifier
                - Invalid audio format specifications

        Example:
            ```python
            # Load audio file
            with open("speech.wav", "rb") as f:
                audio = AudioContent(
                    data=f.read(),
                    format="wav",
                    sample_rate=16000,
                    duration=30.0
                )

            # Transcribe with high-accuracy model
            try:
                result = service.transcribe(audio, "whisper-large")

                if result.confidence > 0.8:
                    print(f"High confidence: {result.text}")
                else:
                    print(f"Low confidence: {result.text} ({result.confidence:.2f})")

            except SpeechRecognitionException as e:
                print(f"Transcription failed: {e}")
            ```
        """
        pass
|
src/domain/interfaces/speech_synthesis.py
CHANGED
|
@@ -1,4 +1,15 @@
|
|
| 1 |
-
"""Speech synthesis service interface.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import Iterator, TYPE_CHECKING
|
|
@@ -10,36 +21,175 @@ if TYPE_CHECKING:
|
|
| 10 |
|
| 11 |
|
| 12 |
class ISpeechSynthesisService(ABC):
|
| 13 |
-
"""Interface for speech synthesis services.
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
@abstractmethod
|
| 16 |
def synthesize(self, request: 'SpeechSynthesisRequest') -> 'AudioContent':
|
| 17 |
-
"""
|
| 18 |
-
|
| 19 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 20 |
Args:
|
| 21 |
-
request: The speech synthesis request containing
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
| 23 |
Returns:
|
| 24 |
-
AudioContent: The synthesized audio
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
Raises:
|
| 27 |
-
SpeechSynthesisException: If synthesis fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
"""
|
| 29 |
pass
|
| 30 |
-
|
| 31 |
@abstractmethod
|
| 32 |
def synthesize_stream(self, request: 'SpeechSynthesisRequest') -> Iterator['AudioChunk']:
|
| 33 |
-
"""
|
| 34 |
-
|
| 35 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
Args:
|
| 37 |
-
request: The speech synthesis request containing text and voice settings
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 42 |
Raises:
|
| 43 |
-
SpeechSynthesisException: If synthesis fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
"""
|
| 45 |
pass
|
|
|
|
"""Speech synthesis service interface.

This module defines the interface for text-to-speech (TTS) services that convert
textual content into audio. The interface supports both batch and streaming
synthesis with multiple voice options and quality settings.

The interface is designed to be:
- Voice-flexible: Supports multiple voices and languages
- Quality-configurable: Allows control over synthesis parameters
- Streaming-capable: Supports real-time audio generation
- Provider-agnostic: Works with any TTS implementation
"""

from abc import ABC, abstractmethod
from typing import Iterator, TYPE_CHECKING

...

class ISpeechSynthesisService(ABC):
    """Interface for speech synthesis services.

    This interface defines the contract for converting text to speech using
    various TTS models and voices. Implementations should support both batch
    processing and streaming synthesis for different use cases.

    Example:
        ```python
        # Use through dependency injection
        tts_service = container.resolve(ISpeechSynthesisService)

        # Create synthesis request
        request = SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

        # Batch synthesis
        audio = tts_service.synthesize(request)

        # Or streaming synthesis
        for chunk in tts_service.synthesize_stream(request):
            # Process audio chunk in real-time
            play_audio_chunk(chunk)
        ```
    """

    @abstractmethod
    def synthesize(self, request: 'SpeechSynthesisRequest') -> 'AudioContent':
|
| 53 |
+
"""Synthesize speech from text in batch mode.
|
| 54 |
+
|
| 55 |
+
Converts text content to audio using specified voice settings and
|
| 56 |
+
returns the complete audio content. This method is suitable for
|
| 57 |
+
shorter texts or when the complete audio is needed before playback.
|
| 58 |
+
|
| 59 |
+
Implementation considerations:
|
| 60 |
+
- Text preprocessing (SSML support, pronunciation handling)
|
| 61 |
+
- Voice loading and configuration
|
| 62 |
+
- Audio quality optimization
|
| 63 |
+
- Memory management for long texts
|
| 64 |
+
- Error recovery and fallback voices
|
| 65 |
+
|
| 66 |
Args:
|
| 67 |
+
request: The speech synthesis request containing:
|
| 68 |
+
- text_content: Text to synthesize with language information
|
| 69 |
+
- voice_settings: Voice configuration including voice ID, speed,
|
| 70 |
+
pitch, volume, and other voice-specific parameters
|
| 71 |
+
|
| 72 |
Returns:
|
| 73 |
+
AudioContent: The synthesized audio containing:
|
| 74 |
+
- data: Raw audio data in specified format
|
| 75 |
+
- format: Audio format (WAV, MP3, etc.)
|
| 76 |
+
- sample_rate: Audio sample rate in Hz
|
| 77 |
+
- duration: Audio duration in seconds
|
| 78 |
+
- metadata: Additional synthesis information
|
| 79 |
+
|
| 80 |
Raises:
|
| 81 |
+
SpeechSynthesisException: If synthesis fails due to:
|
| 82 |
+
- Unsupported voice or language
|
| 83 |
+
- Text processing errors (invalid characters, length limits)
|
| 84 |
+
- Voice model loading failures
|
| 85 |
+
- Insufficient system resources
|
| 86 |
+
ValueError: If request parameters are invalid:
|
| 87 |
+
- Empty text content
|
| 88 |
+
- Unsupported voice settings
|
| 89 |
+
- Invalid audio format specifications
|
| 90 |
+
|
| 91 |
+
Example:
|
| 92 |
+
```python
|
| 93 |
+
# Create text content
|
| 94 |
+
text = TextContent(
|
| 95 |
+
text="Hello, this is a test of speech synthesis.",
|
| 96 |
+
language="en"
|
| 97 |
+
)
|
| 98 |
+
|
| 99 |
+
# Configure voice settings
|
| 100 |
+
voice_settings = VoiceSettings(
|
| 101 |
+
voice_id="kokoro",
|
| 102 |
+
speed=1.0,
|
| 103 |
+
pitch=0.0,
|
| 104 |
+
volume=1.0
|
| 105 |
+
)
|
| 106 |
+
|
| 107 |
+
# Create synthesis request
|
| 108 |
+
request = SpeechSynthesisRequest(
|
| 109 |
+
text_content=text,
|
| 110 |
+
voice_settings=voice_settings
|
| 111 |
+
)
|
| 112 |
+
|
| 113 |
+
# Synthesize audio
|
| 114 |
+
try:
|
| 115 |
+
audio = service.synthesize(request)
|
| 116 |
+
|
| 117 |
+
# Save to file
|
| 118 |
+
with open("output.wav", "wb") as f:
|
| 119 |
+
f.write(audio.data)
|
| 120 |
+
|
| 121 |
+
print(f"Generated {audio.duration:.1f}s of audio")
|
| 122 |
+
|
| 123 |
+
except SpeechSynthesisException as e:
|
| 124 |
+
print(f"Synthesis failed: {e}")
|
| 125 |
+
```
|
| 126 |
"""
|
| 127 |
pass
|
| 128 |
+
|
| 129 |
@abstractmethod
|
| 130 |
def synthesize_stream(self, request: 'SpeechSynthesisRequest') -> Iterator['AudioChunk']:
|
| 131 |
+
"""Synthesize speech from text as a stream of audio chunks.
|
| 132 |
+
|
| 133 |
+
Converts text content to audio in streaming mode, yielding audio chunks
|
| 134 |
+
as they become available. This method is suitable for real-time playback,
|
| 135 |
+
long texts, or when low latency is required.
|
| 136 |
+
|
| 137 |
+
Implementation considerations:
|
| 138 |
+
- Chunk size optimization for smooth playback
|
| 139 |
+
- Buffer management and memory efficiency
|
| 140 |
+
- Error handling without breaking the stream
|
| 141 |
+
- Proper stream termination and cleanup
|
| 142 |
+
- Latency minimization for real-time use cases
|
| 143 |
+
|
| 144 |
Args:
|
| 145 |
+
request: The speech synthesis request containing text and voice settings.
|
| 146 |
+
Same format as batch synthesis but optimized for streaming.
|
| 147 |
+
|
| 148 |
+
Yields:
|
| 149 |
+
AudioChunk: Individual audio chunks containing:
|
| 150 |
+
- data: Raw audio data for this chunk
|
| 151 |
+
- format: Audio format (consistent across chunks)
|
| 152 |
+
- sample_rate: Audio sample rate in Hz
|
| 153 |
+
- chunk_index: Sequential chunk number
|
| 154 |
+
- is_final: Boolean indicating if this is the last chunk
|
| 155 |
+
- timestamp: Chunk generation timestamp
|
| 156 |
+
|
| 157 |
Raises:
|
| 158 |
+
SpeechSynthesisException: If synthesis fails during streaming:
|
| 159 |
+
- Voice model errors during processing
|
| 160 |
+
- Network issues (for cloud-based synthesis)
|
| 161 |
+
- Resource exhaustion during long synthesis
|
| 162 |
+
ValueError: If request parameters are invalid for streaming
|
| 163 |
+
|
| 164 |
+
Example:
|
| 165 |
+
```python
|
| 166 |
+
# Create streaming synthesis request
|
| 167 |
+
request = SpeechSynthesisRequest(
|
| 168 |
+
text_content=long_text,
|
| 169 |
+
voice_settings=voice_settings
|
| 170 |
+
)
|
| 171 |
+
|
| 172 |
+
# Stream synthesis with real-time playback
|
| 173 |
+
audio_buffer = []
|
| 174 |
+
|
| 175 |
+
try:
|
| 176 |
+
for chunk in service.synthesize_stream(request):
|
| 177 |
+
# Add to playback buffer
|
| 178 |
+
audio_buffer.append(chunk.data)
|
| 179 |
+
|
| 180 |
+
# Start playback when buffer is sufficient
|
| 181 |
+
if len(audio_buffer) >= 3: # Buffer 3 chunks
|
| 182 |
+
play_audio_chunk(audio_buffer.pop(0))
|
| 183 |
+
|
| 184 |
+
# Handle final chunk
|
| 185 |
+
if chunk.is_final:
|
| 186 |
+
# Play remaining buffered chunks
|
| 187 |
+
for remaining in audio_buffer:
|
| 188 |
+
play_audio_chunk(remaining)
|
| 189 |
+
break
|
| 190 |
+
|
| 191 |
+
except SpeechSynthesisException as e:
|
| 192 |
+
print(f"Streaming synthesis failed: {e}")
|
| 193 |
+
```
|
| 194 |
"""
|
| 195 |
pass
|
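The streaming contract above can be exercised end-to-end with a toy provider. The sketch below is illustrative only: `SpeechSynthesisRequest` and `AudioChunk` are simplified inline stand-ins for the real domain models (which carry more fields such as `format` and `timestamp`), and `SilenceTTS` is a hypothetical provider, not part of this codebase.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List


# Minimal stand-ins for the real domain models (assumption: the actual
# classes in src/domain/models carry more fields than shown here).
@dataclass
class SpeechSynthesisRequest:
    text: str
    voice_id: str = "kokoro"


@dataclass
class AudioChunk:
    data: bytes
    chunk_index: int
    is_final: bool


class ISpeechSynthesisService(ABC):
    @abstractmethod
    def synthesize_stream(self, request: SpeechSynthesisRequest) -> Iterator[AudioChunk]:
        ...


class SilenceTTS(ISpeechSynthesisService):
    """Toy provider: yields one chunk of silence per word, flagging the last."""

    def synthesize_stream(self, request: SpeechSynthesisRequest) -> Iterator[AudioChunk]:
        if not request.text.strip():
            raise ValueError("Empty text content")
        words = request.text.split()
        for i in range(len(words)):
            yield AudioChunk(
                data=b"\x00" * 320,  # 10 ms of 16-bit silence at 16 kHz
                chunk_index=i,
                is_final=(i == len(words) - 1),
            )


chunks: List[AudioChunk] = list(
    SilenceTTS().synthesize_stream(SpeechSynthesisRequest("hello streaming world"))
)
```

Because consumers rely only on `chunk_index` ordering and the `is_final` flag, the buffered-playback loop from the docstring example works unchanged against any conforming provider.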
src/domain/interfaces/translation.py
CHANGED
@@ -1,4 +1,15 @@
-"""Translation service interface.

 from abc import ABC, abstractmethod
 from typing import TYPE_CHECKING
@@ -9,20 +20,116 @@ if TYPE_CHECKING:


 class ITranslationService(ABC):
-    """Interface for translation services.
-
     @abstractmethod
     def translate(self, request: 'TranslationRequest') -> 'TextContent':
-        """
-
         Args:
-            request: The translation request containing
-
         Returns:
-            TextContent: The translated text
-
         Raises:
-            TranslationFailedException: If translation fails
         """
         pass
+"""Translation service interface.
+
+This module defines the interface for text translation services that convert
+text from one language to another. The interface supports multiple translation
+providers and models with language detection and quality assessment.
+
+The interface is designed to be:
+- Provider-agnostic: Works with any translation implementation
+- Language-flexible: Supports automatic language detection
+- Quality-aware: Provides confidence scores and alternative translations
+- Context-sensitive: Handles domain-specific translation needs
+"""

 from abc import ABC, abstractmethod
 from typing import TYPE_CHECKING


 class ITranslationService(ABC):
+    """Interface for translation services.
+
+    This interface defines the contract for translating text between languages
+    using various translation models and providers. Implementations should
+    handle language detection, context preservation, and quality optimization.
+
+    Example:
+        ```python
+        # Use through dependency injection
+        translation_service = container.resolve(ITranslationService)
+
+        # Create translation request
+        request = TranslationRequest(
+            text_content=source_text,
+            target_language="zh",
+            source_language="en"  # Optional, can be auto-detected
+        )
+
+        # Translate text
+        result = translation_service.translate(request)
+
+        print(f"Original: {request.text_content.text}")
+        print(f"Translated: {result.text}")
+        print(f"Confidence: {result.confidence}")
+        ```
+    """
+
     @abstractmethod
     def translate(self, request: 'TranslationRequest') -> 'TextContent':
+        """Translate text from source language to target language.
+
+        Converts text content from one language to another while preserving
+        meaning, context, and formatting where possible. The method should
+        handle language detection, domain adaptation, and quality assessment.
+
+        Implementation considerations:
+        - Language detection and validation
+        - Context preservation and domain adaptation
+        - Handling of special characters and formatting
+        - Quality assessment and confidence scoring
+        - Caching for repeated translations
+        - Fallback mechanisms for unsupported language pairs
+
         Args:
+            request: The translation request containing:
+                - text_content: Source text with language information
+                - target_language: Target language code (ISO 639-1)
+                - source_language: Source language code (optional, can be auto-detected)
+                - context: Optional context information for better translation
+                - domain: Optional domain specification (technical, medical, etc.)
+
         Returns:
+            TextContent: The translated text containing:
+                - text: Translated text content
+                - language: Target language code
+                - confidence: Translation confidence score (0.0-1.0)
+                - metadata: Additional information including:
+                    - detected_source_language: Auto-detected source language
+                    - alternative_translations: Other possible translations
+                    - processing_time: Translation duration
+                    - model_used: Translation model identifier
+
         Raises:
+            TranslationFailedException: If translation fails due to:
+                - Unsupported language pair
+                - Model loading or inference errors
+                - Network issues (for cloud-based translation)
+                - Text processing errors (encoding, length limits)
+            ValueError: If request parameters are invalid:
+                - Empty text content
+                - Invalid language codes
+                - Unsupported translation options
+
+        Example:
+            ```python
+            # Create source text
+            source_text = TextContent(
+                text="Hello, how are you today?",
+                language="en"
+            )
+
+            # Create translation request
+            request = TranslationRequest(
+                text_content=source_text,
+                target_language="zh",
+                context="casual_conversation"
+            )
+
+            # Perform translation
+            try:
+                result = service.translate(request)
+
+                print(f"Original ({source_text.language}): {source_text.text}")
+                print(f"Translated ({result.language}): {result.text}")
+
+                if result.confidence > 0.9:
+                    print("High confidence translation")
+                elif result.confidence > 0.7:
+                    print("Medium confidence translation")
+                else:
+                    print("Low confidence translation - review recommended")
+
+                # Check for alternatives
+                if hasattr(result, 'metadata') and 'alternatives' in result.metadata:
+                    print("Alternative translations:")
+                    for alt in result.metadata['alternatives'][:3]:
+                        print(f"  - {alt}")
+
+            except TranslationFailedException as e:
+                print(f"Translation failed: {e}")
+            ```
         """
         pass
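The translation contract can be smoke-tested the same way. A minimal sketch with inline stand-ins: the real `TextContent`/`TranslationRequest` models and `TranslationFailedException` differ in detail, and `GlossaryTranslator` is a hypothetical provider used only to show the confidence and metadata flow.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


# Minimal stand-ins for the domain models referenced by the interface
# (assumption: the real classes carry additional fields).
@dataclass
class TextContent:
    text: str
    language: str
    confidence: float = 1.0
    metadata: dict = field(default_factory=dict)


@dataclass
class TranslationRequest:
    text_content: TextContent
    target_language: str
    source_language: Optional[str] = None


class TranslationFailedException(Exception):
    pass


class ITranslationService(ABC):
    @abstractmethod
    def translate(self, request: TranslationRequest) -> TextContent:
        ...


class GlossaryTranslator(ITranslationService):
    """Toy provider backed by a fixed glossary, useful for wiring and tests."""

    GLOSSARY = {("hello, how are you today?", "zh"): "你好，你今天好吗？"}

    def translate(self, request: TranslationRequest) -> TextContent:
        if not request.text_content.text.strip():
            raise ValueError("Empty text content")
        key = (request.text_content.text.lower(), request.target_language)
        if key not in self.GLOSSARY:
            raise TranslationFailedException(f"Unsupported pair: {key}")
        return TextContent(
            text=self.GLOSSARY[key],
            language=request.target_language,
            confidence=0.95,
            metadata={"detected_source_language": request.text_content.language},
        )


result = GlossaryTranslator().translate(
    TranslationRequest(
        text_content=TextContent(text="Hello, how are you today?", language="en"),
        target_language="zh",
    )
)
```

Raising a distinct exception for unsupported pairs versus invalid input mirrors the `TranslationFailedException`/`ValueError` split documented above, so callers can branch on the failure mode.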
src/infrastructure/base/tts_provider_base.py
CHANGED
@@ -1,4 +1,37 @@
-"""

 import logging
 import os
@@ -20,15 +53,57 @@ logger = logging.getLogger(__name__)


 class TTSProviderBase(ISpeechSynthesisService, ABC):
-    """

     def __init__(self, provider_name: str, supported_languages: list[str] = None):
         """
         Initialize the TTS provider.

         Args:
-            provider_name:
-
         """
         self.provider_name = provider_name
         self.supported_languages = supported_languages or []
+"""
+Base class for TTS provider implementations.
+
+This module provides the abstract base class for all Text-to-Speech provider
+implementations in the infrastructure layer. It implements common functionality
+and defines the contract that all TTS providers must follow.
+
+The base class handles:
+- Common validation logic
+- File management and cleanup
+- Error handling and logging
+- Audio format processing
+- Provider lifecycle management
+
+Example implementation:
+    ```python
+    from src.infrastructure.base.tts_provider_base import TTSProviderBase
+
+    class MyTTSProvider(TTSProviderBase):
+        def __init__(self):
+            super().__init__("my_tts", ["en", "es"])
+
+        def _generate_audio(self, request):
+            # Implement TTS-specific logic
+            audio_data = my_tts_engine.synthesize(request.text_content.text)
+            return audio_data, 22050  # audio_bytes, sample_rate
+
+        def is_available(self):
+            return my_tts_engine.is_loaded()
+
+        def get_available_voices(self):
+            return ["voice1", "voice2"]
+    ```
+"""

 import logging
 import os


 class TTSProviderBase(ISpeechSynthesisService, ABC):
+    """
+    Abstract base class for TTS provider implementations.
+
+    This class provides a foundation for implementing Text-to-Speech providers
+    in the infrastructure layer. It handles common concerns like validation,
+    file management, error handling, and audio processing while allowing
+    concrete implementations to focus on provider-specific logic.
+
+    Key features:
+    - Automatic validation of synthesis requests
+    - Temporary file management with cleanup
+    - Consistent error handling and logging
+    - Support for both batch and streaming synthesis
+    - Audio format standardization
+    - Provider availability checking
+
+    Subclasses must implement:
+    - _generate_audio(): Core synthesis logic
+    - _generate_audio_stream(): Streaming synthesis (optional)
+    - is_available(): Provider availability check
+    - get_available_voices(): Voice enumeration
+
+    The base class ensures that all providers follow the same patterns
+    for error handling, logging, and resource management, making the
+    system more maintainable and predictable.
+    """

     def __init__(self, provider_name: str, supported_languages: list[str] = None):
         """
         Initialize the TTS provider.

+        Sets up the provider with basic configuration and creates necessary
+        directories for temporary file storage. This constructor should be
+        called by all subclass implementations.
+
         Args:
+            provider_name: Unique identifier for this TTS provider (e.g., "kokoro", "dia").
+                Used for logging, error messages, and provider selection.
+            supported_languages: List of ISO language codes supported by this provider
+                (e.g., ["en", "zh", "es"]). If None, no language validation
+                will be performed.
+
+        Example:
+            ```python
+            class MyTTSProvider(TTSProviderBase):
+                def __init__(self):
+                    super().__init__(
+                        provider_name="my_tts",
+                        supported_languages=["en", "es", "fr"]
+                    )
+            ```
         """
         self.provider_name = provider_name
         self.supported_languages = supported_languages or []
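The division of labor the base class describes (validation, logging, and dispatch in the base; synthesis in a hook) is a template-method pattern. The sketch below is a simplified stand-in, assuming the `_generate_audio` hook returns `(audio_bytes, sample_rate)` as in the module docstring's example; it omits file management and streaming, and `BeepProvider` is hypothetical.

```python
import logging
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


class TTSProviderBase(ABC):
    """Simplified stand-in for the real base class: validation + dispatch only."""

    def __init__(self, provider_name: str, supported_languages: Optional[List[str]] = None):
        self.provider_name = provider_name
        self.supported_languages = supported_languages or []

    def synthesize(self, text: str, language: str) -> Tuple[bytes, int]:
        # Common validation lives in the base class...
        if not text.strip():
            raise ValueError("Empty text content")
        if self.supported_languages and language not in self.supported_languages:
            raise ValueError(f"{self.provider_name} does not support language {language!r}")
        logger.debug("Synthesizing %d chars with %s", len(text), self.provider_name)
        # ...while provider-specific work is delegated to the hook method.
        return self._generate_audio(text)

    @abstractmethod
    def _generate_audio(self, text: str) -> Tuple[bytes, int]:
        """Return (audio_bytes, sample_rate); implemented by each provider."""


class BeepProvider(TTSProviderBase):
    def __init__(self):
        super().__init__(provider_name="beep", supported_languages=["en"])

    def _generate_audio(self, text: str) -> Tuple[bytes, int]:
        return b"\x01" * len(text), 22050  # placeholder "audio"


audio, rate = BeepProvider().synthesize("hi", "en")
```

Keeping validation in the public `synthesize` method means every provider rejects empty text and unsupported languages identically, so error behavior stays consistent across the kokoro, dia, and other backends mentioned in the docstrings.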
|