Commit 4e4961e by Michael Hu
Parent: 19fd91c
Message: Add documentation and final validation

Files changed:

- DEVELOPER_GUIDE.md +701 -0
- README.md +246 -54
- pyproject.toml +57 -54
- src/domain/interfaces/audio_processing.py +114 -14
- src/domain/interfaces/speech_recognition.py +97 -12
- src/domain/interfaces/speech_synthesis.py +171 -21
- src/domain/interfaces/translation.py +118 -11
- src/infrastructure/base/tts_provider_base.py +79 -4

---

DEVELOPER_GUIDE.md (ADDED)
| 1 |
+
# Developer Guide
|
| 2 |
+
|
| 3 |
+
This guide provides comprehensive instructions for extending the Audio Translation System with new providers and contributing to the codebase.
|
| 4 |
+
|
| 5 |
+
## Table of Contents
|
| 6 |
+
|
| 7 |
+
- [Architecture Overview](#architecture-overview)
|
| 8 |
+
- [Adding New TTS Providers](#adding-new-tts-providers)
|
| 9 |
+
- [Adding New STT Providers](#adding-new-stt-providers)
|
| 10 |
+
- [Adding New Translation Providers](#adding-new-translation-providers)
|
| 11 |
+
- [Testing Guidelines](#testing-guidelines)
|
| 12 |
+
- [Code Style and Standards](#code-style-and-standards)
|
| 13 |
+
- [Debugging and Troubleshooting](#debugging-and-troubleshooting)
|
| 14 |
+
- [Performance Considerations](#performance-considerations)
|
| 15 |
+
|
| 16 |
+
## Architecture Overview
|
| 17 |
+
|
| 18 |
+
The system follows Domain-Driven Design (DDD) principles with clear separation of concerns:
|
| 19 |
+
|
| 20 |
+
```
|
| 21 |
+
src/
|
| 22 |
+
├── domain/ # Core business logic
|
| 23 |
+
│ ├── interfaces/ # Service contracts (ports)
|
| 24 |
+
│ ├── models/ # Domain entities and value objects
|
| 25 |
+
│ ├── services/ # Domain services
|
| 26 |
+
│ └── exceptions.py # Domain-specific exceptions
|
| 27 |
+
├── application/ # Use case orchestration
|
| 28 |
+
│ ├── services/ # Application services
|
| 29 |
+
│ ├── dtos/ # Data transfer objects
|
| 30 |
+
│ └── error_handling/ # Application error handling
|
| 31 |
+
├── infrastructure/ # External service implementations
|
| 32 |
+
│ ├── tts/ # TTS provider implementations
|
| 33 |
+
│ ├── stt/ # STT provider implementations
|
| 34 |
+
│ ├── translation/ # Translation service implementations
|
| 35 |
+
│ ├── base/ # Provider base classes
|
| 36 |
+
│ └── config/ # Configuration and DI container
|
| 37 |
+
└── presentation/ # UI layer (app.py)
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
### Key Design Patterns
|
| 41 |
+
|
| 42 |
+
1. **Provider Pattern**: Pluggable implementations for different services
|
| 43 |
+
2. **Factory Pattern**: Provider creation with fallback logic
|
| 44 |
+
3. **Dependency Injection**: Loose coupling between components
|
| 45 |
+
4. **Repository Pattern**: Data access abstraction
|
| 46 |
+
5. **Strategy Pattern**: Runtime algorithm selection
|
| 47 |
+
|
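To make the interplay of the provider and factory patterns concrete, here is a minimal, self-contained sketch of a registry with fallback. The class and method names are illustrative, not the project's real API (the actual factory lives in `src/infrastructure/tts/provider_factory.py`):

```python
# Minimal sketch of the provider + factory pattern with fallback.
# Names are illustrative; the real interfaces live in src/domain/interfaces.
from abc import ABC, abstractmethod


class Provider(ABC):
    @abstractmethod
    def is_available(self) -> bool: ...


class ProviderFactory:
    def __init__(self):
        self._providers: dict[str, type[Provider]] = {}

    def register(self, name: str, cls: type[Provider]) -> None:
        self._providers[name] = cls

    def create_with_fallback(self, preferred: list[str]) -> Provider:
        """Return an instance of the first preferred provider that is available."""
        for name in preferred:
            cls = self._providers.get(name)
            if cls is not None:
                provider = cls()
                if provider.is_available():
                    return provider
        raise RuntimeError("No available provider")


class DummyProvider(Provider):
    def is_available(self) -> bool:
        return True


factory = ProviderFactory()
factory.register("dummy", DummyProvider)
# "my_tts" is not registered here, so the factory falls back to the dummy.
provider = factory.create_with_fallback(["my_tts", "dummy"])
```

Callers depend only on the abstract `Provider` contract, which is what lets the dependency-injection container swap implementations without touching application code.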
## Adding New TTS Providers

### Step 1: Implement the Provider Class

Create a new provider class that inherits from `TTSProviderBase`:

```python
# src/infrastructure/tts/my_tts_provider.py

import logging
from typing import Iterator, List, Optional

from ..base.tts_provider_base import TTSProviderBase
from ...domain.models.speech_synthesis_request import SpeechSynthesisRequest
from ...domain.exceptions import SpeechSynthesisException

logger = logging.getLogger(__name__)


class MyTTSProvider(TTSProviderBase):
    """Custom TTS provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the TTS provider.

        Args:
            api_key: Optional API key for cloud-based services
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_tts",
            supported_languages=["en", "zh", "es", "fr"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your TTS engine/model here
            # Example: self.engine = MyTTSEngine(api_key=self.api_key)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechSynthesisException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available and ready to use."""
        try:
            # Check that dependencies are installed
            # Check that models are loaded
            # Check that the API is accessible (for cloud services)
            return True  # Replace with an actual availability check
        except Exception:
            return False

    def get_available_voices(self) -> List[str]:
        """Get the list of available voices for this provider."""
        # Return the actual voice IDs supported by your provider
        return ["voice1", "voice2", "voice3"]

    def _generate_audio(self, request: SpeechSynthesisRequest) -> tuple[bytes, int]:
        """Generate audio data from a synthesis request.

        Args:
            request: The speech synthesis request

        Returns:
            tuple: (audio_data_bytes, sample_rate)
        """
        try:
            text = request.text_content.text
            voice_id = request.voice_settings.voice_id
            speed = request.voice_settings.speed

            # Implement your TTS synthesis logic here
            # Example:
            # audio_data = self.engine.synthesize(
            #     text=text,
            #     voice=voice_id,
            #     speed=speed
            # )

            # Return audio data and sample rate
            audio_data = b"dummy_audio_data"  # Replace with actual synthesis
            sample_rate = 22050  # Replace with the actual sample rate

            return audio_data, sample_rate

        except Exception as e:
            self._handle_provider_error(e, "audio generation")

    def _generate_audio_stream(self, request: SpeechSynthesisRequest) -> Iterator[tuple[bytes, int, bool]]:
        """Generate an audio data stream from a synthesis request.

        Args:
            request: The speech synthesis request

        Yields:
            tuple: (audio_data_bytes, sample_rate, is_final)
        """
        try:
            # Implement streaming synthesis if supported.
            # For non-streaming providers, yield the complete audio as a single chunk.
            audio_data, sample_rate = self._generate_audio(request)
            yield audio_data, sample_rate, True

        except Exception as e:
            self._handle_provider_error(e, "streaming audio generation")
```
### Step 2: Register the Provider

Add your provider to the factory registration:

```python
# src/infrastructure/tts/provider_factory.py

def _register_default_providers(self):
    """Register all available TTS providers."""
    # ... existing providers ...

    # Try to register your custom provider
    try:
        from .my_tts_provider import MyTTSProvider
        self._providers['my_tts'] = MyTTSProvider
        logger.info("Registered MyTTS provider")
    except ImportError as e:
        logger.debug(f"MyTTS provider not available: {e}")
```

### Step 3: Add Configuration Support

Update the configuration to include your provider:

```python
# src/infrastructure/config/app_config.py

import os


class AppConfig:
    # ... existing configuration ...

    # TTS provider configuration
    TTS_PROVIDERS = os.getenv('TTS_PROVIDERS', 'kokoro,dia,cosyvoice2,my_tts,dummy').split(',')

    # Provider-specific settings
    MY_TTS_API_KEY = os.getenv('MY_TTS_API_KEY')
    MY_TTS_MODEL = os.getenv('MY_TTS_MODEL', 'default')
```

### Step 4: Add Tests

Create comprehensive tests for your provider:

```python
# tests/unit/infrastructure/tts/test_my_tts_provider.py

import pytest
from unittest.mock import patch

from src.infrastructure.tts.my_tts_provider import MyTTSProvider
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings
from src.domain.exceptions import SpeechSynthesisException


class TestMyTTSProvider:
    """Test suite for the MyTTS provider."""

    @pytest.fixture
    def provider(self):
        """Create a test provider instance."""
        return MyTTSProvider(api_key="test_key")

    @pytest.fixture
    def synthesis_request(self):
        """Create a test synthesis request."""
        text_content = TextContent(text="Hello world", language="en")
        voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
        return SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

    def test_provider_initialization(self, provider):
        """Test that the provider initializes correctly."""
        assert provider.provider_name == "my_tts"
        assert "en" in provider.supported_languages
        assert provider.is_available()

    def test_get_available_voices(self, provider):
        """Test voice listing."""
        voices = provider.get_available_voices()
        assert isinstance(voices, list)
        assert len(voices) > 0
        assert "voice1" in voices

    def test_synthesize_success(self, provider, synthesis_request):
        """Test successful synthesis."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.return_value = (b"audio_data", 22050)

            result = provider.synthesize(synthesis_request)

            assert result.data == b"audio_data"
            assert result.format == "wav"
            assert result.sample_rate == 22050
            mock_generate.assert_called_once_with(synthesis_request)

    def test_synthesize_failure(self, provider, synthesis_request):
        """Test synthesis failure handling."""
        with patch.object(provider, '_generate_audio') as mock_generate:
            mock_generate.side_effect = Exception("Synthesis failed")

            with pytest.raises(SpeechSynthesisException):
                provider.synthesize(synthesis_request)

    def test_synthesize_stream(self, provider, synthesis_request):
        """Test streaming synthesis."""
        chunks = list(provider.synthesize_stream(synthesis_request))

        assert len(chunks) > 0
        assert chunks[-1].is_final  # The last chunk should be marked as final

        # Verify chunk structure
        for chunk in chunks:
            assert hasattr(chunk, 'data')
            assert hasattr(chunk, 'sample_rate')
            assert hasattr(chunk, 'is_final')
```

### Step 5: Add Integration Tests

```python
# tests/integration/test_my_tts_integration.py

import pytest

from src.infrastructure.config.container_setup import initialize_global_container
from src.infrastructure.tts.provider_factory import TTSProviderFactory
from src.domain.models.speech_synthesis_request import SpeechSynthesisRequest
from src.domain.models.text_content import TextContent
from src.domain.models.voice_settings import VoiceSettings


@pytest.mark.integration
class TestMyTTSIntegration:
    """Integration tests for the MyTTS provider."""

    def test_provider_factory_integration(self):
        """Test that the provider works with the factory."""
        factory = TTSProviderFactory()

        if 'my_tts' in factory.get_available_providers():
            provider = factory.create_provider('my_tts')
            assert provider.is_available()
            assert len(provider.get_available_voices()) > 0

    def test_end_to_end_synthesis(self):
        """Test the complete synthesis workflow."""
        container = initialize_global_container()
        factory = container.resolve(TTSProviderFactory)

        if 'my_tts' in factory.get_available_providers():
            provider = factory.create_provider('my_tts')

            # Create a synthesis request
            text_content = TextContent(text="Integration test", language="en")
            voice_settings = VoiceSettings(voice_id="voice1", speed=1.0)
            request = SpeechSynthesisRequest(
                text_content=text_content,
                voice_settings=voice_settings
            )

            # Synthesize audio
            result = provider.synthesize(request)

            assert result.data is not None
            assert result.duration > 0
            assert result.sample_rate > 0
```
## Adding New STT Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/stt/my_stt_provider.py

import logging
from typing import List, Optional

from ..base.stt_provider_base import STTProviderBase
from ...domain.models.audio_content import AudioContent
from ...domain.exceptions import SpeechRecognitionException

logger = logging.getLogger(__name__)


class MySTTProvider(STTProviderBase):
    """Custom STT provider implementation."""

    def __init__(self, model_path: Optional[str] = None, **kwargs):
        """Initialize the STT provider.

        Args:
            model_path: Path to the STT model
            **kwargs: Additional provider-specific configuration
        """
        super().__init__(
            provider_name="my_stt",
            supported_languages=["en", "zh", "es", "fr"],
            supported_models=["my_stt_small", "my_stt_large"]
        )
        self.model_path = model_path
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your STT engine/model here
            # Example: self.model = MySTTModel.load(self.model_path)
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise SpeechRecognitionException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, model availability, etc.
            return True  # Replace with an actual check
        except Exception:
            return False

    def get_supported_models(self) -> List[str]:
        """Get the list of supported models."""
        return self.supported_models

    def _transcribe_audio(self, audio: AudioContent, model: str) -> tuple[str, float, dict]:
        """Transcribe audio using the specified model.

        Args:
            audio: Audio content to transcribe
            model: Model identifier to use

        Returns:
            tuple: (transcribed_text, confidence_score, metadata)
        """
        try:
            # Implement your STT logic here
            # Example:
            # result = self.model.transcribe(
            #     audio_data=audio.data,
            #     sample_rate=audio.sample_rate,
            #     model=model
            # )

            # Return transcription results
            text = "Transcribed text"  # Replace with the actual transcription
            confidence = 0.95  # Replace with the actual confidence
            metadata = {
                "model_used": model,
                "processing_time": 1.5,
                "language_detected": "en"
            }

            return text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "transcription")
```

### Step 2: Register and Test

Follow the same steps as for TTS providers: factory registration, configuration support, and unit and integration tests.
## Adding New Translation Providers

### Step 1: Implement the Provider Class

```python
# src/infrastructure/translation/my_translation_provider.py

import logging
from typing import List, Optional

from ..base.translation_provider_base import TranslationProviderBase
from ...domain.models.translation_request import TranslationRequest
from ...domain.exceptions import TranslationFailedException

logger = logging.getLogger(__name__)


class MyTranslationProvider(TranslationProviderBase):
    """Custom translation provider implementation."""

    def __init__(self, api_key: Optional[str] = None, **kwargs):
        """Initialize the translation provider."""
        super().__init__(
            provider_name="my_translation",
            supported_languages=["en", "zh", "es", "fr", "de", "ja"]
        )
        self.api_key = api_key
        self._initialize_provider()

    def _initialize_provider(self):
        """Initialize provider-specific resources."""
        try:
            # Initialize your translation engine/model here
            pass
        except Exception as e:
            logger.error(f"Failed to initialize {self.provider_name}: {e}")
            raise TranslationFailedException(f"Provider initialization failed: {e}")

    def is_available(self) -> bool:
        """Check if the provider is available."""
        try:
            # Check dependencies, API connectivity, etc.
            return True  # Replace with an actual check
        except Exception:
            return False

    def get_supported_language_pairs(self) -> List[tuple[str, str]]:
        """Get supported language pairs as (source_lang, target_lang) tuples."""
        pairs = []
        for source in self.supported_languages:
            for target in self.supported_languages:
                if source != target:
                    pairs.append((source, target))
        return pairs

    def _translate_text(self, request: TranslationRequest) -> tuple[str, float, dict]:
        """Translate text using the provider.

        Args:
            request: Translation request

        Returns:
            tuple: (translated_text, confidence_score, metadata)
        """
        try:
            source_text = request.text_content.text
            source_lang = request.source_language or request.text_content.language
            target_lang = request.target_language

            # Implement your translation logic here
            # Example:
            # result = self.translator.translate(
            #     text=source_text,
            #     source_lang=source_lang,
            #     target_lang=target_lang
            # )

            # Return translation results
            translated_text = f"Translated: {source_text}"  # Replace with the actual translation
            confidence = 0.92  # Replace with the actual confidence
            metadata = {
                "source_language_detected": source_lang,
                "target_language": target_lang,
                "processing_time": 0.5,
                "model_used": "my_translation_model"
            }

            return translated_text, confidence, metadata

        except Exception as e:
            self._handle_provider_error(e, "translation")
```

## Testing Guidelines

### Unit Testing

- Test each provider in isolation using mocks
- Cover success and failure scenarios
- Test edge cases (empty input, invalid parameters)
- Verify error handling and exception propagation
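As a concrete illustration of the edge-case point above, here is a tiny, provider-independent sketch. `validate_text` is a hypothetical stand-in for the input validation your provider should perform before synthesis or transcription:

```python
# Sketch: edge-case checks for input validation, independent of any provider.
# validate_text is a hypothetical helper, not part of the project's API.

def validate_text(text: str) -> str:
    """Reject empty or whitespace-only input before doing expensive work."""
    if not text or not text.strip():
        raise ValueError("text must be non-empty")
    return text.strip()


# Success scenario: valid input is normalized
assert validate_text("  hello ") == "hello"

# Failure scenario: empty input is rejected with a clear error
try:
    validate_text("   ")
except ValueError:
    pass
else:
    raise AssertionError("empty input should be rejected")
```

The same shape works for invalid parameters (unknown voice IDs, unsupported languages): validate early, raise a domain-specific exception, and assert on it in the test.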
### Integration Testing

- Test provider integration with factories
- Test complete pipeline workflows
- Test fallback mechanisms
- Test with real external services (when available)

### Performance Testing

- Measure processing times for different input sizes
- Test memory usage and resource cleanup
- Test concurrent processing capabilities
- Benchmark against existing providers

### Test Structure

```
tests/
├── unit/
│   ├── domain/
│   ├── application/
│   └── infrastructure/
│       ├── tts/
│       ├── stt/
│       └── translation/
├── integration/
│   ├── test_complete_pipeline.py
│   ├── test_provider_fallback.py
│   └── test_error_recovery.py
└── performance/
    ├── test_processing_speed.py
    ├── test_memory_usage.py
    └── test_concurrent_processing.py
```

## Code Style and Standards

### Python Style Guide

- Follow PEP 8 for code formatting
- Use type hints for all public methods
- Write comprehensive docstrings (Google style)
- Use meaningful variable and function names
- Keep functions focused and small (< 50 lines)

### Documentation Standards

- Document all public interfaces
- Include usage examples in docstrings
- Explain complex algorithms and business logic
- Keep documentation up to date with code changes

### Error Handling

- Use domain-specific exceptions
- Provide detailed error messages
- Log errors with appropriate levels
- Implement graceful degradation where possible
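A minimal sketch of how these error-handling points fit together. The exception name mirrors the ones used throughout this guide; the fallback chain and helper functions are illustrative only:

```python
# Sketch: domain-specific exceptions plus graceful degradation.
# The fallback chain and helpers are illustrative, not the project's API.
import logging

logger = logging.getLogger(__name__)


class SpeechSynthesisException(Exception):
    """Raised when a TTS provider fails to synthesize audio."""


def synthesize_with_fallback(providers, text: str) -> bytes:
    """Try providers in order, degrading gracefully instead of failing hard."""
    for name, synth in providers:
        try:
            return synth(text)
        except SpeechSynthesisException as e:
            # Detailed message, appropriate level, then move on to the next provider
            logger.warning("Provider %s failed: %s; trying next", name, e)
    raise SpeechSynthesisException("All providers failed")


def flaky(text: str) -> bytes:
    raise SpeechSynthesisException("model not loaded")


def dummy(text: str) -> bytes:
    return b"\x00" * 16  # silent placeholder audio


audio = synthesize_with_fallback([("flaky", flaky), ("dummy", dummy)], "hi")
```

Because only the domain exception is caught, programming errors (e.g. `TypeError`) still surface immediately instead of being silently swallowed by the fallback loop.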
### Logging

```python
import logging

logger = logging.getLogger(__name__)

# Use appropriate log levels
logger.debug("Detailed debugging information")
logger.info("General information about program execution")
logger.warning("Something unexpected happened")
logger.error("A serious error occurred")
logger.critical("A very serious error occurred")
```

## Debugging and Troubleshooting

### Common Issues

1. **Provider Not Available**
   - Check that dependencies are installed
   - Verify configuration settings
   - Check logs for initialization errors

2. **Poor Quality Output**
   - Verify input audio quality
   - Check model parameters
   - Review provider-specific settings

3. **Performance Issues**
   - Profile code execution
   - Check memory usage
   - Optimize the audio processing pipeline

### Debugging Tools

- Use the Python debugger (pdb) for step-through debugging
- Enable detailed logging for troubleshooting
- Use profiling tools (cProfile, memory_profiler)
- Monitor system resources during processing

### Logging Configuration

```python
# Enable debug logging for development
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("debug.log"),
        logging.StreamHandler()
    ]
)
```

## Performance Considerations

### Optimization Strategies

1. **Audio Processing**
   - Use appropriate sample rates
   - Implement streaming where possible
   - Cache processed results
   - Optimize memory usage

2. **Model Loading**
   - Load models once and reuse them
   - Use lazy loading for optional providers
   - Implement model caching strategies
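The "load once and reuse" and "lazy loading" points can be sketched in a few lines. `MyModel` stands in for an expensive-to-load TTS or STT model; the lock makes the lazy load safe when requests arrive from multiple threads:

```python
# Sketch: lazy, load-once model holder (a hypothetical helper, not project code).
import threading


class MyModel:
    load_count = 0

    def __init__(self):
        MyModel.load_count += 1  # the expensive load would happen here


class LazyModel:
    """Defer loading until first use; load at most once, even across threads."""

    def __init__(self):
        self._model = None
        self._lock = threading.Lock()

    def get(self) -> MyModel:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread loads the model
                if self._model is None:
                    self._model = MyModel()
        return self._model


lazy = LazyModel()
assert MyModel.load_count == 0   # nothing loaded yet
lazy.get()
lazy.get()
assert MyModel.load_count == 1   # loaded exactly once, then reused
```

Providers that are configured but never selected then cost nothing at startup, which matters when several heavyweight models are registered.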
3. **Concurrent Processing**
   - Use async/await for I/O operations
   - Implement thread-safe providers
   - Consider multiprocessing for CPU-intensive tasks

### Memory Management

- Clean up temporary files
- Release model resources when no longer needed
- Monitor memory usage in long-running processes
- Implement resource pooling for expensive operations

### Monitoring and Metrics

- Track processing times
- Monitor error rates
- Measure resource utilization
- Implement health checks
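Processing times and error rates can be tracked with a small decorator. This is a self-contained sketch; in practice you would export the counters to whatever metrics backend the deployment uses:

```python
# Sketch: track call counts, error counts, and total processing time.
import time
from collections import defaultdict
from functools import wraps

metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_time": 0.0})


def tracked(name):
    """Decorator that records timing and error metrics for a function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[name]["errors"] += 1
                raise
            finally:
                metrics[name]["calls"] += 1
                metrics[name]["total_time"] += time.perf_counter() - start
        return wrapper
    return decorator


@tracked("synthesize")
def synthesize(text: str) -> bytes:
    return b"audio"  # stand-in for real synthesis


synthesize("hello")
```

A health check can then be as simple as asserting that the error rate (`errors / calls`) for each provider stays below a threshold.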
## Contributing Guidelines

### Development Workflow

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Run the full test suite
5. Submit a pull request

### Code Review Process

- All changes require code review
- Tests must pass before merging
- Documentation must be updated
- Performance impact should be assessed

### Release Process

- Follow semantic versioning
- Update the changelog
- Tag releases appropriately
- Deploy to staging before production

---

For questions or support, please refer to the project documentation or open an issue in the repository.
README.md
CHANGED
|
@@ -1,89 +1,281 @@
----
-title: TeachingAssistant
-emoji: 🚀
-colorFrom: gray
-colorTo: blue
-sdk: streamlit
-sdk_version: 1.44.1
-app_file: app.py
-pinned: false
----
-- NVIDIA's Parakeet-TDT-0.6B model
-- Optimized for real-time transcription
-- Requires additional installation (see below)
# Audio Translation System

A high-quality audio translation system built using Domain-Driven Design (DDD) principles. The application processes audio through a pipeline: Speech-to-Text (STT) → Translation → Text-to-Speech (TTS), supporting multiple providers for each service with automatic fallback mechanisms.

## 🏗️ Architecture Overview

The application follows a clean DDD architecture with clear separation of concerns:

```
src/
├── domain/              # 🧠 Business logic and rules
│   ├── models/          # Domain entities and value objects
│   ├── services/        # Domain services
│   ├── interfaces/      # Domain interfaces (ports)
│   └── exceptions.py    # Domain-specific exceptions
├── application/         # 🎯 Use case orchestration
│   ├── services/        # Application services
│   ├── dtos/            # Data transfer objects
│   └── error_handling/  # Application error handling
├── infrastructure/      # 🔧 External concerns
│   ├── tts/             # TTS provider implementations
│   ├── stt/             # STT provider implementations
│   ├── translation/     # Translation service implementations
│   ├── base/            # Provider base classes
│   └── config/          # Configuration and DI container
└── presentation/        # 🖥️ UI layer
    └── (Streamlit app in app.py)
```

### 🔄 Data Flow

```mermaid
graph TD
    A[User Upload] --> B[Presentation Layer]
    B --> C[Application Service]
    C --> D[Domain Service]
    D --> E[STT Provider]
    D --> F[Translation Provider]
    D --> G[TTS Provider]
    E --> H[Infrastructure Layer]
    F --> H
    G --> H
    H --> I[External Services]
```

## 🚀 Quick Start
### Prerequisites

- Python 3.10+ (matching `requires-python` in `pyproject.toml`)
- FFmpeg (for audio processing)
- Optional: CUDA for GPU acceleration

### Installation

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd audio-translation-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the application:**
   ```bash
   streamlit run app.py
   ```

4. **Access the web interface:**
   Open your browser to `http://localhost:8501`

## 🎛️ Supported Providers

### Speech-to-Text (STT)

- **Whisper** (Default) - OpenAI's Whisper Large-v3 model
- **Parakeet** - NVIDIA's Parakeet-TDT-0.6B model (requires NeMo Toolkit)

### Translation

- **NLLB** - Meta's No Language Left Behind model

### Text-to-Speech (TTS)

- **Kokoro** - High-quality neural TTS
- **Dia** - Fast neural TTS
- **CosyVoice2** - Advanced voice synthesis
- **Dummy** - Test provider for development

## 📖 Usage

### Web Interface

1. **Upload Audio**: Support for WAV, MP3, FLAC, OGG formats
2. **Select Model**: Choose STT model (Whisper/Parakeet)
3. **Choose Language**: Select target translation language
4. **Pick Voice**: Select TTS voice and speed
5. **Process**: Click to start the translation pipeline

### Programmatic Usage

```python
import os

from src.infrastructure.config.container_setup import initialize_global_container
from src.application.services.audio_processing_service import AudioProcessingApplicationService
from src.application.dtos.processing_request_dto import ProcessingRequestDto
from src.application.dtos.audio_upload_dto import AudioUploadDto

# Initialize dependency container
container = initialize_global_container()
audio_service = container.resolve(AudioProcessingApplicationService)

# Create request
with open("audio.wav", "rb") as f:
    audio_upload = AudioUploadDto(
        filename="audio.wav",
        content=f.read(),
        content_type="audio/wav",
        size=os.path.getsize("audio.wav")
    )

request = ProcessingRequestDto(
    audio=audio_upload,
    asr_model="whisper-small",
    target_language="zh",
    voice="kokoro",
    speed=1.0
)

# Process audio
result = audio_service.process_audio_pipeline(request)

if result.success:
    print(f"Original: {result.original_text}")
    print(f"Translated: {result.translated_text}")
    print(f"Audio saved to: {result.audio_path}")
else:
    print(f"Error: {result.error_message}")
```

## 🧪 Testing

The project includes comprehensive test coverage:

```bash
# Run all tests
python -m pytest

# Run specific test categories
python -m pytest tests/unit/         # Unit tests
python -m pytest tests/integration/  # Integration tests

# Run with coverage
python -m pytest --cov=src --cov-report=html
```

### Test Structure

- **Unit Tests**: Test individual components in isolation
- **Integration Tests**: Test provider integrations and complete pipeline
- **Mocking**: Uses dependency injection for easy mocking

## 🔧 Configuration

### Environment Variables

Create a `.env` file or set environment variables:

```bash
# Provider preferences (comma-separated, in order of preference)
TTS_PROVIDERS=kokoro,dia,cosyvoice2,dummy
STT_PROVIDERS=whisper,parakeet
TRANSLATION_PROVIDERS=nllb

# Logging
LOG_LEVEL=INFO
LOG_FILE=app.log

# Performance
MAX_FILE_SIZE_MB=100
TEMP_FILE_CLEANUP_HOURS=24
```

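Comma-separated preference lists like `TTS_PROVIDERS` can be parsed with a few lines of Python; this is a sketch, and `provider_preference` is a hypothetical helper rather than a function in this codebase:

```python
import os

def provider_preference(var_name: str, default: str) -> list[str]:
    """Parse a comma-separated provider list from the environment, in order."""
    raw = os.environ.get(var_name, default)
    # Strip whitespace and drop empty entries so "a, b,,c" -> ["a", "b", "c"].
    return [name.strip() for name in raw.split(",") if name.strip()]

tts_order = provider_preference("TTS_PROVIDERS", "kokoro,dia,cosyvoice2,dummy")
```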
### Provider Configuration

The system automatically detects available providers and falls back gracefully:

```python
# Example: Custom provider configuration
from src.infrastructure.config.dependency_container import DependencyContainer

container = DependencyContainer()
container.configure_tts_providers(['kokoro', 'dummy'])  # Preferred order
```

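One way the "falls back gracefully" behavior can work is a simple preference-order loop; the exception type and provider callables below are stand-ins for illustration, not this project's actual classes:

```python
class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve the request (e.g., missing deps)."""

def synthesize_with_fallback(providers, text):
    """Try each provider in preference order; raise only if all of them fail."""
    errors = []
    for provider in providers:
        try:
            return provider(text)
        except ProviderUnavailable as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def broken(text):
    raise ProviderUnavailable("model not installed")

def dummy(text):
    return b"\x00" * len(text)  # silent placeholder audio

# 'broken' fails, so the dummy provider serves the request.
audio = synthesize_with_fallback([broken, dummy], "hello")
```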
## 🏗️ Architecture Benefits

### 🎯 Domain-Driven Design

- **Clear Business Logic**: Domain layer contains pure business rules
- **Separation of Concerns**: Each layer has distinct responsibilities
- **Testability**: Easy to test business logic independently

### 🔌 Dependency Injection

- **Loose Coupling**: Components depend on abstractions, not implementations
- **Easy Testing**: Mock dependencies for unit testing
- **Flexibility**: Swap implementations without changing business logic

### 🛡️ Error Handling

- **Layered Exceptions**: Domain exceptions mapped to user-friendly messages
- **Graceful Fallbacks**: Automatic provider fallback on failures
- **Structured Logging**: Correlation IDs and detailed error tracking

### 📈 Extensibility

- **Plugin Architecture**: Add new providers by implementing interfaces
- **Configuration-Driven**: Change behavior through configuration
- **Provider Factories**: Automatic provider discovery and instantiation

## 🔍 Troubleshooting

### Common Issues

**Import Errors:**
```bash
# Ensure all dependencies are installed
pip install -r requirements.txt

# For Parakeet support
pip install 'nemo_toolkit[asr]'
```

**Audio Processing Errors:**
- Verify FFmpeg is installed and in PATH
- Check audio file format is supported
- Ensure sufficient disk space for temporary files

**Provider Unavailable:**
- Check provider-specific dependencies
- Review logs for detailed error messages
- Verify provider configuration

### Debug Mode

Enable detailed logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🤝 Contributing

### Adding New Providers

See [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md) for detailed instructions on:
- Implementing new TTS providers
- Adding STT models
- Extending translation services
- Writing tests

### Development Setup

1. **Install development dependencies:**
   ```bash
   pip install -r requirements-dev.txt
   ```

2. **Run pre-commit hooks:**
   ```bash
   pre-commit install
   pre-commit run --all-files
   ```

3. **Run tests before committing:**
   ```bash
   python -m pytest tests/
   ```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

For detailed developer documentation, see [DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md).
|
pyproject.toml
CHANGED
|
@@ -1,60 +1,63 @@
|
|
| 1 |
-
[
|
| 2 |
name = "audio-translator"
|
| 3 |
version = "0.1.0"
|
| 4 |
description = "High-quality audio translation web application"
|
| 5 |
-
authors = [
|
| 6 |
-
|
|
|
|
|
|
|
| 7 |
readme = "README.md"
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
kokoro
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
|
|
|
|
|
|
| 36 |
|
| 37 |
[build-system]
|
| 38 |
-
requires = ["
|
| 39 |
-
build-backend = "
|
| 40 |
-
|
| 41 |
-
[
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
[
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
[
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
start = "app:main"
|
|
|
|
[project]
name = "audio-translator"
version = "0.1.0"
description = "High-quality audio translation web application"
authors = [
    {name = "Your Name", email = "your.email@example.com"}
]
license = {text = "MIT"}
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "streamlit>=1.44.1",
    "gradio==5.9.1",
    "nltk>=3.8",
    "librosa>=0.10",
    "ffmpeg-python>=0.2",
    "transformers[audio]>=4.33",
    "torch>=2.1.0",
    "torchaudio>=2.1.0",
    "scipy>=1.11",
    "munch>=2.5",
    "accelerate>=1.2.0",
    "soundfile>=0.13.0",
    "kokoro>=0.7.9",
    "ordered-set>=4.1.0",
    "phonemizer-fork>=3.3.2",
    "nemo_toolkit[asr]",
    "faster-whisper>=1.1.1",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "pytest-mock>=3.10",
    "black>=23.0",
    "flake8>=6.0",
    "mypy>=1.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = "-v --tb=short"

[tool.black]
line-length = 100
target-version = ['py310']

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
|
|
|
src/domain/interfaces/audio_processing.py
CHANGED
|
@@ -1,4 +1,27 @@
|
|
| 1 |
-
"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import TYPE_CHECKING
|
|
@@ -10,27 +33,104 @@ if TYPE_CHECKING:
|
|
| 10 |
|
| 11 |
|
| 12 |
class IAudioProcessingService(ABC):
|
| 13 |
-
"""
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
@abstractmethod
|
| 16 |
def process_audio_pipeline(
|
| 17 |
-
self,
|
| 18 |
-
audio: 'AudioContent',
|
| 19 |
-
target_language: str,
|
| 20 |
voice_settings: 'VoiceSettings'
|
| 21 |
) -> 'ProcessingResult':
|
| 22 |
"""
|
| 23 |
Process audio through the complete pipeline: STT -> Translation -> TTS.
|
| 24 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
Args:
|
| 26 |
-
audio: The input audio content
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
|
|
|
|
|
|
|
|
|
| 30 |
Returns:
|
| 31 |
-
ProcessingResult:
|
| 32 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
Raises:
|
| 34 |
-
AudioProcessingException: If any step in the pipeline fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 35 |
"""
|
| 36 |
pass
|
|
|
|
"""
Audio processing service interface.

This module defines the core interface for audio processing pipeline orchestration.
The interface follows Domain-Driven Design principles, providing a clean contract
for the complete audio translation workflow.

Example:
    ```python
    from src.domain.interfaces.audio_processing import IAudioProcessingService
    from src.domain.models.audio_content import AudioContent
    from src.domain.models.voice_settings import VoiceSettings

    # Get service implementation from DI container
    audio_service = container.resolve(IAudioProcessingService)

    # Process audio through complete pipeline
    result = audio_service.process_audio_pipeline(
        audio=audio_content,
        target_language="zh",
        voice_settings=voice_settings
    )
    ```
"""

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

...

class IAudioProcessingService(ABC):
    """
    Interface for audio processing pipeline orchestration.

    This interface defines the contract for the complete audio translation pipeline,
    coordinating Speech-to-Text, Translation, and Text-to-Speech services to provide
    end-to-end audio translation functionality.

    The interface is designed to be:
    - Provider-agnostic: Works with any STT/Translation/TTS implementation
    - Error-resilient: Handles failures gracefully with appropriate exceptions
    - Observable: Provides detailed processing results and metadata
    - Testable: Easy to mock for unit testing

    Implementations should handle:
    - Provider selection and fallback logic
    - Error handling and recovery
    - Performance monitoring and logging
    - Resource cleanup and management
    """

    @abstractmethod
    def process_audio_pipeline(
        self,
        audio: 'AudioContent',
        target_language: str,
        voice_settings: 'VoiceSettings'
    ) -> 'ProcessingResult':
        """
        Process audio through the complete pipeline: STT -> Translation -> TTS.

        This method orchestrates the complete audio translation workflow:
        1. Speech Recognition: Convert audio to text
        2. Translation: Translate text to target language (if needed)
        3. Speech Synthesis: Convert translated text back to audio

        The implementation should:
        - Validate input parameters
        - Handle provider failures with fallback mechanisms
        - Provide detailed error information on failure
        - Clean up temporary resources
        - Log processing steps for observability

        Args:
            audio: The input audio content to process. Must be a valid AudioContent
                instance with supported format and reasonable duration.
            target_language: The target language code for translation (e.g., 'zh', 'es', 'fr').
                Must be supported by the translation provider.
            voice_settings: Voice configuration for TTS synthesis including voice ID,
                speed, and language preferences.

        Returns:
            ProcessingResult: Comprehensive result containing:
                - success: Boolean indicating overall success
                - original_text: Transcribed text from STT (if successful)
                - translated_text: Translated text (if translation was performed)
                - audio_output: Generated audio content (if TTS was successful)
                - processing_time: Total processing duration in seconds
                - error_message: Detailed error description (if failed)
                - metadata: Additional processing information and metrics

        Raises:
            AudioProcessingException: If any step in the pipeline fails and cannot
                be recovered through fallback mechanisms.
            ValueError: If input parameters are invalid or unsupported.

        Example:
            ```python
            # Create audio content from file
            with open("input.wav", "rb") as f:
                audio = AudioContent(
                    data=f.read(),
                    format="wav",
                    sample_rate=16000,
                    duration=10.5
                )

            # Configure voice settings
            voice_settings = VoiceSettings(
                voice_id="kokoro",
                speed=1.0,
                language="zh"
            )

            # Process through pipeline
            result = service.process_audio_pipeline(
                audio=audio,
                target_language="zh",
                voice_settings=voice_settings
            )

            if result.success:
                print(f"Original: {result.original_text}")
                print(f"Translated: {result.translated_text}")
                # Save output audio
                with open("output.wav", "wb") as f:
                    f.write(result.audio_output.data)
            else:
                print(f"Processing failed: {result.error_message}")
            ```
        """
        pass
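Because the interface above is a plain ABC, callers can be unit-tested against a deterministic fake. The sketch below is self-contained for illustration: the dataclasses stand in for the real `AudioContent`/`ProcessingResult` domain models, and `FakeAudioProcessingService` is a hypothetical test double, not project code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AudioContent:
    data: bytes
    format: str

@dataclass
class ProcessingResult:
    success: bool
    original_text: str = ""
    error_message: str = ""

class IAudioProcessingService(ABC):
    @abstractmethod
    def process_audio_pipeline(self, audio, target_language, voice_settings):
        ...

class FakeAudioProcessingService(IAudioProcessingService):
    """Deterministic stub: lets callers be tested without any real models."""

    def process_audio_pipeline(self, audio, target_language, voice_settings):
        if not audio.data:
            return ProcessingResult(success=False, error_message="empty audio")
        return ProcessingResult(success=True, original_text="hello")

svc = FakeAudioProcessingService()
ok = svc.process_audio_pipeline(AudioContent(b"\x00", "wav"), "zh", None)
```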
|
src/domain/interfaces/speech_recognition.py
CHANGED
|
@@ -1,4 +1,15 @@
|
|
| 1 |
-
"""Speech recognition service interface.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import TYPE_CHECKING
|
|
@@ -9,21 +20,95 @@ if TYPE_CHECKING:
|
|
| 9 |
|
| 10 |
|
| 11 |
class ISpeechRecognitionService(ABC):
|
| 12 |
-
"""Interface for speech recognition services.
|
| 13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
@abstractmethod
|
| 15 |
def transcribe(self, audio: 'AudioContent', model: str) -> 'TextContent':
|
| 16 |
-
"""
|
| 17 |
-
|
| 18 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
Args:
|
| 20 |
-
audio: The audio content to transcribe
|
| 21 |
-
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
Returns:
|
| 24 |
-
TextContent: The
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
Raises:
|
| 27 |
-
SpeechRecognitionException: If transcription fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
"""
|
| 29 |
pass
|
|
|
|
"""Speech recognition service interface.

This module defines the interface for speech-to-text (STT) services that convert
audio content into textual representation. The interface supports multiple STT
models and providers with consistent error handling.

The interface is designed to be:
- Model-agnostic: Works with any STT implementation (Whisper, Parakeet, etc.)
- Language-aware: Handles multiple languages and dialects
- Error-resilient: Provides detailed error information for debugging
- Performance-conscious: Supports both batch and streaming transcription
"""

from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

...

class ISpeechRecognitionService(ABC):
    """Interface for speech recognition services.

    This interface defines the contract for converting audio content to text
    using various STT models and providers. Implementations should handle
    different audio formats, languages, and quality levels.

    Example:
        ```python
        # Use through dependency injection
        stt_service = container.resolve(ISpeechRecognitionService)

        # Transcribe audio
        text_result = stt_service.transcribe(
            audio=audio_content,
            model="whisper-large"
        )

        print(f"Transcribed: {text_result.text}")
        print(f"Language: {text_result.language}")
        print(f"Confidence: {text_result.confidence}")
        ```
    """

    @abstractmethod
    def transcribe(self, audio: 'AudioContent', model: str) -> 'TextContent':
        """Transcribe audio content to text using specified STT model.

        Converts audio data into textual representation with language detection
        and confidence scoring. The method should handle various audio formats
        and quality levels gracefully.

        Implementation considerations:
        - Audio preprocessing (noise reduction, normalization)
        - Language detection and handling
        - Confidence scoring and quality assessment
        - Memory management for large audio files
        - Timeout handling for long audio content

        Args:
            audio: The audio content to transcribe. Must contain valid audio data
                in a supported format (WAV, MP3, FLAC, etc.) with appropriate
                sample rate and duration.
            model: The STT model identifier to use for transcription. Examples:
                - "whisper-small": Fast, lower accuracy
                - "whisper-large": Slower, higher accuracy
                - "parakeet": Real-time optimized
                Must be supported by the implementation.

        Returns:
            TextContent: The transcription result containing:
                - text: The transcribed text content
                - language: Detected or specified language code
                - confidence: Overall transcription confidence (0.0-1.0)
                - metadata: Additional information like word-level timestamps,
                  alternative transcriptions, processing time

        Raises:
            SpeechRecognitionException: If transcription fails due to:
                - Unsupported audio format or quality
                - Model loading or inference errors
                - Network issues (for cloud-based models)
                - Insufficient system resources
            ValueError: If input parameters are invalid:
                - Empty or corrupted audio data
                - Unsupported model identifier
                - Invalid audio format specifications

        Example:
            ```python
            # Load audio file
            with open("speech.wav", "rb") as f:
                audio = AudioContent(
                    data=f.read(),
                    format="wav",
                    sample_rate=16000,
                    duration=30.0
                )

            # Transcribe with high-accuracy model
            try:
                result = service.transcribe(audio, "whisper-large")

                if result.confidence > 0.8:
                    print(f"High confidence: {result.text}")
                else:
                    print(f"Low confidence: {result.text} ({result.confidence:.2f})")

            except SpeechRecognitionException as e:
                print(f"Transcription failed: {e}")
            ```
        """
        pass
|
src/domain/interfaces/speech_synthesis.py
CHANGED
|
@@ -1,4 +1,15 @@
|
|
| 1 |
-
"""Speech synthesis service interface.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2 |
|
| 3 |
from abc import ABC, abstractmethod
|
| 4 |
from typing import Iterator, TYPE_CHECKING
|
|
@@ -10,36 +21,175 @@ if TYPE_CHECKING:
|
|
| 10 |
|
| 11 |
|
| 12 |
class ISpeechSynthesisService(ABC):
|
| 13 |
-
"""Interface for speech synthesis services.
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
@abstractmethod
|
| 16 |
def synthesize(self, request: 'SpeechSynthesisRequest') -> 'AudioContent':
|
| 17 |
-
"""
|
| 18 |
-
|
| 19 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 20 |
Args:
|
| 21 |
-
request: The speech synthesis request containing
|
| 22 |
-
|
|
|
|
|
|
|
|
|
|
| 23 |
Returns:
|
| 24 |
-
AudioContent: The synthesized audio
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
Raises:
|
| 27 |
-
SpeechSynthesisException: If synthesis fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
"""
|
| 29 |
pass
|
| 30 |
-
|
| 31 |
@abstractmethod
|
| 32 |
def synthesize_stream(self, request: 'SpeechSynthesisRequest') -> Iterator['AudioChunk']:
|
| 33 |
-
"""
|
| 34 |
-
|
| 35 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
Args:
|
| 37 |
-
request: The speech synthesis request containing text and voice settings
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 42 |
Raises:
|
| 43 |
-
SpeechSynthesisException: If synthesis fails
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
"""
|
| 45 |
pass
|
|
|
|
"""Speech synthesis service interface.

This module defines the interface for text-to-speech (TTS) services that convert
textual content into audio. The interface supports both batch and streaming
synthesis with multiple voice options and quality settings.

The interface is designed to be:
- Voice-flexible: Supports multiple voices and languages
- Quality-configurable: Allows control over synthesis parameters
- Streaming-capable: Supports real-time audio generation
- Provider-agnostic: Works with any TTS implementation
"""

from abc import ABC, abstractmethod
from typing import Iterator, TYPE_CHECKING

...

class ISpeechSynthesisService(ABC):
    """Interface for speech synthesis services.

    This interface defines the contract for converting text to speech using
    various TTS models and voices. Implementations should support both batch
    processing and streaming synthesis for different use cases.

    Example:
        ```python
        # Use through dependency injection
        tts_service = container.resolve(ISpeechSynthesisService)

        # Create synthesis request
        request = SpeechSynthesisRequest(
            text_content=text_content,
            voice_settings=voice_settings
        )

        # Batch synthesis
        audio = tts_service.synthesize(request)

        # Or streaming synthesis
        for chunk in tts_service.synthesize_stream(request):
            # Process audio chunk in real-time
            play_audio_chunk(chunk)
        ```
    """

    @abstractmethod
    def synthesize(self, request: 'SpeechSynthesisRequest') -> 'AudioContent':
|
| 53 |
+
"""Synthesize speech from text in batch mode.
|
| 54 |
+
|
| 55 |
+
Converts text content to audio using specified voice settings and
|
| 56 |
+
returns the complete audio content. This method is suitable for
|
| 57 |
+
shorter texts or when the complete audio is needed before playback.
|
| 58 |
+
|
| 59 |
+
Implementation considerations:
|
| 60 |
+
- Text preprocessing (SSML support, pronunciation handling)
|
| 61 |
+
- Voice loading and configuration
|
| 62 |
+
- Audio quality optimization
|
| 63 |
+
- Memory management for long texts
|
| 64 |
+
- Error recovery and fallback voices
|
| 65 |
+
|
| 66 |
Args:
|
| 67 |
+
request: The speech synthesis request containing:
|
| 68 |
+
- text_content: Text to synthesize with language information
|
| 69 |
+
- voice_settings: Voice configuration including voice ID, speed,
|
| 70 |
+
pitch, volume, and other voice-specific parameters
|
| 71 |
+
|
| 72 |
Returns:
|
| 73 |
+
AudioContent: The synthesized audio containing:
|
| 74 |
+
- data: Raw audio data in specified format
|
| 75 |
+
- format: Audio format (WAV, MP3, etc.)
|
| 76 |
+
- sample_rate: Audio sample rate in Hz
|
| 77 |
+
- duration: Audio duration in seconds
|
| 78 |
+
- metadata: Additional synthesis information
|
| 79 |
+
|
| 80 |
Raises:
|
| 81 |
+
SpeechSynthesisException: If synthesis fails due to:
|
| 82 |
+
- Unsupported voice or language
|
| 83 |
+
- Text processing errors (invalid characters, length limits)
|
| 84 |
+
- Voice model loading failures
|
| 85 |
+
- Insufficient system resources
|
| 86 |
+
ValueError: If request parameters are invalid:
|
| 87 |
+
- Empty text content
|
| 88 |
+
- Unsupported voice settings
|
| 89 |
+
- Invalid audio format specifications
|
| 90 |
+
|
| 91 |
+
Example:
|
| 92 |
+
```python
|
| 93 |
+
# Create text content
|
| 94 |
+
text = TextContent(
|
| 95 |
+
text="Hello, this is a test of speech synthesis.",
|
| 96 |
+
language="en"
|
| 97 |
+
)
|
| 98 |
+
|
| 99 |
+
# Configure voice settings
|
| 100 |
+
voice_settings = VoiceSettings(
|
| 101 |
+
voice_id="kokoro",
|
| 102 |
+
speed=1.0,
|
| 103 |
+
pitch=0.0,
|
| 104 |
+
volume=1.0
|
| 105 |
+
)
|
| 106 |
+
|
| 107 |
+
# Create synthesis request
|
| 108 |
+
request = SpeechSynthesisRequest(
|
| 109 |
+
text_content=text,
|
| 110 |
+
voice_settings=voice_settings
|
| 111 |
+
)
|
| 112 |
+
|
| 113 |
+
# Synthesize audio
|
| 114 |
+
try:
|
| 115 |
+
audio = service.synthesize(request)
|
| 116 |
+
|
| 117 |
+
# Save to file
|
| 118 |
+
with open("output.wav", "wb") as f:
|
| 119 |
+
f.write(audio.data)
|
| 120 |
+
|
| 121 |
+
print(f"Generated {audio.duration:.1f}s of audio")
|
| 122 |
+
|
| 123 |
+
except SpeechSynthesisException as e:
|
| 124 |
+
print(f"Synthesis failed: {e}")
|
| 125 |
+
```
|
| 126 |
"""
|
| 127 |
pass
|
| 128 |
+
|
| 129 |
@abstractmethod
|
| 130 |
def synthesize_stream(self, request: 'SpeechSynthesisRequest') -> Iterator['AudioChunk']:
|
| 131 |
+
"""Synthesize speech from text as a stream of audio chunks.
|
| 132 |
+
|
| 133 |
+
Converts text content to audio in streaming mode, yielding audio chunks
|
| 134 |
+
as they become available. This method is suitable for real-time playback,
|
| 135 |
+
long texts, or when low latency is required.
|
| 136 |
+
|
| 137 |
+
Implementation considerations:
|
| 138 |
+
- Chunk size optimization for smooth playback
|
| 139 |
+
- Buffer management and memory efficiency
|
| 140 |
+
- Error handling without breaking the stream
|
| 141 |
+
- Proper stream termination and cleanup
|
| 142 |
+
- Latency minimization for real-time use cases
|
| 143 |
+
|
| 144 |
Args:
|
| 145 |
+
request: The speech synthesis request containing text and voice settings.
|
| 146 |
+
Same format as batch synthesis but optimized for streaming.
|
| 147 |
+
|
| 148 |
+
Yields:
|
| 149 |
+
AudioChunk: Individual audio chunks containing:
|
| 150 |
+
- data: Raw audio data for this chunk
|
| 151 |
+
- format: Audio format (consistent across chunks)
|
| 152 |
+
- sample_rate: Audio sample rate in Hz
|
| 153 |
+
- chunk_index: Sequential chunk number
|
| 154 |
+
- is_final: Boolean indicating if this is the last chunk
|
| 155 |
+
- timestamp: Chunk generation timestamp
|
| 156 |
+
|
| 157 |
Raises:
|
| 158 |
+
SpeechSynthesisException: If synthesis fails during streaming:
|
| 159 |
+
- Voice model errors during processing
|
| 160 |
+
- Network issues (for cloud-based synthesis)
|
| 161 |
+
- Resource exhaustion during long synthesis
|
| 162 |
+
ValueError: If request parameters are invalid for streaming
|
| 163 |
+
|
| 164 |
+
Example:
|
| 165 |
+
```python
|
| 166 |
+
# Create streaming synthesis request
|
| 167 |
+
request = SpeechSynthesisRequest(
|
| 168 |
+
text_content=long_text,
|
| 169 |
+
voice_settings=voice_settings
|
| 170 |
+
)
|
| 171 |
+
|
| 172 |
+
# Stream synthesis with real-time playback
|
| 173 |
+
audio_buffer = []
|
| 174 |
+
|
| 175 |
+
try:
|
| 176 |
+
for chunk in service.synthesize_stream(request):
|
| 177 |
+
# Add to playback buffer
|
| 178 |
+
audio_buffer.append(chunk.data)
|
| 179 |
+
|
| 180 |
+
# Start playback when buffer is sufficient
|
| 181 |
+
if len(audio_buffer) >= 3: # Buffer 3 chunks
|
| 182 |
+
play_audio_chunk(audio_buffer.pop(0))
|
| 183 |
+
|
| 184 |
+
# Handle final chunk
|
| 185 |
+
if chunk.is_final:
|
| 186 |
+
# Play remaining buffered chunks
|
| 187 |
+
for remaining in audio_buffer:
|
| 188 |
+
play_audio_chunk(remaining)
|
| 189 |
+
break
|
| 190 |
+
|
| 191 |
+
except SpeechSynthesisException as e:
|
| 192 |
+
print(f"Streaming synthesis failed: {e}")
|
| 193 |
+
```
|
| 194 |
"""
|
| 195 |
pass
|
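The streaming contract above can be exercised end-to-end with a toy provider. The sketch below is illustrative only: `SpeechSynthesisRequest` and `AudioChunk` are simplified inline stand-ins for the real domain models (which carry more fields such as `format` and `timestamp`), and `SilenceTTS` is a hypothetical provider, not part of this codebase.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List


# Minimal stand-ins for the real domain models (assumption: the actual
# classes in src/domain/models carry more fields than shown here).
@dataclass
class SpeechSynthesisRequest:
    text: str
    voice_id: str = "kokoro"


@dataclass
class AudioChunk:
    data: bytes
    chunk_index: int
    is_final: bool


class ISpeechSynthesisService(ABC):
    @abstractmethod
    def synthesize_stream(self, request: SpeechSynthesisRequest) -> Iterator[AudioChunk]:
        ...


class SilenceTTS(ISpeechSynthesisService):
    """Toy provider: yields one chunk of silence per word, flagging the last."""

    def synthesize_stream(self, request: SpeechSynthesisRequest) -> Iterator[AudioChunk]:
        if not request.text.strip():
            raise ValueError("Empty text content")
        words = request.text.split()
        for i in range(len(words)):
            yield AudioChunk(
                data=b"\x00" * 320,  # 10 ms of 16-bit silence at 16 kHz
                chunk_index=i,
                is_final=(i == len(words) - 1),
            )


chunks: List[AudioChunk] = list(
    SilenceTTS().synthesize_stream(SpeechSynthesisRequest("hello streaming world"))
)
```

Because consumers rely only on `chunk_index` ordering and the `is_final` flag, the buffered-playback loop from the docstring example works unchanged against any conforming provider.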
src/domain/interfaces/translation.py
CHANGED
@@ -1,4 +1,15 @@
-"""Translation service interface.

 from abc import ABC, abstractmethod
 from typing import TYPE_CHECKING
@@ -9,20 +20,116 @@ if TYPE_CHECKING:


 class ITranslationService(ABC):
-    """Interface for translation services.
-
     @abstractmethod
     def translate(self, request: 'TranslationRequest') -> 'TextContent':
-        """
-
         Args:
-            request: The translation request containing
-
         Returns:
-            TextContent: The translated text
-
         Raises:
-            TranslationFailedException: If translation fails
         """
         pass
+"""Translation service interface.
+
+This module defines the interface for text translation services that convert
+text from one language to another. The interface supports multiple translation
+providers and models with language detection and quality assessment.
+
+The interface is designed to be:
+- Provider-agnostic: Works with any translation implementation
+- Language-flexible: Supports automatic language detection
+- Quality-aware: Provides confidence scores and alternative translations
+- Context-sensitive: Handles domain-specific translation needs
+"""

 from abc import ABC, abstractmethod
 from typing import TYPE_CHECKING


 class ITranslationService(ABC):
+    """Interface for translation services.
+
+    This interface defines the contract for translating text between languages
+    using various translation models and providers. Implementations should
+    handle language detection, context preservation, and quality optimization.
+
+    Example:
+        ```python
+        # Use through dependency injection
+        translation_service = container.resolve(ITranslationService)
+
+        # Create translation request
+        request = TranslationRequest(
+            text_content=source_text,
+            target_language="zh",
+            source_language="en"  # Optional, can be auto-detected
+        )
+
+        # Translate text
+        result = translation_service.translate(request)
+
+        print(f"Original: {request.text_content.text}")
+        print(f"Translated: {result.text}")
+        print(f"Confidence: {result.confidence}")
+        ```
+    """
+
     @abstractmethod
     def translate(self, request: 'TranslationRequest') -> 'TextContent':
+        """Translate text from source language to target language.
+
+        Converts text content from one language to another while preserving
+        meaning, context, and formatting where possible. The method should
+        handle language detection, domain adaptation, and quality assessment.
+
+        Implementation considerations:
+        - Language detection and validation
+        - Context preservation and domain adaptation
+        - Handling of special characters and formatting
+        - Quality assessment and confidence scoring
+        - Caching for repeated translations
+        - Fallback mechanisms for unsupported language pairs
+
         Args:
+            request: The translation request containing:
+                - text_content: Source text with language information
+                - target_language: Target language code (ISO 639-1)
+                - source_language: Source language code (optional, can be auto-detected)
+                - context: Optional context information for better translation
+                - domain: Optional domain specification (technical, medical, etc.)
+
         Returns:
+            TextContent: The translated text containing:
+                - text: Translated text content
+                - language: Target language code
+                - confidence: Translation confidence score (0.0-1.0)
+                - metadata: Additional information including:
+                    - detected_source_language: Auto-detected source language
+                    - alternative_translations: Other possible translations
+                    - processing_time: Translation duration
+                    - model_used: Translation model identifier
+
         Raises:
+            TranslationFailedException: If translation fails due to:
+                - Unsupported language pair
+                - Model loading or inference errors
+                - Network issues (for cloud-based translation)
+                - Text processing errors (encoding, length limits)
+            ValueError: If request parameters are invalid:
+                - Empty text content
+                - Invalid language codes
+                - Unsupported translation options
+
+        Example:
+            ```python
+            # Create source text
+            source_text = TextContent(
+                text="Hello, how are you today?",
+                language="en"
+            )
+
+            # Create translation request
+            request = TranslationRequest(
+                text_content=source_text,
+                target_language="zh",
+                context="casual_conversation"
+            )
+
+            # Perform translation
+            try:
+                result = service.translate(request)
+
+                print(f"Original ({source_text.language}): {source_text.text}")
+                print(f"Translated ({result.language}): {result.text}")
+
+                if result.confidence > 0.9:
+                    print("High confidence translation")
+                elif result.confidence > 0.7:
+                    print("Medium confidence translation")
+                else:
+                    print("Low confidence translation - review recommended")
+
+                # Check for alternatives
+                if hasattr(result, 'metadata') and 'alternatives' in result.metadata:
+                    print("Alternative translations:")
+                    for alt in result.metadata['alternatives'][:3]:
+                        print(f"  - {alt}")
+
+            except TranslationFailedException as e:
+                print(f"Translation failed: {e}")
+            ```
         """
         pass
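The translation contract can be smoke-tested the same way. A minimal sketch with inline stand-ins: the real `TextContent`/`TranslationRequest` models and `TranslationFailedException` differ in detail, and `GlossaryTranslator` is a hypothetical provider used only to show the confidence and metadata flow.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


# Minimal stand-ins for the domain models referenced by the interface
# (assumption: the real classes carry additional fields).
@dataclass
class TextContent:
    text: str
    language: str
    confidence: float = 1.0
    metadata: dict = field(default_factory=dict)


@dataclass
class TranslationRequest:
    text_content: TextContent
    target_language: str
    source_language: Optional[str] = None


class TranslationFailedException(Exception):
    pass


class ITranslationService(ABC):
    @abstractmethod
    def translate(self, request: TranslationRequest) -> TextContent:
        ...


class GlossaryTranslator(ITranslationService):
    """Toy provider backed by a fixed glossary, useful for wiring and tests."""

    GLOSSARY = {("hello, how are you today?", "zh"): "你好，你今天好吗？"}

    def translate(self, request: TranslationRequest) -> TextContent:
        if not request.text_content.text.strip():
            raise ValueError("Empty text content")
        key = (request.text_content.text.lower(), request.target_language)
        if key not in self.GLOSSARY:
            raise TranslationFailedException(f"Unsupported pair: {key}")
        return TextContent(
            text=self.GLOSSARY[key],
            language=request.target_language,
            confidence=0.95,
            metadata={"detected_source_language": request.text_content.language},
        )


result = GlossaryTranslator().translate(
    TranslationRequest(
        text_content=TextContent(text="Hello, how are you today?", language="en"),
        target_language="zh",
    )
)
```

Raising a distinct exception for unsupported pairs versus invalid input mirrors the `TranslationFailedException`/`ValueError` split documented above, so callers can branch on the failure mode.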
src/infrastructure/base/tts_provider_base.py
CHANGED
@@ -1,4 +1,37 @@
-"""

 import logging
 import os
@@ -20,15 +53,57 @@ logger = logging.getLogger(__name__)


 class TTSProviderBase(ISpeechSynthesisService, ABC):
-    """

     def __init__(self, provider_name: str, supported_languages: list[str] = None):
         """
         Initialize the TTS provider.

         Args:
-            provider_name:
-
         """
         self.provider_name = provider_name
         self.supported_languages = supported_languages or []
+"""
+Base class for TTS provider implementations.
+
+This module provides the abstract base class for all Text-to-Speech provider
+implementations in the infrastructure layer. It implements common functionality
+and defines the contract that all TTS providers must follow.
+
+The base class handles:
+- Common validation logic
+- File management and cleanup
+- Error handling and logging
+- Audio format processing
+- Provider lifecycle management
+
+Example implementation:
+    ```python
+    from src.infrastructure.base.tts_provider_base import TTSProviderBase
+
+    class MyTTSProvider(TTSProviderBase):
+        def __init__(self):
+            super().__init__("my_tts", ["en", "es"])
+
+        def _generate_audio(self, request):
+            # Implement TTS-specific logic
+            audio_data = my_tts_engine.synthesize(request.text_content.text)
+            return audio_data, 22050  # audio_bytes, sample_rate
+
+        def is_available(self):
+            return my_tts_engine.is_loaded()
+
+        def get_available_voices(self):
+            return ["voice1", "voice2"]
+    ```
+"""

 import logging
 import os


 class TTSProviderBase(ISpeechSynthesisService, ABC):
+    """
+    Abstract base class for TTS provider implementations.
+
+    This class provides a foundation for implementing Text-to-Speech providers
+    in the infrastructure layer. It handles common concerns like validation,
+    file management, error handling, and audio processing while allowing
+    concrete implementations to focus on provider-specific logic.
+
+    Key features:
+    - Automatic validation of synthesis requests
+    - Temporary file management with cleanup
+    - Consistent error handling and logging
+    - Support for both batch and streaming synthesis
+    - Audio format standardization
+    - Provider availability checking
+
+    Subclasses must implement:
+    - _generate_audio(): Core synthesis logic
+    - _generate_audio_stream(): Streaming synthesis (optional)
+    - is_available(): Provider availability check
+    - get_available_voices(): Voice enumeration
+
+    The base class ensures that all providers follow the same patterns
+    for error handling, logging, and resource management, making the
+    system more maintainable and predictable.
+    """

     def __init__(self, provider_name: str, supported_languages: list[str] = None):
         """
         Initialize the TTS provider.

+        Sets up the provider with basic configuration and creates necessary
+        directories for temporary file storage. This constructor should be
+        called by all subclass implementations.
+
         Args:
+            provider_name: Unique identifier for this TTS provider (e.g., "kokoro", "dia").
+                Used for logging, error messages, and provider selection.
+            supported_languages: List of ISO language codes supported by this provider
+                (e.g., ["en", "zh", "es"]). If None, no language validation
+                will be performed.
+
+        Example:
+            ```python
+            class MyTTSProvider(TTSProviderBase):
+                def __init__(self):
+                    super().__init__(
+                        provider_name="my_tts",
+                        supported_languages=["en", "es", "fr"]
+                    )
+            ```
         """
         self.provider_name = provider_name
         self.supported_languages = supported_languages or []
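The division of labor the base class describes (validation, logging, and dispatch in the base; synthesis in a hook) is a template-method pattern. The sketch below is a simplified stand-in, assuming the `_generate_audio` hook returns `(audio_bytes, sample_rate)` as in the module docstring's example; it omits file management and streaming, and `BeepProvider` is hypothetical.

```python
import logging
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


class TTSProviderBase(ABC):
    """Simplified stand-in for the real base class: validation + dispatch only."""

    def __init__(self, provider_name: str, supported_languages: Optional[List[str]] = None):
        self.provider_name = provider_name
        self.supported_languages = supported_languages or []

    def synthesize(self, text: str, language: str) -> Tuple[bytes, int]:
        # Common validation lives in the base class...
        if not text.strip():
            raise ValueError("Empty text content")
        if self.supported_languages and language not in self.supported_languages:
            raise ValueError(f"{self.provider_name} does not support language {language!r}")
        logger.debug("Synthesizing %d chars with %s", len(text), self.provider_name)
        # ...while provider-specific work is delegated to the hook method.
        return self._generate_audio(text)

    @abstractmethod
    def _generate_audio(self, text: str) -> Tuple[bytes, int]:
        """Return (audio_bytes, sample_rate); implemented by each provider."""


class BeepProvider(TTSProviderBase):
    def __init__(self):
        super().__init__(provider_name="beep", supported_languages=["en"])

    def _generate_audio(self, text: str) -> Tuple[bytes, int]:
        return b"\x01" * len(text), 22050  # placeholder "audio"


audio, rate = BeepProvider().synthesize("hi", "en")
```

Keeping validation in the public `synthesize` method means every provider rejects empty text and unsupported languages identically, so error behavior stays consistent across the kokoro, dia, and other backends mentioned in the docstrings.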
|