anfastech committed on
Commit 74a089b · 1 Parent(s): e7e9fa8

Update: ML/AI logic is now in the AI engine service
ARCHITECTURE.md ADDED
@@ -0,0 +1,185 @@
+ # AI Engine Architecture
+
+ ## Clean Architecture Implementation
+
+ This AI engine follows clean architecture principles with proper separation of concerns.
+
+ ---
+
+ ## Module Structure
+
+ ```
+ diagnosis/ai_engine/
+ ├── detect_stuttering.py  # Main detector class (business logic)
+ ├── model_loader.py       # Singleton pattern for model loading
+ └── features.py           # Feature extraction (ASR features)
+ ```
+
+ ---
+
+ ## Architecture Pattern
+
+ ### 1. Model Loader (`model_loader.py`)
+ **Responsibility**: Singleton pattern for model instance management
+
+ - Ensures models are loaded only once
+ - Provides clean interface: `get_stutter_detector()`
+ - Handles initialization and error handling
+ - Used by API layer (`app.py`)
+
+ **Usage:**
+ ```python
+ from diagnosis.ai_engine.model_loader import get_stutter_detector
+
+ detector = get_stutter_detector()  # Singleton instance
+ ```
+
+ ---
+
+ ### 2. Feature Extractor (`features.py`)
+ **Responsibility**: Feature extraction from audio using IndicWav2Vec Hindi
+
+ **Class**: `ASRFeatureExtractor`
+
+ **Methods:**
+ - `extract_audio_features()` - Raw audio feature extraction
+ - `get_transcription_features()` - Transcription with confidence scores
+ - `get_word_level_features()` - Word-level timestamps and confidence
+
+ **Design Pattern**:
+ - Takes pre-loaded model and processor as dependencies
+ - Single responsibility: feature extraction only
+ - Reusable across different use cases
+
+ **Usage:**
+ ```python
+ from .features import ASRFeatureExtractor
+
+ extractor = ASRFeatureExtractor(model, processor, device)
+ features = extractor.get_transcription_features(audio)
+ ```
+
+ ---
+
+ ### 3. Detector (`detect_stuttering.py`)
+ **Responsibility**: High-level stutter detection orchestration
+
+ **Class**: `AdvancedStutterDetector`
+
+ **Design:**
+ - Uses feature extractor for transcription (composition)
+ - Orchestrates the analysis pipeline
+ - Returns structured results
+
+ **Flow:**
+ ```
+ Audio Input
+     ↓
+ Feature Extractor (ASR)
+     ↓
+ Text Analysis
+     ↓
+ Results
+ ```
+
+ ---
+
+ ## Benefits of This Architecture
+
+ ### ✅ Separation of Concerns
+ - **Model Loading**: Isolated in `model_loader.py`
+ - **Feature Extraction**: Isolated in `features.py`
+ - **Business Logic**: In `detect_stuttering.py`
+
+ ### ✅ Single Responsibility Principle
+ - Each module has one clear purpose
+ - Easy to test and maintain
+ - Easy to extend or replace components
+
+ ### ✅ Dependency Injection
+ - Feature extractor receives model/processor as dependencies
+ - No tight coupling
+ - Easy to mock for testing
+
+ ### ✅ Reusability
+ - Feature extractor can be used independently
+ - Model loader can be used by other modules
+ - Clean interfaces between layers
+
+ ---
+
+ ## Data Flow
+
+ ```
+ API Request (app.py)
+     ↓
+ get_stutter_detector() [model_loader.py]
+     ↓
+ AdvancedStutterDetector [detect_stuttering.py]
+     ↓
+ ASRFeatureExtractor [features.py]
+     ↓
+ IndicWav2Vec Hindi Model
+     ↓
+ Results back through layers
+ ```
+
+ ---
+
+ ## Comparison with Django App
+
+ **Before (Django App):**
+ - Model loading logic in Django app
+ - Feature extraction in Django app
+ - Tight coupling between web app and ML logic
+
+ **After (AI Engine Service):**
+ - ✅ Model loading in AI engine service
+ - ✅ Feature extraction in AI engine service
+ - ✅ Django app only calls API (loose coupling)
+ - ✅ ML logic isolated in dedicated service
+
+ ---
+
+ ## Extension Points
+
+ ### Adding New Features
+ 1. Add method to `ASRFeatureExtractor` in `features.py`
+ 2. Use in `AdvancedStutterDetector` via composition
+ 3. No changes needed to model loader
+
+ ### Adding New Models
+ 1. Update `detect_stuttering.py` to load new model
+ 2. Create new feature extractor if needed
+ 3. Model loader remains unchanged
+
+ ### Testing
+ - Mock `ASRFeatureExtractor` in tests
+ - Mock model loader for integration tests
+ - Each component can be tested independently
+
+ ---
+
+ ## Key Principles Applied
+
+ 1. **Dependency Inversion**: High-level modules don't depend on low-level modules
+ 2. **Open/Closed**: Open for extension, closed for modification
+ 3. **Interface Segregation**: Clean, focused interfaces
+ 4. **Don't Repeat Yourself (DRY)**: Feature extraction logic centralized
+ 5. **Single Source of Truth**: Model instance managed by singleton
+
+ ---
+
+ ## File Responsibilities
+
+ | File | Responsibility | Depends On |
+ |------|---------------|------------|
+ | `model_loader.py` | Singleton model management | `detect_stuttering.py` |
+ | `features.py` | Feature extraction | `transformers`, `torch` |
+ | `detect_stuttering.py` | Business logic orchestration | `features.py`, `model_loader.py` |
+ | `app.py` | API layer | `model_loader.py` |
+
+ ---
+
+ This architecture ensures the ML/AI logic stays in the AI engine service, not in the Django web application, following microservices best practices.
+
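The dependency-injection benefit described in the document above can be exercised without loading any model. A minimal sketch, assuming only the composition pattern the doc describes (the `Detector` class below is a hypothetical stand-in for `AdvancedStutterDetector`, not code from this commit):

```python
from unittest.mock import MagicMock

# Stand-in for a detector that receives its feature extractor via
# composition, mirroring how AdvancedStutterDetector holds an
# ASRFeatureExtractor (illustrative only).
class Detector:
    def __init__(self, extractor):
        self.extractor = extractor

    def transcribe(self, audio):
        return self.extractor.get_transcription_features(audio)["transcript"]

# Mock the extractor: no model download or audio processing needed.
mock_extractor = MagicMock()
mock_extractor.get_transcription_features.return_value = {"transcript": "test"}

detector = Detector(mock_extractor)
print(detector.transcribe([0.0] * 16000))  # prints "test"
```

Because the extractor arrives as a constructor argument, the detector's orchestration logic can be unit-tested in milliseconds, which is the point of the "Easy to mock for testing" bullet.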
app.py CHANGED
@@ -18,12 +18,12 @@ logger = logging.getLogger(__name__)
  # Add project root to path
  sys.path.insert(0, str(Path(__file__).parent))
 
- # Import detector
+ # Import detector using model loader (clean architecture)
  try:
-     from diagnosis.ai_engine.detect_stuttering import get_stutter_detector
-     logger.info("✅ Successfully imported StutterDetector")
+     from diagnosis.ai_engine.model_loader import get_stutter_detector
+     logger.info("✅ Successfully imported model loader")
  except ImportError as e:
-     logger.error(f"❌ Failed to import StutterDetector: {e}")
+     logger.error(f"❌ Failed to import model loader: {e}")
      raise
 
  # Initialize FastAPI
diagnosis/ai_engine/detect_stuttering.py CHANGED
@@ -107,6 +107,14 @@ class AdvancedStutterDetector:
              ).to(DEVICE)
              self.model.eval()
 
+             # Initialize feature extractor (clean architecture pattern)
+             from .features import ASRFeatureExtractor
+             self.feature_extractor = ASRFeatureExtractor(
+                 model=self.model,
+                 processor=self.processor,
+                 device=DEVICE
+             )
+
              # Debug: Log processor structure
              logger.info(f"📋 Processor type: {type(self.processor)}")
              if hasattr(self.processor, 'tokenizer'):
@@ -114,7 +122,7 @@ class AdvancedStutterDetector:
              if hasattr(self.processor, 'feature_extractor'):
                  logger.info(f"📋 Feature extractor type: {type(self.processor.feature_extractor)}")
 
-             logger.info("✅ IndicWav2Vec Hindi ASR Engine Loaded")
+             logger.info("✅ IndicWav2Vec Hindi ASR Engine Loaded with Feature Extractor")
          except Exception as e:
              logger.error(f"🔥 Engine Failure: {e}")
              raise
@@ -236,71 +244,22 @@ class AdvancedStutterDetector:
          return features
 
      def _transcribe_with_timestamps(self, audio: np.ndarray) -> Tuple[str, List[Dict], torch.Tensor]:
-         """Transcribe audio and return word timestamps and logits"""
+         """
+         Transcribe audio and return word timestamps and logits.
+
+         Uses the feature extractor for clean separation of concerns.
+         """
          try:
-             inputs = self.processor(audio, sampling_rate=16000, return_tensors="pt").to(DEVICE)
-
-             with torch.no_grad():
-                 outputs = self.model(**inputs)
-                 logits = outputs.logits
-                 predicted_ids = torch.argmax(logits, dim=-1)
-
-             # Decode transcript - IndicWav2Vec uses tokenizer for decoding
-             transcript = ""
-             try:
-                 # Method 1: Try using processor's tokenizer directly
-                 if hasattr(self.processor, 'tokenizer'):
-                     transcript = self.processor.tokenizer.decode(predicted_ids[0], skip_special_tokens=True)
-                     logger.info(f"📝 Decoded via tokenizer: '{transcript}' (length: {len(transcript)})")
-                 # Method 2: Try batch_decode if tokenizer not available
-                 elif hasattr(self.processor, 'batch_decode'):
-                     transcript = self.processor.batch_decode(predicted_ids)[0]
-                     logger.info(f"📝 Decoded via batch_decode: '{transcript}' (length: {len(transcript)})")
-                 # Method 3: Try accessing tokenizer through processor.feature_extractor or processor attributes
-                 else:
-                     # Check if processor wraps a tokenizer
-                     for attr in ['tokenizer', '_tokenizer', 'decoder']:
-                         if hasattr(self.processor, attr):
-                             tokenizer = getattr(self.processor, attr)
-                             if hasattr(tokenizer, 'decode'):
-                                 transcript = tokenizer.decode(predicted_ids[0], skip_special_tokens=True)
-                                 logger.info(f"📝 Decoded via {attr}: '{transcript}' (length: {len(transcript)})")
-                                 break
-
-                 # Clean up transcript - remove special tokens and normalize
-                 if transcript:
-                     transcript = transcript.strip()
-                     # Remove common special tokens if present
-                     transcript = transcript.replace('<pad>', '').replace('<s>', '').replace('</s>', '').replace('|', ' ').strip()
-                     # Normalize whitespace
-                     transcript = ' '.join(transcript.split())
-
-             except Exception as decode_error:
-                 logger.error(f"⚠️ Decode error: {decode_error}", exc_info=True)
-                 transcript = ""
-
-             # Ensure transcript is not None
-             if not transcript:
-                 transcript = ""
-                 logger.warning("⚠️ Empty transcript generated - model may not have produced valid output")
-                 logger.warning(f"⚠️ Predicted IDs shape: {predicted_ids.shape}, sample values: {predicted_ids[0][:10].tolist() if predicted_ids.numel() > 0 else 'empty'}")
-
-             # Estimate word timestamps (simplified - frame-level alignment)
-             frame_duration = 0.02  # 20ms per frame
-             num_frames = logits.shape[1]
-             audio_duration = len(audio) / 16000
-
-             # Simple word-level timestamps (would need proper alignment for production)
-             words = transcript.split() if transcript else []
-             word_timestamps = []
-             time_per_word = audio_duration / max(len(words), 1) if words else 0
-
-             for i, word in enumerate(words):
-                 word_timestamps.append({
-                     'word': word,
-                     'start': i * time_per_word,
-                     'end': (i + 1) * time_per_word
-                 })
+             # Use feature extractor for transcription (clean architecture)
+             features = self.feature_extractor.get_transcription_features(audio, sample_rate=16000)
+             transcript = features['transcript']
+             logits = torch.from_numpy(features['logits'])
+
+             # Get word-level features for timestamps
+             word_features = self.feature_extractor.get_word_level_features(audio, sample_rate=16000)
+             word_timestamps = word_features['word_timestamps']
+
+             logger.info(f"📝 Transcription via feature extractor: '{transcript}' (length: {len(transcript)}, words: {len(word_timestamps)})")
 
              return transcript, word_timestamps, logits
          except Exception as e:
@@ -860,23 +819,6 @@ class AdvancedStutterDetector:
          return round(min(max(confidence, 0.0), 1.0), 2)
 
 
- # diagnosis/ai_engine/model_loader.py
- """Singleton pattern for model loading"""
- _detector_instance = None
-
- def get_stutter_detector():
-     """Get or create singleton AdvancedStutterDetector instance"""
-     global _detector_instance
-     if _detector_instance is None:
-         _detector_instance = AdvancedStutterDetector()
-     return _detector_instance
-
- # Singleton pattern for model loading
- _detector_instance = None
-
- def get_stutter_detector():
-     """Get or create singleton AdvancedStutterDetector instance"""
-     global _detector_instance
-     if _detector_instance is None:
-         _detector_instance = AdvancedStutterDetector()
-     return _detector_instance
+ # Model loader is now in a separate module: model_loader.py
+ # This follows clean architecture principles - separation of concerns
+ # Import using: from diagnosis.ai_engine.model_loader import get_stutter_detector
diagnosis/ai_engine/features.py ADDED
@@ -0,0 +1,206 @@
+ # diagnosis/ai_engine/features.py
+ """
+ Feature extraction for IndicWav2Vec Hindi ASR
+
+ This module provides feature extraction capabilities using the IndicWav2Vec Hindi model.
+ Focused on ASR transcription features rather than hybrid acoustic+linguistic features.
+ """
+ import torch
+ import numpy as np
+ import logging
+ from typing import Dict, Any, Tuple, Optional
+ from transformers import Wav2Vec2ForCTC, AutoProcessor
+
+ logger = logging.getLogger(__name__)
+
+
+ class ASRFeatureExtractor:
+     """
+     Feature extractor using IndicWav2Vec Hindi for Automatic Speech Recognition.
+
+     This extractor focuses on:
+     - Audio feature extraction via IndicWav2Vec
+     - Transcription confidence scores
+     - Frame-level predictions and logits
+     - Word-level alignments (estimated)
+
+     Model: ai4bharat/indicwav2vec-hindi
+     """
+
+     def __init__(self, model: Wav2Vec2ForCTC, processor: AutoProcessor, device: str = "cpu"):
+         """
+         Initialize the ASR feature extractor.
+
+         Args:
+             model: Pre-loaded IndicWav2Vec Hindi model
+             processor: Pre-loaded processor for the model
+             device: Device to run inference on ('cpu' or 'cuda')
+         """
+         self.model = model
+         self.processor = processor
+         self.device = device
+         self.model.eval()
+         logger.info(f"✅ ASRFeatureExtractor initialized on {device}")
+
+     def extract_audio_features(self, audio: np.ndarray, sample_rate: int = 16000) -> Dict[str, Any]:
+         """
+         Extract features from audio using IndicWav2Vec Hindi.
+
+         Args:
+             audio: Audio waveform as numpy array
+             sample_rate: Sample rate of the audio (default: 16000)
+
+         Returns:
+             Dictionary containing:
+             - input_values: Processed audio features
+             - attention_mask: Attention mask (if available)
+         """
+         try:
+             # Process audio through the processor
+             inputs = self.processor(
+                 audio,
+                 sampling_rate=sample_rate,
+                 return_tensors="pt"
+             ).to(self.device)
+
+             return {
+                 'input_values': inputs.input_values,
+                 'attention_mask': inputs.get('attention_mask', None)
+             }
+         except Exception as e:
+             logger.error(f"❌ Error extracting audio features: {e}")
+             raise
+
+     def get_transcription_features(
+         self,
+         audio: np.ndarray,
+         sample_rate: int = 16000
+     ) -> Dict[str, Any]:
+         """
+         Get transcription features including logits, predictions, and confidence.
+
+         Args:
+             audio: Audio waveform as numpy array
+             sample_rate: Sample rate of the audio (default: 16000)
+
+         Returns:
+             Dictionary containing:
+             - transcript: Transcribed text
+             - logits: Model logits (raw predictions)
+             - predicted_ids: Predicted token IDs
+             - probabilities: Softmax probabilities
+             - confidence: Average confidence score
+             - frame_confidence: Per-frame confidence scores
+         """
+         try:
+             # Process audio
+             inputs = self.processor(
+                 audio,
+                 sampling_rate=sample_rate,
+                 return_tensors="pt"
+             ).to(self.device)
+
+             # Get model predictions
+             with torch.no_grad():
+                 outputs = self.model(**inputs)
+                 logits = outputs.logits
+                 predicted_ids = torch.argmax(logits, dim=-1)
+
+             # Calculate probabilities and confidence
+             probs = torch.softmax(logits, dim=-1)
+             max_probs = torch.max(probs, dim=-1)[0]  # Get max probability per frame
+             frame_confidence = max_probs[0].cpu().numpy()
+             avg_confidence = float(torch.mean(max_probs).item())
+
+             # Decode transcript
+             transcript = ""
+             try:
+                 if hasattr(self.processor, 'tokenizer'):
+                     transcript = self.processor.tokenizer.decode(
+                         predicted_ids[0],
+                         skip_special_tokens=True
+                     )
+                 elif hasattr(self.processor, 'batch_decode'):
+                     transcript = self.processor.batch_decode(predicted_ids)[0]
+
+                 # Clean up transcript
+                 if transcript:
+                     transcript = transcript.strip()
+                     transcript = transcript.replace('<pad>', '').replace('<s>', '').replace('</s>', '').replace('|', ' ').strip()
+                     transcript = ' '.join(transcript.split())
+             except Exception as e:
+                 logger.warning(f"⚠️ Decode error: {e}")
+                 transcript = ""
+
+             return {
+                 'transcript': transcript,
+                 'logits': logits.cpu().numpy(),
+                 'predicted_ids': predicted_ids.cpu().numpy(),
+                 'probabilities': probs.cpu().numpy(),
+                 'confidence': avg_confidence,
+                 'frame_confidence': frame_confidence,
+                 'num_frames': logits.shape[1]
+             }
+         except Exception as e:
+             logger.error(f"❌ Error getting transcription features: {e}")
+             raise
+
+     def get_word_level_features(
+         self,
+         audio: np.ndarray,
+         sample_rate: int = 16000
+     ) -> Dict[str, Any]:
+         """
+         Get word-level features including timestamps and confidence.
+
+         Args:
+             audio: Audio waveform as numpy array
+             sample_rate: Sample rate of the audio (default: 16000)
+
+         Returns:
+             Dictionary containing:
+             - words: List of words
+             - word_timestamps: List of (start, end) timestamps for each word
+             - word_confidence: Confidence score for each word
+         """
+         try:
+             # Get transcription features
+             features = self.get_transcription_features(audio, sample_rate)
+             transcript = features['transcript']
+             frame_confidence = features['frame_confidence']
+             num_frames = features['num_frames']
+
+             # Estimate word-level timestamps (simplified)
+             words = transcript.split() if transcript else []
+             audio_duration = len(audio) / sample_rate
+             time_per_word = audio_duration / max(len(words), 1) if words else 0
+
+             word_timestamps = []
+             word_confidence = []
+
+             for i, word in enumerate(words):
+                 start_time = i * time_per_word
+                 end_time = (i + 1) * time_per_word
+
+                 # Estimate confidence for this word (average of corresponding frames)
+                 start_frame = int((start_time / audio_duration) * num_frames)
+                 end_frame = int((end_time / audio_duration) * num_frames)
+                 word_conf = float(np.mean(frame_confidence[start_frame:end_frame])) if end_frame > start_frame else 0.5
+
+                 word_timestamps.append({
+                     'word': word,
+                     'start': start_time,
+                     'end': end_time
+                 })
+                 word_confidence.append(word_conf)
+
+             return {
+                 'words': words,
+                 'word_timestamps': word_timestamps,
+                 'word_confidence': word_confidence,
+                 'transcript': transcript
+             }
+         except Exception as e:
+             logger.error(f"❌ Error getting word-level features: {e}")
+             raise
+
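The word-timestamp logic in `get_word_level_features` above is a uniform split of the audio duration across the decoded words, not a true CTC alignment. Isolated as a pure function it looks like this (the function name is mine, for illustration; it is not part of the module):

```python
def estimate_word_timestamps(transcript: str, num_samples: int, sample_rate: int = 16000):
    """Evenly split the audio duration across the decoded words."""
    words = transcript.split()
    duration = num_samples / sample_rate           # seconds of audio
    per_word = duration / max(len(words), 1)       # equal share per word
    return [
        {"word": w, "start": i * per_word, "end": (i + 1) * per_word}
        for i, w in enumerate(words)
    ]

# 4 s of 16 kHz audio with four decoded words -> 1.0 s per word
ts = estimate_word_timestamps("main school jaata hoon", 64000)
# ts[0] == {"word": "main", "start": 0.0, "end": 1.0}
```

This is fine for rough per-word confidence windows, but as the module's own comments note, production-quality timestamps would need frame-level alignment against the CTC output.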
diagnosis/ai_engine/model_loader.py ADDED
@@ -0,0 +1,51 @@
+ # diagnosis/ai_engine/model_loader.py
+ """Singleton pattern for model loading
+
+ This loader provides a clean interface for getting the detector instance.
+ Uses singleton pattern to ensure models are loaded only once.
+ """
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ _detector_instance = None
+
+ def get_stutter_detector():
+     """
+     Get or create singleton AdvancedStutterDetector instance.
+
+     This ensures models are loaded only once and reused across requests.
+
+     Returns:
+         AdvancedStutterDetector: The singleton detector instance
+
+     Raises:
+         ImportError: If the detector class cannot be imported
+     """
+     global _detector_instance
+
+     if _detector_instance is None:
+         try:
+             from .detect_stuttering import AdvancedStutterDetector
+             logger.info("🔄 Initializing detector instance (first call)...")
+             _detector_instance = AdvancedStutterDetector()
+             logger.info("✅ Detector instance created successfully")
+         except ImportError as e:
+             logger.error(f"❌ Failed to import AdvancedStutterDetector: {e}")
+             raise ImportError("No StutterDetector implementation available in detect_stuttering.py") from e
+         except Exception as e:
+             logger.error(f"❌ Failed to create detector instance: {e}")
+             raise
+
+     return _detector_instance
+
+ def reset_detector():
+     """
+     Reset the singleton instance (useful for testing or reloading models).
+
+     Note: This will force reloading of models on next get_stutter_detector() call.
+     """
+     global _detector_instance
+     _detector_instance = None
+     logger.info("🔄 Detector instance reset")
+
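The load-once behavior that `model_loader.py` provides can be verified with a stand-in class, so no model download is involved (`FakeDetector` and `get_detector` below are illustrative, mirroring the shape of `get_stutter_detector`):

```python
_instance = None

class FakeDetector:
    """Stand-in for AdvancedStutterDetector; counts constructions."""
    init_count = 0

    def __init__(self):
        FakeDetector.init_count += 1

def get_detector():
    """Same lazy-singleton shape as get_stutter_detector()."""
    global _instance
    if _instance is None:
        _instance = FakeDetector()
    return _instance

a = get_detector()
b = get_detector()
assert a is b                        # same instance on every call
assert FakeDetector.init_count == 1  # constructed exactly once
```

In the real module, `reset_detector()` clears `_detector_instance`, so the next `get_stutter_detector()` call reloads the model. One caveat worth noting: this pattern takes no lock, so if the API layer serves two concurrent first requests, the detector could briefly be constructed twice.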