This model was evaluated using industry-standard transcription metrics focused on:
1. **Word Error Rate (WER)**

   **Performance Summary**

   | **Dataset Split**                            | **WER (Whisper-Medium Base)** | **WER (Fine-Tuned Model)** |
   | -------------------------------------------- | ----------------------------- | -------------------------- |
   | Svarah Test Set                              | 21.3%                         | 13.4%                      |
   | Hinglish Subset                              | 26.8%                         | 15.9%                      |
   | Regional English (Gujarati, Marathi, Telugu) | 27.4%                         | 17.2%                      |
   | Noisy Telephony Audio                        | 24.6%                         | 14.1%                      |

   **Improvement:**
   The fine-tuned model shows a **34–42% relative WER reduction**, with the largest gains on code-switched (Hinglish) and noisy contact-center audio.
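The per-split relative reduction can be recomputed directly from the table above (the short split names below are just labels for this sketch):

```python
# Recompute the relative WER reduction implied by the table above.
# Values are the WER percentages from the table.
base_wer = {"svarah": 21.3, "hinglish": 26.8, "regional": 27.4, "telephony": 24.6}
tuned_wer = {"svarah": 13.4, "hinglish": 15.9, "regional": 17.2, "telephony": 14.1}

for split, base in base_wer.items():
    reduction = 100 * (1 - tuned_wer[split] / base)
    print(f"{split}: {reduction:.1f}% relative WER reduction")
```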
2. **Character Error Rate (CER)**

   **Performance Summary**

   | **Dataset Split**     | **CER (Base)** | **CER (Fine-Tuned)** |
   | --------------------- | -------------- | -------------------- |
   | Svarah Test Set       | 13.7%          | 8.2%                 |
   | Hinglish Subset       | 17.4%          | 9.9%                 |
   | Regional Accents      | 18.2%          | 11.1%                |
   | Noisy Telephony Audio | 16.1%          | 9.0%                 |

   The improvements in CER confirm that the fine-tuned model handles accented pronunciation, speech-rate variation, and irregular spacing more effectively.
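CER is conventionally computed as the character-level edit (Levenshtein) distance divided by the reference length; a minimal stdlib sketch, with invented example strings:

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn hyp into ref (classic dynamic-programming form)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edit distance over reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

print(cer("namaste, order number das", "namaste order number dus"))  # → 0.08
```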
3. **Qualitative Improvements**

   **Accent Handling**
   - Better recognition of Hindi, Gujarati, Marathi, Telugu, Tamil, and Bengali accents
   - More stable decoding of Indian English pronunciation patterns
   - Reduced errors on long or complex words

   **Code-Switching Performance**
   - Significant improvement in **Hinglish** transcription accuracy
   - Handles fast switching between languages with fewer substitutions

   **Noise Robustness**
   - Improved performance on low-bitrate telephony audio
   - Fewer hallucinations during background noise or overlapping speech
   - Better segmentation and continuity in long conversations
4. **Evaluation Methodology**

   Evaluation was performed using:
   - The Svarah dataset test split
   - Additional manually curated Hinglish test samples
   - Noisy, real-world telephony recordings
   - Standard Hugging Face WER/CER evaluation scripts

   Transcriptions from the base Whisper-Medium model were compared directly against those of the fine-tuned model to measure the relative performance gain.
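The base-vs-fine-tuned comparison described above can be sketched at the word level with a plain-Python WER; the transcripts below are invented placeholders, not samples from the actual test sets:

```python
def word_edit_distance(ref_words, hyp_words):
    # Classic DP: insertions, deletions, substitutions over word sequences.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / len(ref)

reference = "please share the order id with me"   # hypothetical ground truth
base_out  = "please share the odor id me"         # invented base-model output
tuned_out = "please share the order id with us"   # invented fine-tuned output

gain = 1 - wer(reference, tuned_out) / wer(reference, base_out)
print(f"relative WER reduction: {100 * gain:.0f}%")  # → 50%
```

In practice the same comparison is usually run with the Hugging Face `evaluate`/`jiwer` tooling mentioned above rather than hand-rolled code; this sketch only illustrates what the metric measures.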
## Inference Usage