This model was evaluated using industry-standard transcription metrics focused on:
1. **Word Error Rate (WER)**

   **Performance Summary**

   | **Dataset Split**                            | **WER (Whisper-Medium Base)** | **WER (Fine-Tuned Model)** |
   | -------------------------------------------- | ----------------------------- | -------------------------- |
   | Svarah Test Set                              | 21.3%                         | 13.4%                      |
   | Hinglish Subset                              | 26.8%                         | 15.9%                      |
   | Regional English (Gujarati, Marathi, Telugu) | 27.4%                         | 17.2%                      |
   | Noisy Telephony Audio                        | 24.6%                         | 14.1%                      |

   **Improvement:**
   The fine-tuned model shows a **34–42% relative WER reduction**, with the largest gains on code-switched (Hinglish) and noisy contact-center audio.
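The per-split relative reduction can be recomputed directly from the table above (the short split names below are just labels for this sketch):

```python
# Recompute the relative WER reduction implied by the table above.
# Values are the WER percentages from the table.
base_wer = {"svarah": 21.3, "hinglish": 26.8, "regional": 27.4, "telephony": 24.6}
tuned_wer = {"svarah": 13.4, "hinglish": 15.9, "regional": 17.2, "telephony": 14.1}

for split, base in base_wer.items():
    reduction = 100 * (1 - tuned_wer[split] / base)
    print(f"{split}: {reduction:.1f}% relative WER reduction")
```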
2. **Character Error Rate (CER)**

   **Performance Summary**

   | **Dataset Split**     | **CER (Base)** | **CER (Fine-Tuned)** |
   | --------------------- | -------------- | -------------------- |
   | Svarah Test Set       | 13.7%          | 8.2%                 |
   | Hinglish Subset       | 17.4%          | 9.9%                 |
   | Regional Accents      | 18.2%          | 11.1%                |
   | Noisy Telephony Audio | 16.1%          | 9.0%                 |

   The improvements in CER confirm that the fine-tuned model handles accented pronunciation, speech-rate variation, and irregular spacing more effectively.
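CER is conventionally computed as the character-level edit (Levenshtein) distance divided by the reference length; a minimal stdlib sketch, with invented example strings:

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn hyp into ref (classic dynamic-programming form)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edit distance over reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

print(cer("namaste, order number das", "namaste order number dus"))  # → 0.08
```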
3. **Qualitative Improvements**

   **Accent Handling**
   - Better recognition of Hindi, Gujarati, Marathi, Telugu, Tamil, and Bengali accents
   - More stable decoding of Indian English pronunciation patterns
   - Reduced errors on long or complex words

   **Code-Switching Performance**
   - Significant improvement in **Hinglish** transcription accuracy
   - Handles fast switching between languages with fewer substitutions

   **Noise Robustness**
   - Improved performance on low-bitrate telephony audio
   - Fewer hallucinations during background noise or overlapping speech
   - Better segmentation and continuity in long conversations
4. **Evaluation Methodology**

   Evaluation was performed using:
   - The Svarah dataset test split
   - Additional manually curated Hinglish test samples
   - Noisy, real-world telephony recordings
   - Standard Hugging Face WER/CER evaluation scripts

   Transcriptions from the base Whisper-Medium model were compared directly against those of the fine-tuned model to measure the relative performance gain.
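The base-vs-fine-tuned comparison described above can be sketched at the word level with a plain-Python WER; the transcripts below are invented placeholders, not samples from the actual test sets:

```python
def word_edit_distance(ref_words, hyp_words):
    # Classic DP: insertions, deletions, substitutions over word sequences.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        cur = [i]
        for j, h in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / len(ref)

reference = "please share the order id with me"   # hypothetical ground truth
base_out  = "please share the odor id me"         # invented base-model output
tuned_out = "please share the order id with us"   # invented fine-tuned output

gain = 1 - wer(reference, tuned_out) / wer(reference, base_out)
print(f"relative WER reduction: {100 * gain:.0f}%")  # → 50%
```

In practice the same comparison is usually run with the Hugging Face `evaluate`/`jiwer` tooling mentioned above rather than hand-rolled code; this sketch only illustrates what the metric measures.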
## Inference Usage