shashikant-stl committed commit cd42541 (verified · 1 parent: 06dd62b)

Update README.md

Files changed (1):
  1. README.md +36 -33

README.md CHANGED
@@ -132,48 +132,51 @@ This model was evaluated using industry-standard transcription metrics focused o
 
1. **Word Error Rate (WER)**

   **Performance Summary**

   | **Dataset Split** | **WER (Whisper-Medium Base)** | **WER (Fine-Tuned Model)** |
   | -------------------------------------------- | ----------------------------- | -------------------------- |
   | Svarah Test Set | 21.3% | 13.4% |
   | Hinglish Subset | 26.8% | 15.9% |
   | Regional English (Gujarati, Marathi, Telugu) | 27.4% | 17.2% |
   | Noisy Telephony Audio | 24.6% | 14.1% |

   **Improvement:**
   The fine-tuned model shows an average **34–42% relative WER reduction**, especially on code-switched (Hinglish) and noisy contact center audio.
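WER is the word-level edit distance between a reference transcript and a hypothesis, normalized by the number of reference words. A minimal illustrative implementation of the standard metric (this is not the card's own evaluation script, just the textbook definition):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of three reference words -> WER = 1/3
print(wer("the cat sat", "the cat sit"))
```

In practice the same numbers come from libraries such as `jiwer` or the Hugging Face `evaluate` package; the hand-rolled version above just makes the computation explicit.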
 
2. **Character Error Rate (CER)**

   **Performance Summary**

   | **Dataset Split** | **CER (Base)** | **CER (Fine-Tuned)** |
   | --------------------- | -------------- | -------------------- |
   | Svarah Test Set | 13.7% | 8.2% |
   | Hinglish Subset | 17.4% | 9.9% |
   | Regional Accents | 18.2% | 11.1% |
   | Noisy Telephony Audio | 16.1% | 9.0% |

   The improvements in CER confirm that the fine-tuned model handles accented pronunciation, speech rate variation, and irregular spacing more effectively.
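CER is the same edit-distance recurrence applied to characters instead of words, which is what makes it sensitive to spelling and spacing errors. A compact sketch under that definition (again illustrative, not the exact evaluation script used here):

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences, using a rolling 1-D array."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds d[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(prev + (r != h), d[j - 1] + 1, d[j] + 1)
    return d[len(hyp)]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# Classic example: kitten -> sitting needs 3 edits over 6 reference characters.
print(cer("kitten", "sitting"))
```

Feeding the same `edit_distance` helper word lists instead of strings would reproduce WER, since both metrics share the recurrence.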
 
3. **Qualitative Improvements**

   **Accent Handling**
   - Better recognition of Hindi, Gujarati, Marathi, Telugu, Tamil, and Bengali accents
   - More stable decoding of Indian English pronunciation patterns
   - Reduced errors with long or complex words

   **Code-Switching Performance**
   - Significant improvement in **Hinglish** transcription accuracy
   - Handles fast switching between languages with fewer substitutions

   **Noise Robustness**
   - Improved performance with low-bitrate telephony audio
   - Fewer hallucinations during background noise or overlaps
   - Better segmentation and continuity in long conversations
 
4. **Evaluation Methodology**

   Evaluation was performed using:
   - The Svarah dataset test split
   - Additional manually curated Hinglish test samples
   - Noisy, real-world telephony recordings
   - Standard Hugging Face WER/CER evaluation scripts

   Transcriptions from the base Whisper-Medium model were compared directly against the fine-tuned model to measure the relative performance gain.
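The base-vs-fine-tuned comparison reduces to a relative error reduction per split. A quick sketch of that arithmetic, using the WER pairs from the table above:

```python
def relative_reduction(base: float, fine_tuned: float) -> float:
    """Percent of the base error rate eliminated by fine-tuning."""
    return (base - fine_tuned) / base * 100

# (base WER %, fine-tuned WER %) pairs taken from the WER table above.
wer_by_split = {
    "Svarah Test Set": (21.3, 13.4),
    "Hinglish Subset": (26.8, 15.9),
    "Regional English": (27.4, 17.2),
    "Noisy Telephony Audio": (24.6, 14.1),
}

for split, (base, ft) in wer_by_split.items():
    print(f"{split}: {relative_reduction(base, ft):.1f}% relative WER reduction")
```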
 
  ## Inference Usage