bc7ec356 committed on
Commit 77b8677 · verified · 1 parent: d9602b0

Update README.md

Files changed (1): README.md (+34 −2)

README.md CHANGED
@@ -166,9 +166,25 @@ Output: Curated dataset D*
 - Error-aware adaptive sample selection across training rounds
 - Significant reduction in computational resources and training time

-## Performance Benchmarks
-
-### Indic Language Results
+## Post-Rebuttal Update: Cross-Architecture Validation with HEEP-Indic
+
+**Addressing Q1 (Gain Attribution), Q2 (Baselines), and Q3 (Base Model Dependency)**
+
+We apologize for posting this supplement after the rebuttal period. These results were finalized shortly after the deadline, and we wanted to ensure complete experimental evidence was available rather than leave placeholders.
+
+### Resources
+
+* **Reproducibility (Universal Model):** [https://huggingface.co/bc7ec356/heep-universal](https://huggingface.co/bc7ec356/heep-universal)
+* **Cross-Architecture Model (Indic):** [https://huggingface.co/bc7ec356/heep-indic](https://huggingface.co/bc7ec356/heep-indic)
+
+### Cross-Architecture Generalization
+
+To directly address concerns about generalization beyond Whisper V3 Turbo, we trained **Qwen3-ASR (1.7B)**, an architecturally distinct audio-language model, on HEEP-curated data spanning **46 Indian languages** (~4.78M utterances). The curation pipeline is identical to the one described in the paper, with no architecture-specific tuning.
+
+### Hindi Benchmark Comparison (7 Benchmarks)

  Word error rates (%) on Indic benchmark datasets:
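As a note on the metric: WER as reported in these tables is word-level Levenshtein distance divided by reference length. A minimal, self-contained sketch (not the paper's evaluation code) that also reproduces the headline relative-improvement arithmetic:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference
    length, via a standard Levenshtein DP over whitespace-split tokens."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i          # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / len(r)

print(wer("mera naam ram hai", "mera naam shyam hai"))  # 0.25 (1 sub / 4 words)

# Headline numbers below: 11.9 vs. 13.8 average WER is ~14% relative improvement
print(round((13.8 - 11.9) / 13.8, 3))  # 0.138
```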

@@ -197,6 +213,22 @@ Comparison of publicly-available models on the Hindi subset of the benchmark:
 | IndicWhisper | 10.3 | 12 | 15 | 11.4 | 7.6 | – | 26.8 | 13.8 |
 | **HEEP Indic** | **8.53** | **8.97** | **9.96** | **11.04** | **6.59** | **12.05** | **25.98** | **11.9** |

+**HEEP-Indic achieves 11.9% average Hindi WER vs. 13.8% for IndicWhisper (a 14% relative improvement).**
+
+### Key Takeaways
+
+1. **Cross-architecture generalization confirmed.** The same HEEP pipeline improves two distinct backbones, Whisper V3 Turbo (0.8B, encoder-decoder) and Qwen3-ASR (1.7B, audio-language model), without modification.
+
+2. **Controlled multilingual evaluation.** Results span 16 languages across the Indo-Aryan, Dravidian, and Classical families on standardized benchmarks with consistent evaluation protocols.
+
+3. **Model-independent scoring.** Entropy scoring operates on MFCCs, G2P phonemes, and token distributions, not on model internals. The same curated dataset was used for both backbones.
+
+4. **Reproducibility.** Model weights, curation code, and training scripts for both backbones are available in the anonymous repository.
+
+*We hope Reviewers 2ezj, oXjG, and S4Jd also find this supplementary evidence relevant to their earlier questions on generalization and controlled multilingual evaluation.*
+
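The "model-independent scoring" point can be illustrated with its simplest ingredient: Shannon entropy over an empirical symbol distribution, computable from G2P phoneme strings or tokenized transcripts without touching model internals. This is an illustrative sketch, not the paper's scoring code:

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (bits) of the empirical distribution over `items`,
    e.g. phonemes from a G2P front end or transcript tokens. Higher entropy
    means a more varied sample; no model forward pass is needed."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Uniform distribution over 4 distinct symbols: exactly 2 bits
print(shannon_entropy(["a", "b", "c", "d"]))  # 2.0
# A repetitive transcript scores lower
print(shannon_entropy(["a", "a", "a", "b"]))
```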

 ## Model Details

 - **Architecture**: Qwen3ASR — Transformer-based encoder-decoder optimized for multilingual transcription
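Finally, the "error-aware adaptive sample selection across training rounds" mentioned in the pipeline summary can be sketched as a per-round loop that keeps the highest-error samples plus a small exploration slice. Everything here (the function name, the scalar `error_fn`, the `explore_frac` knob) is a hypothetical illustration, not the paper's implementation:

```python
import random

def select_round(samples, error_fn, k, explore_frac=0.1):
    """One curation round: keep the (k - n_explore) samples the current model
    gets most wrong, plus n_explore random ones so easier samples can re-enter.
    `error_fn` stands in for a per-sample error score (e.g. utterance WER)."""
    scored = sorted(samples, key=error_fn, reverse=True)
    n_explore = int(k * explore_frac)
    top = scored[: k - n_explore]
    rest = scored[k - n_explore:]
    return top + random.sample(rest, min(n_explore, len(rest)))

# Toy pool of (sample_id, error) pairs; with explore_frac=0 this is pure top-k.
pool = list(enumerate([0.9, 0.1, 0.5, 0.7, 0.2, 0.8]))
picked = select_round(pool, error_fn=lambda s: s[1], k=3, explore_frac=0.0)
print([i for i, _ in picked])  # [0, 5, 3]: the three highest-error ids
```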