Beijuka committed on
Commit
6c47cc7
·
verified ·
1 Parent(s): f664c2f

Update src/streamlit_app.py

Files changed (1):
  1. src/streamlit_app.py +33 -7
src/streamlit_app.py CHANGED
@@ -220,23 +220,49 @@ with tab5:
 
 with tab6:
     st.header("Results: WER vs Dataset Size")
-
+
     st.write("""
-    Overall, the WER decreases as the number of training hours increases across all models and languages. This trend underscores the importance of dataset size in improving ASR performance. However, the rate of improvement varies significantly between models, with some benefiting more from additional data than others.
-    """)
+    Overall, the Word Error Rate (WER) decreases as the number of training hours increases across all models and languages.
+    This highlights the importance of dataset size in improving ASR performance, although the rate of improvement varies
+    significantly between models.
+    """)
 
     # XLS-R
     st.subheader("XLS-R")
-    st.image("Images/xlsrlog.png", caption="Log WER vs Training Hours for XLS-R")
+    st.write("""
+    XLS-R shows a steep decline in log WER as the dataset size increases, especially in low-to-moderate data regimes.
+    The improvement slows as the dataset becomes larger, suggesting diminishing returns in high-data settings.
+    """)
+    st.image("src/Images/xlsrlog.png", caption="Log WER vs Training Hours for XLS-R")
 
     # W2v-BERT
     st.subheader("W2v-BERT")
-    st.image("Images/bertlog.png", caption="Log WER vs Training Hours for W2v-BERT")
+    st.write("""
+    W2v-BERT exhibits a more gradual decline in log WER. It performs well in low-data settings, showing stable reduction
+    in WER as dataset size increases. This makes it suitable for low-resource languages.
+    """)
+    st.image("src/Images/bertlog.png", caption="Log WER vs Training Hours for W2v-BERT")
 
     # Whisper
     st.subheader("Whisper")
-    st.image("Images/whisperlog.png", caption="Log WER vs Training Hours for Whisper")
+    st.write("""
+    Whisper shows a consistent but moderate decline in log WER. Improvements are more linear compared to XLS-R, benefiting
+    steadily from additional data, but it does not reach XLS-R’s high-data performance.
+    """)
+    st.image("src/Images/whisperlog.png", caption="Log WER vs Training Hours for Whisper")
 
     # MMS
     st.subheader("MMS")
-    st.image("Images/mmslog.png", caption="Log WER vs Training Hours for MMS")
+    st.write("""
+    MMS shows significant improvement between 1–5 hours of training across multiple languages. However, the rate of
+    improvement declines as more data is added. MMS performs strongly in both low- and high-data settings.
+    """)
+    st.image("src/Images/mmslog.png", caption="Log WER vs Training Hours for MMS")
+
+    # Overall Insight
+    st.subheader("Overall Insights")
+    st.write("""
+    - All models exhibit the largest WER improvements when training data is scarce.
+    - Beyond a certain dataset size, adding more data results in marginal gains.
+    - Dataset size remains a critical factor, but its impact plateaus once the model is trained on sufficient data.
+    """)