Spaces: asr-africa/Automatic_Speech_Recognition_for_African_Languages

Beijuka committed on Sep 27

Commit ba4ffeb (verified)

1 Parent(s): a1923a3

Update src/streamlit_app.py

Files changed (1)

src/streamlit_app.py +62 -0

@@ -84,3 +84,65 @@ with tab3:
                 "Models fine-tuned from Wav2Vec2 XLS-R, Whisper, MMS-1B, and W2V2-BERT "
                 "to support high-quality speech recognition in this language."
             )
+# --- Tab 4: Evaluation Scenarios ---
+with tab4:
+    st.header("Evaluation Scenarios")
+    st.write(
+        "To benchmark ASR models for African languages, we design evaluation scenarios "
+        "that mimic real-world challenges such as limited training data, domain shift, "
+        "and variation in speech style."
+    )
+    # Summary Table
+    st.subheader("Scenario Overview")
+    scenarios = pd.DataFrame([
+        {
+            "Scenario": "Data Efficiency Benchmark",
+            "Focus": "Low-resource training (1 hour per language)",
+            "Languages": "Multiple African languages",
+            "Dataset": "[ASR Africa Data Efficiency Benchmark](https://huggingface.co/datasets/asr-africa/ASRAfricaDataEfficiencyBenchmark)"
+        },
+        {
+            "Scenario": "Domain Adaptation Benchmark",
+            "Focus": "Performance shift across domains",
+            "Languages": "Akan (General → Finance), Wolof (General → Agriculture)",
+            "Dataset": "[African ASR Domain Adaptation Benchmark](https://huggingface.co/datasets/asr-africa/African-ASR-Domain-Adaptation-Evaluation)"
+        },
+        {
+            "Scenario": "Speech Type Adaptation",
+            "Focus": "Different speech types (read, conversation, etc.)",
+            "Languages": "Ongoing (various African languages)",
+            "Dataset": "[African ASR Speech Type Adaptation](https://huggingface.co/datasets/asr-africa/African-ASR-Speech-Type-Adaptation)"
+        }
+    ])
+    st.dataframe(scenarios, use_container_width=True)
+    st.subheader("Explore Scenarios")
+    with st.expander("Data Efficiency Benchmark"):
+        st.markdown("""
+        - **Goal:** Evaluate ASR performance in low-resource conditions.
+        - **Design:** 1 hour of transcribed audio per language.
+        - **Includes:** MP3 audio + metadata (speaker age, gender, environment).
+        - **Use case:** Encourage data-efficient ASR systems.
+        🔗 [View dataset](https://huggingface.co/datasets/asr-africa/ASRAfricaDataEfficiencyBenchmark)
+        """)
+    with st.expander("Domain Adaptation Benchmark"):
+        st.markdown("""
+        - **Goal:** Test ASR generalization across domains.
+        - **Languages:**
+          - Akan → General purpose training, Financial domain testing.
+          - Wolof → General purpose training, Agricultural domain testing.
+        - **Challenge:** Many ASR systems degrade when moved to new domains.
+        🔗 [View dataset](https://huggingface.co/datasets/asr-africa/African-ASR-Domain-Adaptation-Evaluation)
+        """)
+    with st.expander("Speech Type Adaptation"):
+        st.markdown("""
+        - **Goal:** Measure ASR performance on different speech styles.
+        - **Types of Speech:** Read speech, conversational, spontaneous speech.
+        - **Work in progress** – expanding to multiple African languages.
+        🔗 [View dataset](https://huggingface.co/datasets/asr-africa/African-ASR-Speech-Type-Adaptation)
+        """)
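
Review note: the "Dataset" cells in the Scenario Overview table are stored as Markdown links (`[label](url)`). As far as I know, `st.dataframe` shows string cells as plain text, so these may appear as raw Markdown rather than clickable links. The sketch below is one possible way to split such a cell into a label and a URL so the URL can be handed to a link-aware column. The helper name `split_markdown_link` and the `MD_LINK` pattern are hypothetical and not part of this commit.

```python
import re
from typing import Tuple

# Matches a cell that is exactly one Markdown link: [label](http(s)://url)
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def split_markdown_link(cell: str) -> Tuple[str, str]:
    """Return (label, url) for a '[label](url)' cell, or (cell, '') if it is not a link."""
    match = MD_LINK.fullmatch(cell.strip())
    if match is None:
        return cell, ""
    return match.group(1), match.group(2)


# One "Dataset" cell copied from the table added in this commit.
cell = (
    "[ASR Africa Data Efficiency Benchmark]"
    "(https://huggingface.co/datasets/asr-africa/ASRAfricaDataEfficiencyBenchmark)"
)
label, url = split_markdown_link(cell)
print(label)
print(url)
```

In the app, the extracted URL could replace the Markdown string in the "Dataset" column, with `column_config={"Dataset": st.column_config.LinkColumn("Dataset")}` passed to `st.dataframe`. This assumes a Streamlit release that provides `st.column_config.LinkColumn`; the commit does not state which version the Space runs.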