EYEDOL/SALAMA_SM_ASR

@@ -23,7 +23,6 @@ pipeline_tag: automatic-speech-recognition
 **Base Model:** `openai/whisper-small` (fine-tuned for Swahili)
 ---
 ## 🌍 Overview
 **SALAMA-STT** (Speech-to-Text) is the **first module** of the **SALAMA Framework** — a modular end-to-end **speech-to-speech AI system** built for African languages.
@@ -50,7 +49,6 @@ The model was fine-tuned on the **Mozilla Common Voice 17.0 Swahili** dataset, e
 | Languages | Swahili (`sw`), English (`en`) |
 ---
 ## 📚 Dataset
 | Dataset | Description | Purpose |
@@ -60,7 +58,6 @@ The model was fine-tuned on the **Mozilla Common Voice 17.0 Swahili** dataset, e
 | Common Voice validation split | 2.3 hours | Evaluation |
 ---
 ## 🧠 Model Capabilities
 - Speech-to-text transcription in **Swahili**
@@ -70,7 +67,6 @@ The model was fine-tuned on the **Mozilla Common Voice 17.0 Swahili** dataset, e
 - Provides timestamped segment transcriptions
 ---
 ## 📊 Evaluation Metrics
 | Metric | Baseline (Whisper-small) | Fine-tuned (SALAMA-STT) | Improvement |
@@ -82,7 +78,6 @@ The model was fine-tuned on the **Mozilla Common Voice 17.0 Swahili** dataset, e
 > Evaluation was conducted on the held-out Swahili validation split from Common Voice (about 2.3 hours, per the Dataset table).
 ---
 ## ⚙️ Usage (Python Example)
 Below is a quick example for Swahili speech transcription using this model:
@@ -113,7 +108,6 @@ print(result["text"])
 > *“Karibu kwenye mfumo wa SALAMA unaosaidia kutambua na kuelewa sauti ya Kiswahili kwa usahihi mkubwa.”*
 ---
 ## 🔍 Model Performance Summary
 | Dataset | Metric | Score |
@@ -123,7 +117,6 @@ print(result["text"])
 | Local Swahili Test Set | Accuracy | **95.4%** |
 ---
 ## ⚡ Key Features
 - 🎙️ **Accurate Swahili ASR** trained on diverse voices
@@ -133,7 +126,6 @@ print(result["text"])
 - 🚀 **Fast inference optimized with FP16 precision**
 ---
 ## 🚫 Limitations
 - May misinterpret **code-mixed (Swahili-English)** speech
@@ -142,11 +134,9 @@ print(result["text"])
 - Performance may decline for speech from **non-native Swahili speakers**
 ---
 ## 🔗 Related Models
 | Model | Description |
 |--------|-------------|
 | [`EYEDOL/salama-llm`](https://huggingface.co/EYEDOL/salama-llm) | Swahili instruction-tuned LLM for reasoning and dialogue |
 | [`EYEDOL/salama-tts`](https://huggingface.co/EYEDOL/salama-tts) | Swahili text-to-speech (VITS) model for natural speech synthesis |
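
The usage snippet in the Usage section is elided in this diff (only its final `print(result["text"])` line is visible), so the following is a minimal sketch of how a Whisper-small fine-tune like this one is typically called through the `transformers` ASR pipeline, not the card's own example. The Hub id `EYEDOL/SALAMA_SM_ASR` is an assumption taken from the repository name, and the helper names `build_pipeline_kwargs` and `transcribe` are hypothetical.

```python
"""Sketch: Swahili transcription with the Hugging Face `transformers` ASR pipeline."""

# Assumption: the checkpoint is published under this Hub id (taken from the
# repository name shown for this model card); adjust it if the id differs.
MODEL_ID = "EYEDOL/SALAMA_SM_ASR"


def build_pipeline_kwargs(model_id=MODEL_ID, use_gpu=False):
    """Collect pipeline arguments without importing heavy dependencies."""
    return {
        "task": "automatic-speech-recognition",  # matches the card's pipeline_tag
        "model": model_id,
        "chunk_length_s": 30,  # Whisper processes audio in 30-second windows
        "device": 0 if use_gpu else -1,  # -1 selects the CPU
    }


def transcribe(audio_path, use_gpu=False):
    """Transcribe one audio file; returns a dict with "text" and "chunks"."""
    # Imported lazily because torch and transformers are large dependencies.
    import torch
    from transformers import pipeline

    kwargs = build_pipeline_kwargs(use_gpu=use_gpu)
    # FP16 is used only on GPU here; FP32 is kept on CPU.
    dtype = torch.float16 if use_gpu else torch.float32
    asr = pipeline(torch_dtype=dtype, **kwargs)
    return asr(
        audio_path,
        return_timestamps=True,  # adds a "chunks" list with (start, end) times
        generate_kwargs={"language": "swahili", "task": "transcribe"},
    )
```

Calling `transcribe("sample.wav")` (a hypothetical file name) should return a dictionary whose `"text"` entry holds the full transcript and whose `"chunks"` entry lists timestamped segments, which corresponds to the timestamped segment transcriptions listed under Model Capabilities.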

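The rows of the Evaluation Metrics table are not visible in this diff. Assuming they report word error rate (WER), the usual metric for ASR systems, the sketch below shows how WER can be computed from a reference transcript and a model output. The function name `word_error_rate` is hypothetical, and the lowercase, whitespace-split normalization is a simplification that may differ from whatever the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplified normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis `"karibu kwa mfumo salama"` against the reference `"karibu kwenye mfumo wa salama"` counts one substitution and one deletion over five reference words, giving a WER of 0.4.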