Update README.md

```python
output = transcribe(audio_path, task="asr")  # Options: "dialect", "asr", "trans
print("Generated Text:", output)
```
## 🧪 Evaluation Results

### 🎙️ ASR Performance (WER ↓)

| **Dataset** | **Ar-Octopus** | **Bilingual-Octopus** | **Trans-Octopus** | **Whisper-large-v3** | **SeamlessM4T** |
|:-------------|:---------------:|:---------------------:|:-----------------:|:--------------------:|:----------------:|
| **MGB2 (Arabic)** | 16.5 \| 6.5 | 15.2 \| 6.8 | **13.3 \| 5.9** | 16.2 \| 7.9 | 17.2 \| 8.4 |
| **test-clean (English)** | 82.5 \| 92.4 | **2.6 \| 1.4** | 67.3 \| 79.4 | 2.86 \| 0.98 | 2.68 \| 0.88 |
| **test-other (English)** | 86.9 \| 95.1 | **5.1 \| 3.4** | 71.5 \| 87.8 | 5.00 \| 2.05 | **5.07 \| 1.94** |
| **tedlium (English)** | 101.9 \| 77.4 | **5.1 \| 3.9** | 85.2 \| 63.6 | 11.9 \| 4.4 | 86.5 \| 62.2 |
| **Escwa (Code-Switched)** | 42.5 \| 26.3 | **40.8 \| 27.1** | 41.8 \| 25.1 | 47.3 \| 31.0 | 52.0 \| 35.3 |
| **Mixat-ALL (Code-Switched)** | 22.0 \| 9.0 | **23.4 \| 10.3** | 34.1 \| 10.6 | 29.0 \| 15.0 | 32.8 \| 16.9 |
| **Mixat-CS (Code-Switched)** | 26.4 \| 12.4 | **28.5 \| 14.9** | 27.8 \| 13.3 | 34.8 \| 20.6 | 38.2 \| 21.8 |
| **In-house Long-form** | 25.4 \| 13.0 | 24.9 \| 12.5 | **24.1 \| 12.1** | 26.7 \| 15.2 | 29.3 \| 18.6 |

> A **+86% English improvement** is observed with the addition of language tokens for the bilingual and translation variants.
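The tables report word error rate (WER, lower is better): the word-level edit distance between hypothesis and reference, normalized by reference length. A minimal pure-Python sketch of the metric, for reference only (not the evaluation script behind these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution (or match)
    return dp[-1][-1] / len(ref)

print(wer("the cat sat", "the dog sat"))  # one substitution / three words ≈ 0.33
```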
---

### 🪶 Tiny-Octopus & Fine-Tuning (WER ↓)

| **Dataset** | **TinyOctopus LLaMA-3 1B** | **Fine-tuned LLaMA-3 1B** | **TinyOctopus DeepSeek 1.5B** | **Fine-tuned DeepSeek 1.5B** |
|:-------------|:-------------------------:|:-------------------------:|:-----------------------------:|:-----------------------------:|
| **MGB2 (Arabic)** | 22.6 \| 15.7 | 16.1 \| **9.5** | 23.2 \| 15.8 | **15.5 \| 9.2** |
| **test-clean (English)** | 7.5 \| 5.7 | **3.1 \| 1.3** | 7.7 \| 5.8 | 7.6 \| 5.7 |
| **test-other (English)** | 11.3 \| 8.0 | **6.9 \| 3.5** | 11.5 \| 8.2 | 11.3 \| 8.0 |
| **Escwa (Code-Switched)** | 42.5 \| 26.9 | **40.3 \| 24.4** | 43.6 \| 27.8 | 41.8 \| 26.3 |
| **Mixat-All** | 35.2 \| 19.6 | **34.1 \| 19.3** | 37.1 \| 21.1 | 35.5 \| 19.9 |
| **Mixat-CS** | 40.2 \| 24.2 | **36.2 \| 21.4** | 41.2 \| 25.2 | 39.9 \| 24.2 |
| **In-house Long-files** | 44.3 \| 29.1 | **42.8 \| 26.9** | 47.0 \| 32.7 | 43.7 \| 31.5 |

> **Code-Switch TTS** augmentation yielded a **≈20% WER reduction** across multilingual evaluation sets.
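The ≈20% figure is a *relative* WER reduction, i.e. the fraction of the baseline error that was removed. A quick sketch of the arithmetic (the 40.0 → 32.0 numbers below are illustrative, not taken from the tables):

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction: what fraction of the baseline error was removed."""
    return (baseline - improved) / baseline

# Illustrative numbers only: a WER drop from 40.0 to 32.0 is a 20% relative reduction.
print(f"{relative_reduction(40.0, 32.0):.0%}")  # 20%
```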
---

### 🌍 Translation Performance (BLEU ↑ / BERT-F1 ↑)

| **Model / System** | **CoVoST2 (Ar→En)** | **FLEURS (Ar→En)** |
|:--------------------|:------------------:|:-----------------:|
| Whisper-large-v3 | 28.8 / 0.53 | 15.1 / 0.47 |
| SeamlessM4T | 33.7 / 0.55 | **23.9 / 0.56** |
| **Trans-Octopus** | **38.6 / 0.64** | **23.2 / 0.58** |
| TO-LLaMA-1B | 33.9 / 0.61 | 20.5 / 0.53 |
| TO-DeepSeek-1.5B | 33.6 / 0.61 | 20.8 / 0.53 |

> **Trans-Octopus** achieves the best BLEU and BERT-F1 on **CoVoST2** and competitive results on **FLEURS**, surpassing SeamlessM4T in low-resource conditions.
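BLEU (higher is better) is a modified n-gram precision combined with a brevity penalty. A simplified, self-contained sketch for intuition; the scores above would normally come from a standard toolkit such as sacreBLEU, which adds canonical tokenization, multi-reference support, and smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Single-pair BLEU: clipped n-gram precision with a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        if not hyp_counts:
            return 0.0  # hypothesis shorter than n words
        ref_counts = ngrams(ref, n)
        # Clip: each hypothesis n-gram is credited at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if matches == 0:
            return 0.0
        log_precisions.append(math.log(matches / sum(hyp_counts.values())))
    # Penalize hypotheses shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```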
---

### 🏷️ Dialect Identification

For **dialect identification**, the **Tiny-Octopus** models achieved **87–89% accuracy** across all 17 dialects in **ADI-17**.
The confusion matrices reveal clear separation among the **Gulf**, **Levantine**, **North African**, and **Egyptian** clusters, showing that even compact models can internalize subtle dialectal cues when trained in a multitask setting.
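Accuracy and confusion matrices of this kind can be produced with a simple tally over (true, predicted) label pairs. A minimal sketch with hypothetical dialect labels; the function and data below are illustrative, not part of the released code or the ADI-17 predictions:

```python
from collections import defaultdict

def confusion_and_accuracy(y_true, y_pred):
    """Tally a confusion matrix {true_label: {predicted_label: count}} and accuracy."""
    matrix = defaultdict(lambda: defaultdict(int))
    correct = 0
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
        correct += (t == p)
    return {t: dict(row) for t, row in matrix.items()}, correct / len(y_true)

# Hypothetical labels for illustration -- not model output.
y_true = ["Gulf", "Gulf", "Levantine", "Egyptian"]
y_pred = ["Gulf", "Levantine", "Levantine", "Egyptian"]
matrix, accuracy = confusion_and_accuracy(y_true, y_pred)
print(accuracy)        # 0.75
print(matrix["Gulf"])  # {'Gulf': 1, 'Levantine': 1}
```

Off-diagonal entries (here, one Gulf clip predicted as Levantine) are exactly what the confusion matrices in the analysis visualize.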
## Examples

### Example 1: Arabic Speech Recognition