Update README.md
README.md
CHANGED
|
@@ -1,5 +1,4 @@
|
|
| 1 |
---
|
| 2 |
-
license: mit
|
| 3 |
datasets:
|
| 4 |
- rsalshalan/QASR
|
| 5 |
- DynamicSuperb/DialectIdentification_ADI17
|
|
@@ -20,8 +19,8 @@ pipeline_tag: audio-text-to-text
|
|
| 20 |
**TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
|
| 21 |
|
| 22 |
- **Bilingual Automatic Speech Recognition (ASR)** 🗣️
|
| 23 |
-
- **Speech Translation** 🌍
|
| 24 |
-
- **Dialect Identification**
|
| 25 |
|
| 26 |
TinyOctopus maintains the architectural principles of the following structure:
|
| 27 |
|
|
@@ -56,16 +55,25 @@ print("Generated Text:", output)
|
|
| 56 |
|
| 57 |
## Evaluation Results
|
| 58 |
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
| **
|
| 63 |
-
| **
|
| 64 |
-
| **Dialect Identification (Accuracy)** | 70.59 |
|
| 65 |
|
| 66 |
|
| 67 |
-
|
| 68 |
|
| 69 |
-
## License
|
| 70 |
|
| 71 |
-
|
| 1 |
---
|
| 2 |
datasets:
|
| 3 |
- rsalshalan/QASR
|
| 4 |
- DynamicSuperb/DialectIdentification_ADI17
|
| 19 |
**TinyOctopus** is a **Bilingual Audio Language Model (Audio-LLM)** designed to process and generate text from audio inputs. The model leverages **Distil-Whisper (distil-large-v3)** for audio encoding, a **cross-attention projection layer** for alignment, and **DeepSeek 1.5B** for text generation. TinyOctopus is optimized for tasks such as:
|
| 20 |
|
| 21 |
- **Bilingual Automatic Speech Recognition (ASR)** 🗣️
|
| 22 |
+
- **Arabic to English Speech Translation** 🌍
|
| 23 |
+
- **Spoken Arabic Dialect Identification**
|
| 24 |
|
| 25 |
TinyOctopus maintains the architectural principles of the following structure:
|
| 26 |
|
| 55 |
|
| 56 |
## Evaluation Results
|
| 57 |
|
## ASR Performance (WER & Error Breakdown)

| **Tasks**                             | **WER (%)** | **Substitution (%)** | **Deletion (%)** | **Insertion (%)** |
|---------------------------------------|-------------|----------------------|------------------|-------------------|
| **ASR_QASR (Arabic)**                 | 16.00       | 9.5                  | 2.7              | 3.8               |
| **ASR_Librispeech&tedlium (English)** | 4.50        | 3.0                  | 0.8              | 0.7               |
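
The error breakdown in the ASR table adds up to the reported WER (for example, 9.5 + 2.7 + 3.8 = 16.0 for Arabic), since WER is the number of substituted, deleted, and inserted words divided by the number of reference words. The sketch below shows one way such a breakdown can be computed with a word-level edit-distance table. It is an illustration only, not the evaluation script used for these results: the function name `wer_breakdown` is hypothetical, and it assumes plain whitespace tokenization with no text normalization.

```python
from typing import Dict, List, Tuple


def wer_breakdown(reference: str, hypothesis: str) -> Dict[str, float]:
    """Return WER and its substitution/deletion/insertion shares, in percent.

    Hypothetical helper: assumes whitespace tokenization, no normalization.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    n, m = len(ref), len(hyp)
    # table[i][j] = (edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    table: List[List[Tuple[int, int, int, int]]] = [
        [(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)
    ]
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # words match, no cost
                continue
            e, s, d, k = table[i - 1][j - 1]
            substitution = (e + 1, s + 1, d, k)
            e, s, d, k = table[i - 1][j]
            deletion = (e + 1, s, d + 1, k)
            e, s, d, k = table[i][j - 1]
            insertion = (e + 1, s, d, k + 1)
            # Tuples compare on the edit count first, so the total stays minimal.
            table[i][j] = min(substitution, deletion, insertion)

    edits, subs, dels, ins = table[n][m]
    return {
        "wer": 100.0 * edits / n,
        "substitution": 100.0 * subs / n,
        "deletion": 100.0 * dels / n,
        "insertion": 100.0 * ins / n,
    }


if __name__ == "__main__":
    # One substituted word ("b" -> "x") and one deleted word ("d").
    print(wer_breakdown("a b c d", "a x c"))
```

When several alignments share the same minimal edit count, the split between substitutions, deletions, and insertions can differ between tools, so small discrepancies in the breakdown columns are expected even when the total WER agrees.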
|
| 63 |
|
| 64 |
+
---
|
| 65 |
+
|
## Translation Performance (BLEU Scores)

| **Tasks**       | **BLEU (GPT-4o)** | **BLEU (Google)** |
|-----------------|-------------------|-------------------|
| **Translation** | 55.05             | 43.23             |
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
|
## Dialect Identification Accuracy

| **Tasks**                  | **Accuracy (%)** |
|----------------------------|------------------|
| **Dialect Identification** | 70.59            |
|
| 77 |
|
| 78 |
|
| 79 |
+

|