---
base_model: unsloth/llama-3.2-3b-instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
- sw
datasets:
- saillab/alpaca_swahili_taco
metrics:
- bleu
- accuracy
- cer
- rouge
pipeline_tag: text-generation
---

# SALAMA LLM: Swahili Instruction-Tuned Text Generation Model

**Developer:** AI4NNOV

**Authors:** AI4NNOV

**Version:** v1.0

**License:** Apache 2.0

**Model Type:** Instruction-Tuned Large Language Model

**Base Model:** `Jacaranda/UlizaLlama`

---

## Overview

**SALAMA LLM** is the **language understanding and generation engine** of the **SALAMA Framework**, a modular Speech-to-Speech (STS) AI pipeline built for African languages.
The model is fine-tuned on Swahili instruction datasets to enable natural, culturally relevant responses in text generation, summarization, question answering, and translation.

This model represents a major step in bridging the linguistic digital divide by providing **high-quality Swahili AI text generation** capabilities within an open, scalable framework.

---

## Model Architecture

SALAMA LLM is based on **Jacaranda/UlizaLlama**, fine-tuned using **Parameter-Efficient Fine-Tuning (PEFT)** via **LoRA/QLoRA**.
The architecture supports mixed Swahili-English text inputs while focusing on fluent Swahili text generation for both casual and formal domains.

| Parameter | Value |
|-----------|-------|
| **Base Model** | `Jacaranda/UlizaLlama` |
| **Fine-Tuning** | QLoRA / LoRA (PEFT) |
| **Precision** | 4-bit quantization |
| **Optimizer** | AdamW |
| **Learning Rate** | 2e-5 |
| **Epochs** | 3–5 |
| **Frameworks** | Transformers, TRL, PEFT, Unsloth |
| **Languages** | Swahili (sw), English (en) |
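
As a concrete reference, here is a minimal sketch of the setup the table describes, using `transformers`, `peft`, and `bitsandbytes`; the LoRA rank, alpha, and target modules are illustrative assumptions, not the recorded training configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "Jacaranda/UlizaLlama"

# 4-bit (QLoRA) quantization, per the Precision row above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter; r/alpha/target_modules are assumed values for illustration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Training then proceeds with TRL (e.g., `SFTTrainer`) using AdamW at a 2e-5 learning rate for 3–5 epochs, as listed in the table.
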

---

## Datasets

| Dataset | Description | Purpose |
|---------|-------------|---------|
| `saillab/alpaca_swahili_taco` | Swahili Alpaca-style instruction-response dataset | Instruction tuning |
| `Jacaranda/kiswallama-pretrained` | 321M Swahili tokens, custom tokenizer (20K vocab) | Base Swahili adaptation |
| Custom Swahili QA corpus | Curated Q&A and summarization samples | Conversational fine-tuning |
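
To illustrate how the instruction data can be prepared, the sketch below loads the Alpaca-style dataset with the `datasets` library and renders each record into a single training string; the split name and the Alpaca field names (`instruction`, `input`, `output`) are assumptions about the dataset's schema:

```python
from datasets import load_dataset

# Split name "train" is assumed
dataset = load_dataset("saillab/alpaca_swahili_taco", split="train")

def format_example(example):
    # Alpaca-style records: an instruction, an optional input, and a target output
    if example.get("input"):
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    else:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return {"text": prompt}

dataset = dataset.map(format_example)
print(dataset[0]["text"][:200])
```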

---

## Model Capabilities

- Text generation in **Swahili and English**
- Instruction-following, summarization, and dialogue
- Question answering and translation (EN ↔ SW)
- Sentiment analysis and named-entity recognition
- Contextually and culturally aligned text generation
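
Each of these tasks is driven by plain instruction prompts. The sketch below shows illustrative Swahili prompts for a few of them, using a small hypothetical `generate_text` helper built on the `tokenizer` and `model` objects loaded in the Usage section further down:

```python
# Hypothetical helper wrapping the tokenizer/model from the Usage section below
def generate_text(prompt, max_new_tokens=120):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, temperature=0.7, top_p=0.9
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Translation (EN -> SW): "Translate into Swahili: Education is important."
print(generate_text("Tafsiri kwa Kiswahili: Education is important."))

# Question answering: "What is the capital of Tanzania?"
print(generate_text("Mji mkuu wa Tanzania ni upi?"))

# Summarization: "Summarize the following text: ..."
print(generate_text("Fupisha maandishi yafuatayo: ..."))
```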

---

## Evaluation Metrics

| Metric | Score | Description |
|--------|-------|-------------|
| **BLEU** | 0.49 | Measures fluency and translation accuracy |
| **ROUGE-L** | 0.61 | Summarization recall and overlap |
| **Accuracy (QA)** | 95.5% | Accuracy on Swahili QA tasks |
| **CER** | 0.28 | Character Error Rate |
| **F1 (avg)** | 0.90+ | Weighted average across tasks |
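
For reference, scores like these can be reproduced with the Hugging Face `evaluate` library; the predictions and references below are placeholders, not the actual evaluation data:

```python
import evaluate

# Placeholder data; the reported scores come from held-out Swahili test sets
predictions = ["Elimu ni msingi wa maendeleo."]
references = ["Elimu ni msingi wa maendeleo ya jamii."]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
cer = evaluate.load("cer")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
print(cer.compute(predictions=predictions, references=references))
```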

---

## Usage (Python Example)

Below is a quick example that loads **SALAMA LLM** and generates Swahili text:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "EYEDOL/salama-llm"  # Change to your Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Swahili prompt: "Write a short sentence about the importance of education."
prompt = "Andika sentensi fupi kuhusu umuhimu wa elimu."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.05,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
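
Since the model was fine-tuned in 4-bit (QLoRA), it can also be loaded quantized for smaller GPUs; a minimal sketch, assuming `bitsandbytes` is installed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization config (assumed settings matching the QLoRA fine-tune)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EYEDOL/salama-llm",
    quantization_config=bnb_config,
    device_map="auto",
)
```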

**Example Output:**

> "Elimu ni msingi wa maendeleo, humwezesha mtu kuelewa dunia na kuboresha maisha yake na jamii kwa ujumla."
>
> ("Education is the foundation of development; it enables a person to understand the world and improve their life and society as a whole.")

---

## Key Features

- Optimized for African low-resource NLP contexts
- Instruction-following in Swahili and English
- Lightweight and efficient (QLoRA fine-tuned; runs on a single 24 GB GPU)
- Culturally aligned text generation
- Open-source and extendable to other African languages

---

## Limitations

- May underperform with heavy code-switching (Swahili-English mix)
- Not yet optimized for rare dialects or poetic forms
- Limited exposure to specialized (medical/legal) corpora
- Relies on accurate STT transcription in end-to-end speech-to-speech use

---

## Related Models

| Model | Description |
|-------|-------------|
| [`EYEDOL/salama-stt`](https://huggingface.co/EYEDOL/salama-stt) | Swahili Speech-to-Text model (Whisper-small fine-tuned) |
| [`EYEDOL/salama-tts`](https://huggingface.co/EYEDOL/salama-tts) | Swahili Text-to-Speech model (VITS architecture) |
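
As a rough illustration of how the three SALAMA models compose into the speech-to-speech pipeline described in the Overview, the sketch below chains them with `transformers` pipelines; whether these exact checkpoints load directly into the ASR and TTS pipeline classes is an assumption, and `question.wav` is a placeholder input file:

```python
from transformers import pipeline

# 1) Speech-to-text: transcribe Swahili audio (Whisper-small fine-tune)
stt = pipeline("automatic-speech-recognition", model="EYEDOL/salama-stt")
text_in = stt("question.wav")["text"]

# 2) LLM: generate a Swahili response
llm = pipeline("text-generation", model="EYEDOL/salama-llm", device_map="auto")
text_out = llm(text_in, max_new_tokens=120)[0]["generated_text"]

# 3) Text-to-speech: synthesize the response (VITS)
tts = pipeline("text-to-speech", model="EYEDOL/salama-tts")
speech = tts(text_out)  # dict with an "audio" array and "sampling_rate"
```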

---

## Citation

If you use **SALAMA LLM**, please cite:

```bibtex
@misc{salama_llm_2025,
  title={SALAMA LLM: Swahili Instruction-Tuned Text Generation Model},
  author={AI4NNOV},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/EYEDOL/salama-llm}}
}
```

---

**"Elimu ni msingi wa maendeleo – Knowledge is the foundation of progress."**