# mT5-base-KenSwQuAD-Abstractive

The final model in a 3-stage Hierarchical Curriculum Learning pipeline for Swahili Question Answering. This model generates abstractive answers to questions about Swahili text, using a Context Scaffolding technique with anchor tokens to improve answer localization.
## Model Description
| Property | Value |
|---|---|
| Base Architecture | mT5-base (Google) |
| Parameters | 582.4M |
| Vocabulary | 250,150 tokens (50 custom anchor tokens) |
| Languages | Swahili (sw), English (en) |
| Task | Abstractive Question Answering |
| License | Apache 2.0 |
## Pipeline Position
This model is Stage 3 of a curriculum learning pipeline:
| Stage | Model | Task | Status |
|---|---|---|---|
| 1 | mt5-base-squad-transfer | English SQuAD Transfer | ✅ |
| 2 | mt5-base-kenswquad-extractive | Swahili Extractive QA | ✅ |
| 3 | mt5-base-kenswquad-abstractive | Swahili Abstractive QA | ✅ (this model) |
## Performance

### Evaluation Results
| Metric | Value | Description |
|---|---|---|
| BLEU | 15.52 | N-gram overlap with reference |
| BERTScore F1 | 0.7721 (77.21%) | Semantic similarity |
| Eval Loss | 2.4184 | Cross-entropy loss |
### Training Progress
| Epoch | Train Loss | Val Loss | BLEU | BERTScore F1 |
|---|---|---|---|---|
| 1 | 3.033 | 2.596 | 13.37 | 0.757 |
| 2 | 2.614 | 2.485 | 13.36 | 0.769 |
| 3 | 2.363 | 2.452 | 15.68 | 0.768 |
| 4 | 2.094 | 2.455 | 14.72 | 0.767 |
| 5 | 1.917 | 2.418 | 15.52 | 0.772 ⭐ |
| 6 | 2.046 | 2.398 | 15.03 | 0.770 |
| 7 | 1.881 | 2.435 | 14.95 | 0.770 |
| 8 | 1.728 | 2.400 | 16.25 | 0.769 |
**Best model saved at Epoch 5 (highest BERTScore F1)**
## Usage

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer (MUST be from the same checkpoint!)
tokenizer = AutoTokenizer.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")
model = AutoModelForSeq2SeqLM.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")

# Simple question answering
question = "Nani aliandika kitabu hiki?"  # "Who wrote this book?"
context = "Kitabu hiki kiliandikwa na Ngugi wa Thiong'o. Kilitolewa mwaka 1967."
input_text = f"question: {question} context: {context}"

inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)  # "Ngugi wa Thiong'o"
```
### With Context Scaffolding (Recommended)

For optimal performance, use anchor tokens to mark paragraph boundaries:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")
model = AutoModelForSeq2SeqLM.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")

# Apply scaffolding with <p1>, <p2>, ... tokens
question = "Nani aliandika kitabu hiki?"
context = "<p1> Kitabu hiki kiliandikwa na Ngugi wa Thiong'o. <p2> Kilitolewa mwaka 1967."

# Add a pointer hint if you know which paragraph contains the answer
input_text = f"question: {question} context: {context} <pointer> <p1>"

inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)  # "Ngugi wa Thiong'o"
```
### Scaffolding Helper Function
```python
import re

def apply_scaffolding(context, question, paragraph_pointer=None):
    """
    Transform raw context into scaffolded format with anchor tokens.

    Args:
        context: Raw text (paragraphs separated by newlines)
        question: The question to answer
        paragraph_pointer: Optional paragraph number (1-indexed) containing the answer

    Returns:
        Scaffolded input string
    """
    # Split into paragraphs
    paragraphs = [p.strip() for p in re.split(r'\n+', context) if p.strip()]

    # Add anchor tokens (only <p1>..<p49> exist in the vocabulary)
    scaffolded = ""
    for i, para in enumerate(paragraphs):
        p_num = i + 1
        if p_num < 50:
            scaffolded += f"<p{p_num}> {para} "
        else:
            scaffolded += f"{para} "

    # Add pointer hint
    hint = ""
    if paragraph_pointer and 1 <= int(paragraph_pointer) < 50:
        hint = f" <pointer> <p{paragraph_pointer}>"

    return f"question: {question} context: {scaffolded}{hint}"

# Usage
input_text = apply_scaffolding(
    context="First paragraph.\nSecond paragraph with answer.",
    question="What is in the second paragraph?",
    paragraph_pointer=2,
)
```
### Pipeline Usage
```python
from transformers import pipeline

# mT5 is a sequence-to-sequence model, so use the text2text-generation
# pipeline (the extractive "question-answering" pipeline expects an
# encoder-only model and will not work with this checkpoint)
qa_pipeline = pipeline(
    "text2text-generation",
    model="kikwaib/mt5-base-kenswquad-abstractive",
    tokenizer="kikwaib/mt5-base-kenswquad-abstractive",
)

result = qa_pipeline("question: Nani aliandika kitabu? context: <p1> Ngugi aliandika kitabu hiki.")
print(result[0]["generated_text"])
```
## Context Scaffolding

### What is Context Scaffolding?

Context Scaffolding is a technique that structures input text with special anchor tokens:

- **Paragraph Anchors** (`<p1>`, `<p2>`, ..., `<p49>`): mark paragraph boundaries
- **Pointer Token** (`<pointer>`): provides a late contextual hint about which paragraph contains relevant information
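Because the anchors are ordinary substrings of the input, a malformed scaffold (an out-of-range anchor, or a pointer to a paragraph that was never marked) fails silently at inference time. A minimal validator is sketched below; the function name and problem messages are illustrative, not part of the model's tooling, and only the anchor conventions described above are assumed:

```python
import re

def validate_scaffold(input_text):
    """Return a list of problems found in a scaffolded input string.

    Checks that anchors stay in the supported range (<p1>..<p49>) and that
    a <pointer> hint, if present, targets an anchor that actually appears
    in the context. An empty list means the scaffold looks well-formed."""
    problems = []
    # Separate the trailing pointer hint from the context body.
    pointer = re.search(r"<pointer>\s*<p(\d+)>", input_text)
    body = input_text[:pointer.start()] if pointer else input_text
    anchors = [int(n) for n in re.findall(r"<p(\d+)>", body)]
    for n in anchors:
        if not 1 <= n <= 49:
            problems.append(f"unsupported anchor <p{n}>")
    if anchors != sorted(anchors):
        problems.append("anchors out of order")
    if pointer and int(pointer.group(1)) not in anchors:
        problems.append(f"pointer targets missing paragraph <p{pointer.group(1)}>")
    return problems

# A well-formed scaffold and one whose pointer targets a missing paragraph
ok = "question: Q? context: <p1> A. <p2> B. <pointer> <p2>"
bad = "question: Q? context: <p1> A. <pointer> <p3>"
print(validate_scaffold(ok))   # []
print(validate_scaffold(bad))  # ['pointer targets missing paragraph <p3>']
```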
### Why Use Scaffolding?
- Helps the model locate information in long contexts
- Provides attention guidance to relevant paragraphs
- Improves performance on multi-paragraph documents
### Example Transformation

**Without Scaffolding:**

```text
question: What is the capital? context: Kenya is in East Africa. Nairobi is the capital city.
```

**With Scaffolding:**

```text
question: What is the capital? context: <p1> Kenya is in East Africa. <p2> Nairobi is the capital city. <pointer> <p2>
```
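The anchors are for the model, not the reader, so when logging or displaying scaffolded inputs it can be handy to recover the plain text. A small inverse transform, using only the token conventions above (the helper name is illustrative and not part of the model card's tooling):

```python
import re

def strip_scaffolding(input_text):
    """Remove <pN> anchors and any <pointer> hint, returning plain text."""
    # Drop the pointer hint (and its target anchor) first.
    text = re.sub(r"\s*<pointer>\s*<p\d+>\s*", " ", input_text)
    # Then drop the paragraph anchors themselves.
    text = re.sub(r"\s*<p\d+>\s*", " ", text)
    # Collapse any whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

scaffolded = ("question: What is the capital? context: <p1> Kenya is in "
              "East Africa. <p2> Nairobi is the capital city. <pointer> <p2>")
print(strip_scaffolding(scaffolded))
# question: What is the capital? context: Kenya is in East Africa. Nairobi is the capital city.
```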
## Training Details

### Training Data
| Dataset | Samples | Description |
|---|---|---|
| KenSwQuAD (Abstractive) | 2,428 | QA pairs where answer requires reasoning |
| Train Split | 2,185 (90%) | Training samples |
| Test Split | 243 (10%) | Evaluation samples |
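The split sizes follow from a plain 90/10 partition of the 2,428 abstractive pairs. The flooring shown below is an assumption; the shuffling and seed behind the published split are not documented here:

```python
# 90/10 split of the abstractive subset (rounding direction assumed)
total = 2428
train = int(total * 0.9)  # floor of 2185.2
test = total - train
print(train, test)  # 2185 243
```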
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size | 8 |
| Max Epochs | 15 |
| Early Stopping Patience | 3 |
| Max Input Length | 1024 tokens |
| Max Target Length | 128 tokens |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| Optimizer | AdamW |
| Best Model Metric | BERTScore F1 |
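Assuming a standard `Seq2SeqTrainer` setup, the hyperparameters above map onto `transformers` configuration roughly as follows. This is a hypothetical reconstruction: the output directory and the `metric_for_best_model` key are illustrative, since the actual training script is not published here.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-kenswquad-abstractive",  # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=15,            # early stopping ended training at epoch 8
    weight_decay=0.01,
    max_grad_norm=1.0,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bertscore_f1",  # assumed metric key
    predict_with_generate=True,
    generation_max_length=128,
)
# Patience of 3 epochs on the best-model metric
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```

The AdamW optimizer listed in the table is the `transformers` default, so it needs no explicit argument.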
### Training Infrastructure
| Component | Specification |
|---|---|
| GPU | NVIDIA A100-SXM4-40GB |
| GPU Memory | 42.5 GB |
| Platform | Google Colab |
| Training Time | 29.7 minutes |
| Epochs Completed | 8 (early stopped) |
### Framework Versions
| Package | Version |
|---|---|
| Transformers | 4.57.3 |
| Datasets | 4.0.0 |
| PyTorch | 2.9.0+cu126 |
| CUDA | 12.6 |
## Sample Predictions
| Question | Ground Truth | Prediction | Analysis |
|---|---|---|---|
| Ni vipi uwekezaji utaimarishwa | Kwa mikopo na nafasi ya kufanyia kazi | Kwa kuwapa wawekezaji mikopo na nafasi za ajira | ✅ Semantically equivalent |
| Wahispania walikuja na ugonjwa gani | nduli | ndui | ⚠️ Close variant |
| Kipi kinachangia wanandoa kutalakiana | Kukosa kusajili ndoa zao rasmi | Kwa kukosa kusajili ndoa zao | ✅ Core meaning preserved |
## Limitations
- Tokenizer Dependency: Must load tokenizer from this checkpoint (has 50 custom tokens)
- Context Length: Inputs longer than 1024 tokens are truncated
- Scaffolding Preference: Best performance requires anchor token formatting
- Dataset Size: Trained on relatively small abstractive dataset (2,428 samples)
- Swahili Variants: May struggle with dialectal differences
## Ethical Considerations
- Intended Use: Research and educational applications for Swahili NLP
- Not Intended: Production systems without additional validation
- Bias: May reflect biases present in KenSwQuAD training data
- Verification: Answers should be verified for critical applications
## Citation

If you use this model, please cite:
```bibtex
@misc{mt5-kenswquad-abstractive,
  author    = {Kikwai, B.},
  title     = {mT5-base-KenSwQuAD-Abstractive: Hierarchical Curriculum Learning for Swahili QA},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/kikwaib/mt5-base-kenswquad-abstractive}
}
```
## Related Models
| Model | Task | Link |
|---|---|---|
| mt5-base-squad-transfer | Stage 1: English QA Transfer | [kikwaib/mt5-base-squad-transfer](https://huggingface.co/kikwaib/mt5-base-squad-transfer) |
| mt5-base-kenswquad-extractive | Stage 2: Swahili Extractive QA | [kikwaib/mt5-base-kenswquad-extractive](https://huggingface.co/kikwaib/mt5-base-kenswquad-extractive) |
| mt5-base-kenswquad-abstractive | Stage 3: Swahili Abstractive QA | This model |
## Acknowledgments
- KenSwQuAD Dataset: For providing Swahili QA pairs
- Google: For the mT5 base model
- Hugging Face: For the Transformers library and model hosting
*Model trained: December 20, 2025 | Last updated: December 20, 2025*