# mT5-base-KenSwQuAD-Abstractive

The final model in a 3-stage Hierarchical Curriculum Learning pipeline for Swahili Question Answering.

This model generates abstractive answers to questions about Swahili text, using a Context Scaffolding technique with anchor tokens to improve answer localization.

## Model Description

| Property | Value |
|---|---|
| Base Architecture | mT5-base (Google) |
| Parameters | 582.4M |
| Vocabulary | 250,150 tokens (including 50 custom anchor tokens) |
| Languages | Swahili (sw), English (en) |
| Task | Abstractive Question Answering |
| License | Apache 2.0 |

## Pipeline Position

This model is **Stage 3** of a curriculum learning pipeline:

| Stage | Model | Task | Status |
|---|---|---|---|
| 1 | mt5-base-squad-transfer | English SQuAD Transfer | ✅ |
| 2 | mt5-base-kenswquad-extractive | Swahili Extractive QA | ✅ |
| 3 | mt5-base-kenswquad-abstractive | Swahili Abstractive QA | ✅ (this model) |

## Performance

### Evaluation Results

| Metric | Value | Description |
|---|---|---|
| BLEU | 15.52 | N-gram overlap with reference |
| BERTScore F1 | 0.7721 (77.21%) | Semantic similarity |
| Eval Loss | 2.4184 | Cross-entropy loss |
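
Metrics of this kind can be reproduced approximately with the `evaluate` library. The card does not specify the exact BLEU variant or BERTScore backbone, so the choices below (`sacrebleu`, and BERTScore's multilingual default for `lang="sw"`) are assumptions, not the documented evaluation setup:

```python
import evaluate

# Assumed metric setup: sacreBLEU and BERTScore with its multilingual
# default model for lang="sw"; the exact config is not documented here.
bleu = evaluate.load("sacrebleu")
bertscore = evaluate.load("bertscore")

predictions = ["Ngugi wa Thiong'o"]
references = ["Ngugi wa Thiong'o"]

print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references])["score"])
print(bertscore.compute(predictions=predictions,
                        references=references, lang="sw")["f1"])
```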

### Training Progress

| Epoch | Train Loss | Val Loss | BLEU | BERTScore F1 |
|---|---|---|---|---|
| 1 | 3.033 | 2.596 | 13.37 | 0.757 |
| 2 | 2.614 | 2.485 | 13.36 | 0.769 |
| 3 | 2.363 | 2.452 | 15.68 | 0.768 |
| 4 | 2.094 | 2.455 | 14.72 | 0.767 |
| 5 | 1.917 | 2.418 | 15.52 | 0.772 ⭐ |
| 6 | 2.046 | 2.398 | 15.03 | 0.770 |
| 7 | 1.881 | 2.435 | 14.95 | 0.770 |
| 8 | 1.728 | 2.400 | 16.25 | 0.769 |

**Best model saved at Epoch 5** (highest BERTScore F1).

## Usage

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer (MUST be from the same checkpoint:
# the tokenizer carries the 50 custom anchor tokens)
tokenizer = AutoTokenizer.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")
model = AutoModelForSeq2SeqLM.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")

# Simple question answering
question = "Nani aliandika kitabu hiki?"  # "Who wrote this book?"
context = "Kitabu hiki kiliandikwa na Ngugi wa Thiong'o. Kilitolewa mwaka 1967."

input_text = f"question: {question} context: {context}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)  # "Ngugi wa Thiong'o"
```
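
For several questions over the same context, inputs can be batched in the usual way. A minimal sketch, reusing the `tokenizer`, `model`, and `context` objects from the block above:

```python
# Batch two questions against the same context; padding aligns lengths
questions = ["Nani aliandika kitabu hiki?", "Kitabu kilitolewa mwaka gani?"]
batch = [f"question: {q} context: {context}" for q in questions]
inputs = tokenizer(batch, return_tensors="pt", padding=True,
                   max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```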

### With Context Scaffolding (Recommended)

For optimal performance, use anchor tokens to mark paragraph boundaries:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")
model = AutoModelForSeq2SeqLM.from_pretrained("kikwaib/mt5-base-kenswquad-abstractive")

# Apply scaffolding with <p1>, <p2>, ... tokens
question = "Nani aliandika kitabu hiki?"
context = "<p1> Kitabu hiki kiliandikwa na Ngugi wa Thiong'o. <p2> Kilitolewa mwaka 1967."

# Add a pointer hint if you know which paragraph contains the answer
input_text = f"question: {question} context: {context} <pointer> <p1>"

inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)  # "Ngugi wa Thiong'o"
```

### Scaffolding Helper Function

```python
import re

def apply_scaffolding(context, question, paragraph_pointer=None):
    """
    Transform raw context into scaffolded format with anchor tokens.

    Args:
        context: Raw text (paragraphs separated by newlines)
        question: The question to answer
        paragraph_pointer: Optional paragraph number (1-indexed) containing the answer

    Returns:
        Scaffolded input string
    """
    # Split into paragraphs
    paragraphs = [p.strip() for p in re.split(r'\n+', context) if p.strip()]

    # Add anchor tokens
    scaffolded = ""
    for i, para in enumerate(paragraphs):
        p_num = i + 1
        if p_num < 50:
            scaffolded += f"<p{p_num}> {para} "
        else:
            scaffolded += f"{para} "

    # Add pointer hint
    hint = ""
    if paragraph_pointer and 1 <= int(paragraph_pointer) < 50:
        hint = f" <pointer> <p{paragraph_pointer}>"

    return f"question: {question} context: {scaffolded}{hint}"

# Usage
input_text = apply_scaffolding(
    context="First paragraph.\nSecond paragraph with answer.",
    question="What is in the second paragraph?",
    paragraph_pointer=2
)
```
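
Putting the helper to work end to end. A sketch reusing the `tokenizer` and `model` loaded earlier; `num_beams=4` is an illustrative generation choice, not a setting documented by this card:

```python
# Scaffold a raw two-paragraph Swahili context, then generate an answer
input_text = apply_scaffolding(
    context="Kitabu hiki kiliandikwa na Ngugi wa Thiong'o.\nKilitolewa mwaka 1967.",
    question="Kitabu kilitolewa mwaka gani?",  # "What year was the book published?"
    paragraph_pointer=2,
)
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```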

### Pipeline Usage

```python
from transformers import pipeline

# mT5 is a seq2seq model, so use the "text2text-generation" pipeline
# (the extractive "question-answering" pipeline does not support it)
qa_pipeline = pipeline(
    "text2text-generation",
    model="kikwaib/mt5-base-kenswquad-abstractive",
    tokenizer="kikwaib/mt5-base-kenswquad-abstractive"
)

result = qa_pipeline("question: Nani aliandika kitabu? context: <p1> Ngugi aliandika kitabu hiki.")
print(result[0]['generated_text'])
```

## Context Scaffolding

### What is Context Scaffolding?

Context Scaffolding is a technique that structures input text with special anchor tokens:

1. **Paragraph Anchors** (`<p1>`, `<p2>`, ..., `<p49>`): mark paragraph boundaries
2. **Pointer Token** (`<pointer>`): provides a late contextual hint about which paragraph contains relevant information
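
The card does not include the code that registered these tokens. A sketch of the standard Transformers approach that would produce the reported 250,150-token vocabulary (an assumption about the training setup, not the author's actual script):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

# 49 paragraph anchors plus the pointer token = 50 custom tokens
anchor_tokens = [f"<p{i}>" for i in range(1, 50)] + ["<pointer>"]
tokenizer.add_tokens(anchor_tokens)

# Grow the embedding matrix to match the enlarged vocabulary
model.resize_token_embeddings(len(tokenizer))
```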

### Why Use Scaffolding?

- Helps the model locate information in long contexts
- Provides attention guidance to relevant paragraphs
- Improves performance on multi-paragraph documents

### Example Transformation

**Without Scaffolding:**

```
question: What is the capital? context: Kenya is in East Africa. Nairobi is the capital city.
```

**With Scaffolding:**

```
question: What is the capital? context: <p1> Kenya is in East Africa. <p2> Nairobi is the capital city. <pointer> <p2>
```

## Training Details

### Training Data

| Dataset | Samples | Description |
|---|---|---|
| KenSwQuAD (Abstractive) | 2,428 | QA pairs where the answer requires reasoning |
| Train Split | 2,185 (90%) | Training samples |
| Test Split | 243 (10%) | Evaluation samples |

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Batch Size | 8 |
| Max Epochs | 15 |
| Early Stopping Patience | 3 |
| Max Input Length | 1024 tokens |
| Max Target Length | 128 tokens |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| Optimizer | AdamW |
| Best Model Metric | BERTScore F1 |
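
As a rough illustration, these settings map onto `Seq2SeqTrainingArguments` as follows. This is a sketch, not the card's actual training script; in particular, the `metric_for_best_model` key name is a guess:

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameter table above; argument names
# are standard Transformers ones, but the original script is not shown.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-kenswquad-abstractive",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=15,
    weight_decay=0.01,
    max_grad_norm=1.0,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="bertscore_f1",  # guessed metric key
    predict_with_generate=True,
)

# Early stopping with patience 3 would be supplied as a Trainer callback:
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```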

### Training Infrastructure

| Component | Specification |
|---|---|
| GPU | NVIDIA A100-SXM4-40GB |
| GPU Memory | 42.5 GB |
| Platform | Google Colab |
| Training Time | 29.7 minutes |
| Epochs Completed | 8 (early stopped) |

### Framework Versions

| Package | Version |
|---|---|
| Transformers | 4.57.3 |
| Datasets | 4.0.0 |
| PyTorch | 2.9.0+cu126 |
| CUDA | 12.6 |

## Sample Predictions

| Question | Ground Truth | Prediction | Analysis |
|---|---|---|---|
| Ni vipi uwekezaji utaimarishwa | Kwa mikopo na nafasi ya kufanyia kazi | Kwa kuwapa wawekezaji mikopo na nafasi za ajira | ✅ Semantically equivalent |
| Wahispania walikuja na ugonjwa gani | nduli | ndui | ⚠️ Close variant |
| Kipi kinachangia wanandoa kutalakiana | Kukosa kusajili ndoa zao rasmi | Kwa kukosa kusajili ndoa zao | ✅ Core meaning preserved |

## Limitations

1. **Tokenizer Dependency**: The tokenizer must be loaded from this checkpoint, since it contains the 50 custom anchor tokens (see the sanity check below)
2. **Context Length**: Inputs longer than 1024 tokens are truncated
3. **Scaffolding Preference**: Best performance requires anchor token formatting
4. **Dataset Size**: Trained on a relatively small abstractive dataset (2,428 samples)
5. **Swahili Variants**: May struggle with dialectal differences
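
A quick way to guard against limitation 1 is to check that the tokenizer and model vocabularies line up. A minimal sketch, reusing the `tokenizer` and `model` objects from the usage examples:

```python
# The anchor tokens exist only in this checkpoint's tokenizer; a mismatched
# tokenizer would map them to <unk> and silently degrade answers.
assert len(tokenizer) == model.get_input_embeddings().num_embeddings
assert tokenizer.convert_tokens_to_ids("<pointer>") != tokenizer.unk_token_id
```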

## Ethical Considerations

- **Intended Use**: Research and educational applications for Swahili NLP
- **Not Intended For**: Production systems without additional validation
- **Bias**: May reflect biases present in the KenSwQuAD training data
- **Verification**: Answers should be verified for critical applications

## Citation

If you use this model, please cite:

```bibtex
@misc{mt5-kenswquad-abstractive,
  author = {Kikwai, B.},
  title = {mT5-base-KenSwQuAD-Abstractive: Hierarchical Curriculum Learning for Swahili QA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/kikwaib/mt5-base-kenswquad-abstractive}
}
```

## Related Models

| Model | Task | Link |
|---|---|---|
| mt5-base-squad-transfer | Stage 1: English QA Transfer | 🔗 |
| mt5-base-kenswquad-extractive | Stage 2: Swahili Extractive QA | 🔗 |
| mt5-base-kenswquad-abstractive | Stage 3: Swahili Abstractive QA | This model |

## Acknowledgments

- **KenSwQuAD Dataset**: For providing Swahili QA pairs
- **Google**: For the mT5 base model
- **Hugging Face**: For the Transformers library and model hosting

*Model trained: December 20, 2025 | Last updated: December 20, 2025*
