# Model Card for mT5-Base SQuAD Transfer (Stage 1)
## Model Summary
This model is an intermediate research checkpoint developed as part of the KenSwQuAD project (Hierarchical Curriculum Learning for Swahili QA).
It is a google/mt5-base model fine-tuned on the English SQuAD v2 dataset, and it serves as a "Structure-Aware" baseline: by learning the mechanics of question answering (identifying query-response relationships) in a high-resource language, the model acquires the QA task itself before being adapted to Swahili in the subsequent training stages.
This is Stage 1 of a 3-Stage Pipeline:
- Stage 1 (Current): Structural Transfer (English SQuAD) -> Learns "How to Answer"
- Stage 2: Morphological Alignment (Extractive KenSwQuAD) -> Learns Swahili Syntax
- Stage 3: Generative Refinement (Full KenSwQuAD + Scaffolding) -> Learns Reasoning
## Model Details
- Developed by: Benjamin Kikwai
- Model Type: Multilingual Sequence-to-Sequence (Encoder-Decoder)
- Base Model: google/mt5-base
- Language(s): Pre-trained on 101 languages (mC4); Fine-tuned on English.
- Task: Generative Question Answering (Text-to-Text).
- License: Apache 2.0
## Intended Use
This model is primarily intended for transfer-learning experiments: it provides a stronger initialization for multilingual QA fine-tuning than the raw google/mt5-base checkpoint.
## How to Use
The model accepts input in the format: `question: <question_text> context: <context_text>`
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "kikwaib/mt5-base-squad-transfer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."
question = "When were the Normans in Normandy?"
input_text = f"question: {question} context: {context}"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=32)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Expected output: "10th and 11th centuries"
```
## Training Data
The model was fine-tuned on SQuAD v2 (Stanford Question Answering Dataset).
Preprocessing Note: To align with the KenSwQuAD dataset (which contains only answerable questions), this model was trained only on the answerable subset of SQuAD v2. Unanswerable questions (where the answer list is empty) were filtered out during preprocessing to prevent the model from learning to generate empty strings or "unanswerable" tokens.
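The filtering rule described above can be sketched in plain Python; this is an illustrative sketch of the described preprocessing, not the exact training script, and the toy records (including their `answer_start` offsets) are invented for demonstration:

```python
# Keep only SQuAD v2 examples whose answer list is non-empty,
# i.e. drop unanswerable questions, as described above.

def is_answerable(example):
    # SQuAD v2 stores answers as {"text": [...], "answer_start": [...]};
    # unanswerable questions have an empty "text" list.
    return len(example["answers"]["text"]) > 0

# Toy records in the SQuAD v2 schema (offsets are arbitrary):
examples = [
    {"question": "When were the Normans in Normandy?",
     "answers": {"text": ["10th and 11th centuries"], "answer_start": [48]}},
    {"question": "Who gave their name to Burgundy?",
     "answers": {"text": [], "answer_start": []}},  # unanswerable
]

answerable = [ex for ex in examples if is_answerable(ex)]
print(len(answerable))  # 1 of the 2 toy examples survives
```

With the `datasets` library, the same predicate can be applied to the full corpus via `dataset.filter(is_answerable)`.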
## Training Procedure
The training was conducted in a Google Colab environment using Hugging Face Transformers.
### Hyperparameters
- Learning Rate: 1e-4
- Train Batch Size: 8
- Eval Batch Size: 8
- Gradient Accumulation Steps: 2
- Effective Batch Size: 16
- Num Epochs: 2
- Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Linear
- Seed: 42
- Max Input Length: 512 tokens
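The list above maps onto a Hugging Face `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch: the output directory name is a placeholder, and `predict_with_generate` is an assumption implied by the ROUGE evaluation, not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of a training configuration matching the hyperparameters above;
# "mt5-squad-stage1" is a placeholder output directory.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-squad-stage1",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size: 8 * 2 = 16
    num_train_epochs=2,
    optim="adamw_torch_fused",       # AdamW (fused), default betas/epsilon
    lr_scheduler_type="linear",
    seed=42,
    predict_with_generate=True,      # assumed: needed for ROUGE during eval
)
```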
### Training Results
| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|---|---|---|
| 0.2473 | 1.0 | 5427 | 0.4609 | 81.6473 | 43.3537 | 81.665 | 81.7141 |
| 0.3451 | 2.0 | 10854 | 0.3712 | 83.1882 | 44.8183 | 83.2252 | 83.2484 |
## Environmental Impact
- Hardware: NVIDIA T4 GPU
- Compute Time: ~3 hours
## Evaluation Results
The model was evaluated on the SQuAD v2 validation set (answerable subset).
| Metric | Score | Interpretation |
|---|---|---|
| ROUGE-L | 83.23 | High longest-common-subsequence overlap with reference answers. |
| ROUGE-1 | 83.19 | High unigram (keyword) overlap. |
| ROUGE-2 | 44.82 | Moderate bigram overlap, expected for short extractive answers. |
| ROUGE-Lsum | 83.25 | Consistent with sentence-level ROUGE-L. |
| Validation Loss | 0.37 | Final validation loss, down from 0.46 after epoch 1. |
These scores indicate that the model has learned to identify and generate answer spans relevant to a question, making it a suitable starting point for cross-lingual transfer to Swahili.
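ROUGE-L scores the longest common subsequence (LCS) of tokens shared by a prediction and a reference. The evaluation above presumably used a library implementation (e.g. `rouge_score`), which also applies stemming and normalization; the minimal pure-Python sketch below shows only the core F1-over-LCS computation:

```python
# Minimal illustration of ROUGE-L: F1 over the longest common
# subsequence of whitespace-separated tokens.

def lcs_len(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def rouge_l_f1(prediction, reference):
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_len(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("10th and 11th centuries",
                       "in the 10th and 11th centuries"), 3))  # 0.8
```

Here the prediction's 4 tokens all appear in order in the 6-token reference, giving precision 1.0, recall 2/3, and F1 = 0.8.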
## Framework Versions
- Transformers: 4.57.3
- PyTorch: 2.9.0+cu126
- Datasets: 4.0.0
- Tokenizers: 0.22.1