Model Card for mT5-Base SQuAD Transfer (Stage 1)

Model Summary

This model is an intermediate research checkpoint developed as part of the KenSwQuAD project (Hierarchical Curriculum Learning for Swahili QA).

It consists of a google/mt5-base model fine-tuned on the English SQuAD v2 dataset. Its purpose is to serve as a "Structure-Aware" baseline: by learning the mechanics of Question Answering (identifying query-response relationships) in a high-resource language (English), the model learns the QA task itself before being adapted to Swahili in subsequent training stages.

This is Stage 1 of a 3-Stage Pipeline:

  1. Stage 1 (Current): Structural Transfer (English SQuAD) -> Learns "How to Answer"
  2. Stage 2: Morphological Alignment (Extractive KenSwQuAD) -> Learns Swahili Syntax
  3. Stage 3: Generative Refinement (Full KenSwQuAD + Scaffolding) -> Learns Reasoning

Model Details

  • Developed by: Benjamin Kikwai
  • Model Type: Multilingual Sequence-to-Sequence (Encoder-Decoder)
  • Base Model: google/mt5-base
  • Parameters: ~0.6B (stored as F32 safetensors)
  • Language(s): Pre-trained on 101 languages (mC4); fine-tuned on English.
  • Task: Generative Question Answering (Text-to-Text).
  • License: Apache 2.0

Intended Use

This model is primarily intended for transfer-learning experiments. It serves as a better initialization point for multilingual QA tasks than the raw google/mt5-base checkpoint.

How to Use

The model accepts input in the format: question: <question_text> context: <context_text>

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the Stage 1 checkpoint and its tokenizer
model_name = "kikwaib/mt5-base-squad-transfer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."
question = "When were the Normans in Normandy?"

# Build the input in the same format used during fine-tuning
input_text = f"question: {question} context: {context}"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate a short answer span and decode it back to text
outputs = model.generate(**inputs, max_length=32)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(answer)
# Expected Output: "10th and 11th centuries"

Training Data

The model was fine-tuned on SQuAD v2 (Stanford Question Answering Dataset).

Preprocessing Note: To align with the KenSwQuAD dataset (which contains only answerable questions), this model was trained only on the answerable subset of SQuAD v2. Unanswerable questions (where the answer list is empty) were filtered out during preprocessing to prevent the model from learning to generate empty strings or "unanswerable" tokens.
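
For reference, this filtering step can be reproduced with the Hugging Face datasets library along the following lines (a minimal sketch, not the exact preprocessing script used for training):

from datasets import load_dataset

# Load SQuAD v2 and keep only examples with at least one gold answer span;
# unanswerable questions in SQuAD v2 have an empty answers["text"] list.
squad = load_dataset("squad_v2")
answerable = squad.filter(lambda ex: len(ex["answers"]["text"]) > 0)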

Training Procedure

The training was conducted in a Google Colab environment using Hugging Face Transformers.

Hyperparameters

  • Learning Rate: 1e-4
  • Train Batch Size: 8
  • Eval Batch Size: 8
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 16
  • Num Epochs: 2
  • Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler: Linear
  • Seed: 42
  • Max Input Length: 512 tokens
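
For reproducibility, the settings above map roughly onto Hugging Face Seq2SeqTrainingArguments as follows (a sketch reconstructed from the list above, not the original training script; output_dir is a placeholder and any argument not listed above keeps its default):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-squad-transfer",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size of 16
    num_train_epochs=2,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    seed=42,
    predict_with_generate=True,  # assumption: enables ROUGE computation during eval
)
# Note: the 512-token max input length is enforced at tokenization time
# (tokenizer(..., max_length=512, truncation=True)), not via these arguments.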

Training Results

Training Loss   Epoch   Step    Validation Loss   Rouge1    Rouge2    RougeL    RougeLsum
0.2473          1.0     5427    0.4609            81.6473   43.3537   81.665    81.7141
0.3451          2.0     10854   0.3712            83.1882   44.8183   83.2252   83.2484

Environmental Impact

  • Hardware: NVIDIA T4 GPU
  • Compute Time: ~3 hours

Evaluation Results

The model was evaluated on the SQuAD v2 validation set (answerable subset).

Metric            Score   Interpretation
ROUGE-L           83.23   High structural overlap with ground truth.
ROUGE-1           83.19   Excellent keyword retention.
ROUGE-2           44.82   Strong bigram overlap.
ROUGE-Lsum        83.25   Consistent summary-level performance.
Validation Loss   0.37    Still decreasing at epoch 2; no sign of overfitting.

These scores indicate that the model has successfully learned to extract and generate spans of text relevant to questions, supporting its readiness for cross-lingual transfer to Swahili.
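
For context, the ROUGE values above are reported on a 0–100 scale. Scores of this kind can be computed with the evaluate library, for example (a toy illustration, not the card's exact evaluation script):

import evaluate

rouge = evaluate.load("rouge")
# Compare a generated answer against a gold answer (toy example)
scores = rouge.compute(
    predictions=["10th and 11th centuries"],
    references=["in the 10th and 11th centuries"],
)
print(scores["rougeL"])  # returned in [0, 1]; multiply by 100 for the scale above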

Framework Versions

  • Transformers: 4.57.3
  • PyTorch: 2.9.0+cu126
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
