# Model Card for mT5-Base SQuAD Transfer (Stage 1)
## Model Summary
This model is an intermediate research checkpoint developed as part of the KenSwQuAD project (Hierarchical Curriculum Learning for Swahili QA).
It is a google/mt5-base model fine-tuned on the English SQuAD v2 dataset, and it serves as a "Structure-Aware" baseline: by learning the mechanics of question answering (identifying query-response relationships) in a high-resource language, the model acquires the QA task itself before being adapted to Swahili in the subsequent training stages.
This is Stage 1 of a 3-Stage Pipeline:
- Stage 1 (Current): Structural Transfer (English SQuAD) -> Learns "How to Answer"
- Stage 2: Morphological Alignment (Extractive KenSwQuAD) -> Learns Swahili Syntax
- Stage 3: Generative Refinement (Full KenSwQuAD + Scaffolding) -> Learns Reasoning
## Model Details
- Developed by: Benjamin Kikwai
- Model Type: Multilingual Sequence-to-Sequence (Encoder-Decoder)
- Base Model: google/mt5-base
- Language(s): Pre-trained on 101 languages (mC4); Fine-tuned on English.
- Task: Generative Question Answering (Text-to-Text).
- License: Apache 2.0
## Intended Use
This model is primarily intended for transfer-learning experiments: it provides a stronger initialization for multilingual QA fine-tuning than the raw google/mt5-base checkpoint.
## How to Use
The model accepts input in the format: `question: <question_text> context: <context_text>`
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "kikwaib/mt5-base-squad-transfer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."
question = "When were the Normans in Normandy?"
input_text = f"question: {question} context: {context}"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=32)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Expected output: "10th and 11th centuries"
```
## Training Data
The model was fine-tuned on SQuAD v2 (Stanford Question Answering Dataset).
Preprocessing Note: To align with the KenSwQuAD dataset (which contains only answerable questions), this model was trained only on the answerable subset of SQuAD v2. Unanswerable questions (where the answer list is empty) were filtered out during preprocessing to prevent the model from learning to generate empty strings or "unanswerable" tokens.
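The filtering rule described above can be sketched in plain Python; this is an illustrative sketch of the described preprocessing, not the exact training script, and the toy records (including their `answer_start` offsets) are invented for demonstration:

```python
# Keep only SQuAD v2 examples whose answer list is non-empty,
# i.e. drop unanswerable questions, as described above.

def is_answerable(example):
    # SQuAD v2 stores answers as {"text": [...], "answer_start": [...]};
    # unanswerable questions have an empty "text" list.
    return len(example["answers"]["text"]) > 0

# Toy records in the SQuAD v2 schema (offsets are arbitrary):
examples = [
    {"question": "When were the Normans in Normandy?",
     "answers": {"text": ["10th and 11th centuries"], "answer_start": [48]}},
    {"question": "Who gave their name to Burgundy?",
     "answers": {"text": [], "answer_start": []}},  # unanswerable
]

answerable = [ex for ex in examples if is_answerable(ex)]
print(len(answerable))  # 1 of the 2 toy examples survives
```

With the `datasets` library, the same predicate can be applied to the full corpus via `dataset.filter(is_answerable)`.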
## Training Procedure
The training was conducted in a Google Colab environment using Hugging Face Transformers.
### Hyperparameters
- Learning Rate: 1e-4
- Train Batch Size: 8
- Eval Batch Size: 8
- Gradient Accumulation Steps: 2
- Effective Batch Size: 16
- Num Epochs: 2
- Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Linear
- Seed: 42
- Max Input Length: 512 tokens
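The list above maps onto a Hugging Face `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch: the output directory name is a placeholder, and `predict_with_generate` is an assumption implied by the ROUGE evaluation, not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of a training configuration matching the hyperparameters above;
# "mt5-squad-stage1" is a placeholder output directory.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-squad-stage1",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size: 8 * 2 = 16
    num_train_epochs=2,
    optim="adamw_torch_fused",       # AdamW (fused), default betas/epsilon
    lr_scheduler_type="linear",
    seed=42,
    predict_with_generate=True,      # assumed: needed for ROUGE during eval
)
```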
### Training Results
| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|---|---|---|
| 0.2473 | 1.0 | 5427 | 0.4609 | 81.6473 | 43.3537 | 81.665 | 81.7141 |
| 0.3451 | 2.0 | 10854 | 0.3712 | 83.1882 | 44.8183 | 83.2252 | 83.2484 |
## Environmental Impact
- Hardware: NVIDIA T4 GPU
- Compute Time: ~3 hours
## Evaluation Results
The model was evaluated on the SQuAD v2 validation set (answerable subset).
| Metric | Score | Interpretation |
|---|---|---|
| ROUGE-L | 83.23 | High longest-common-subsequence overlap with reference answers. |
| ROUGE-1 | 83.19 | High unigram (keyword) overlap. |
| ROUGE-2 | 44.82 | Moderate bigram overlap, expected for short extractive answers. |
| ROUGE-Lsum | 83.25 | Consistent with sentence-level ROUGE-L. |
| Validation Loss | 0.37 | Final validation loss, down from 0.46 after epoch 1. |
These scores indicate that the model has learned to identify and generate answer spans relevant to a question, making it a suitable starting point for cross-lingual transfer to Swahili.
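ROUGE-L scores the longest common subsequence (LCS) of tokens shared by a prediction and a reference. The evaluation above presumably used a library implementation (e.g. `rouge_score`), which also applies stemming and normalization; the minimal pure-Python sketch below shows only the core F1-over-LCS computation:

```python
# Minimal illustration of ROUGE-L: F1 over the longest common
# subsequence of whitespace-separated tokens.

def lcs_len(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def rouge_l_f1(prediction, reference):
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_len(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("10th and 11th centuries",
                       "in the 10th and 11th centuries"), 3))  # 0.8
```

Here the prediction's 4 tokens all appear in order in the 6-token reference, giving precision 1.0, recall 2/3, and F1 = 0.8.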
## Framework Versions
- Transformers: 4.57.3
- PyTorch: 2.9.0+cu126
- Datasets: 4.0.0
- Tokenizers: 0.22.1