kikwaib/mt5-base-squad-transfer

@@ -1,67 +1,137 @@
 ---
-library_name: transformers
 license: apache-2.0
-base_model: google/mt5-base
 tags:
-- generated_from_trainer
 metrics:
 - rouge
 model-index:
 - name: mt5-base-squad-transfer
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# mt5-base-squad-transfer
-This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.3712
-- Rouge1: 83.1882
-- Rouge2: 44.8183
-- Rougel: 83.2252
-- Rougelsum: 83.2484
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0001
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 16
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 2
-### Training results
-| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum |
 |:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|
 | 0.2473        | 1.0   | 5427  | 0.4609          | 81.6473 | 43.3537 | 81.665  | 81.7141   |
 | 0.3451        | 2.0   | 10854 | 0.3712          | 83.1882 | 44.8183 | 83.2252 | 83.2484   |
-### Framework versions
-- Transformers 4.57.3
-- Pytorch 2.9.0+cu126
-- Datasets 4.0.0
-- Tokenizers 0.22.1

 ---
+language:
+- en
+- sw
+- multilingual
 license: apache-2.0
 tags:
+- question-answering
+- seq2seq
+- curriculum-learning
+- mt5
+- low-resource-nlp
+datasets:
+- rajpurkar/squad_v2
 metrics:
 - rouge
+base_model: google/mt5-base
 model-index:
 - name: mt5-base-squad-transfer
+  results:
+  - task:
+      type: question-answering
+      name: Question Answering
+    dataset:
+      name: SQuAD v2
+      type: rajpurkar/squad_v2
+    metrics:
+    - name: ROUGE-L
+      type: rouge
+      value: 83.23
 ---
+# Model Card for mT5-Base SQuAD Transfer (Stage 1)
+## Model Summary
+This model is an **intermediate research checkpoint** developed as part of the **KenSwQuAD** project (Hierarchical Curriculum Learning for Swahili QA).
+It is a `google/mt5-base` model fine-tuned on the **English SQuAD v2 dataset** to serve as a "structure-aware" baseline. By learning the mechanics of question answering (relating a query to its answer within a context) in a high-resource language (English), the model learns the *task* of QA before it is adapted to the *language* of Swahili in subsequent training stages.
+**This is Stage 1 of a 3-Stage Pipeline:**
+1.  **Stage 1 (Current):** Structural Transfer (English SQuAD) -> *Learns "How to Answer"*
+2.  **Stage 2:** Morphological Alignment (Extractive KenSwQuAD) -> *Learns Swahili Syntax*
+3.  **Stage 3:** Generative Refinement (Full KenSwQuAD + Scaffolding) -> *Learns Reasoning*
+## Model Details
+- **Developed by:** Benjamin Kikwai
+- **Model Type:** Multilingual Sequence-to-Sequence (Encoder-Decoder)
+- **Base Model:** [google/mt5-base](https://huggingface.co/google/mt5-base)
+- **Language(s):** Pre-trained on 101 languages (mC4); Fine-tuned on English.
+- **Task:** Generative Question Answering (Text-to-Text).
+- **License:** Apache 2.0
+## Intended Use
+This model is primarily intended for **transfer learning** experiments. It is meant to serve as a stronger initialization point for multilingual QA fine-tuning than the raw `mt5-base` checkpoint.
+### How to Use
+The model accepts input in the format: `question: <question_text> context: <context_text>`
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+model_name = "kikwaib/mt5-base-squad-transfer"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+context = "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France."
+question = "When were the Normans in Normandy?"
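+# Build the prompt in the "question: ... context: ..." format the model expects
+input_text = f"question: {question} context: {context}"
+# Truncate to the 512-token maximum input length used during training
+inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
+outputs = model.generate(**inputs, max_length=32)
+answer = tokenizer.decode(outputs[0], skip_special_tokens=True)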
+print(answer)
+# Expected Output: "10th and 11th centuries"
+```
+## Training Data
+The model was fine-tuned on **SQuAD v2 (Stanford Question Answering Dataset)**.
+**Preprocessing Note:**
+To align with the KenSwQuAD dataset (which contains only answerable questions), this model was trained **only on the answerable subset** of SQuAD v2. Unanswerable questions (where the answer list is empty) were filtered out during preprocessing to prevent the model from learning to generate empty strings or "unanswerable" tokens.
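The filtering step described above can be sketched in plain Python. This is a minimal illustration, not the project's actual preprocessing code: it assumes records follow the SQuAD v2 schema (`question`, `context`, and `answers` with a `text` list), and the helper names `is_answerable` and `to_text_pair` are hypothetical. Using the first reference answer as the target is also an assumption.

```python
def is_answerable(example: dict) -> bool:
    """Return True when the record has at least one reference answer."""
    return len(example["answers"]["text"]) > 0


def to_text_pair(example: dict) -> tuple[str, str]:
    """Build the (input, target) strings for text-to-text fine-tuning."""
    source = f"question: {example['question']} context: {example['context']}"
    # Assumption: the first reference answer is used as the target.
    target = example["answers"]["text"][0]
    return source, target


# Two toy records in the SQuAD v2 layout: one answerable, one not.
records = [
    {
        "question": "When were the Normans in Normandy?",
        "context": "The Normans gave their name to Normandy in the 10th and 11th centuries.",
        "answers": {"text": ["10th and 11th centuries"], "answer_start": [48]},
    },
    {
        "question": "Who ruled Normandy in 1500?",
        "context": "The Normans gave their name to Normandy in the 10th and 11th centuries.",
        "answers": {"text": [], "answer_start": []},
    },
]

# Keep only answerable records, then convert them to text pairs.
pairs = [to_text_pair(r) for r in records if is_answerable(r)]
print(len(pairs))
print(pairs[0][1])
```

With a Hugging Face `datasets` object, the same predicate could be passed to `Dataset.filter`.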
+## Training Procedure
+The training was conducted in a Google Colab environment using Hugging Face Transformers.
+### Hyperparameters
+- **Learning Rate:** 1e-4
+- **Train Batch Size:** 8
+- **Eval Batch Size:** 8
+- **Gradient Accumulation Steps:** 2
+- **Effective Batch Size:** 16
+- **Num Epochs:** 2
+- **Optimizer:** AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
+- **LR Scheduler:** Linear
+- **Seed:** 42
+- **Max Input Length:** 512 tokens
+### Training Results
+| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum |
 |:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|
 | 0.2473        | 1.0   | 5427  | 0.4609          | 81.6473 | 43.3537 | 81.665  | 81.7141   |
 | 0.3451        | 2.0   | 10854 | 0.3712          | 83.1882 | 44.8183 | 83.2252 | 83.2484   |
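The step counts in the table can be cross-checked against the batch settings with a little arithmetic. The sketch below assumes a single GPU and one optimizer step per effective batch; the variable names are illustrative only.

```python
per_device_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 1  # assumption: training ran on a single GPU

# Examples consumed per optimizer step.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)

steps_per_epoch = 5427  # step count at epoch 1.0 in the results table
num_epochs = 2
total_steps = steps_per_epoch * num_epochs

# Upper bound on training examples seen per epoch (the last batch may be partial).
max_examples_per_epoch = steps_per_epoch * effective_batch_size

print(effective_batch_size)    # 16
print(total_steps)             # 10854
print(max_examples_per_epoch)  # 86832
```

The total of 10854 steps matches the final row of the table, and the bound of 86,832 examples per epoch is compatible with training on a subset of SQuAD v2 rather than the full training split.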
+### Environmental Impact
+- **Hardware:** NVIDIA T4 GPU
+- **Compute Time:** ~3 hours
+## Evaluation Results
+The model was evaluated on the SQuAD v2 validation set (answerable subset).
+| Metric | Score | Interpretation |
+| :--- | :--- | :--- |
+| **ROUGE-L** | **83.23** | Longest-common-subsequence overlap with the reference answers. |
+| **ROUGE-1** | 83.19 | Unigram overlap with the reference answers. |
+| **ROUGE-2** | 44.82 | Bigram overlap with the reference answers. |
+| **ROUGE-Lsum** | 83.25 | Nearly identical to ROUGE-L. |
+| **Validation Loss** | 0.37 | Decreased from 0.46 after epoch 1 to 0.37 after epoch 2. |
+These scores indicate that the model has learned to generate answer spans relevant to the question in English, which supports its use as a starting point for cross-lingual transfer to Swahili.
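For readers unfamiliar with ROUGE-L, the following is a simplified sketch of the idea: an F1 score based on the longest common subsequence (LCS) of tokens. It is not the implementation used for the reported scores. It assumes lowercase whitespace tokenization and no stemming, and the function names are hypothetical. The scores in this card are on a 0 to 100 scale, whereas this sketch returns values between 0 and 1.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercase whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# An exact match scores 1.0; a partial match scores lower.
exact = rouge_l_f1("10th and 11th centuries", "10th and 11th centuries")
partial = rouge_l_f1("the 10th century", "10th and 11th centuries")
print(exact, partial)
```

Because the score rewards token overlap rather than exact string equality, a prediction that shares only part of the reference answer still receives partial credit.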
+## Framework Versions
+- **Transformers:** 4.57.3
+- **PyTorch:** 2.9.0+cu126
+- **Datasets:** 4.0.0
+- **Tokenizers:** 0.22.1
+## Citation