@@ -9,45 +9,107 @@ model_name: bert-base-qa-squad-colab
 tags:
   - bert
   - extractive-question-answering
   - fine-tuned
 ---
-# 🧠 BERT-base SQuAD v1.1 Question Answering Model
-This model is a fine-tuned version of **bert-base-uncased** trained on the **SQuAD v1.1 dataset** for *extractive question answering*.
-Given a passage and a question, the model identifies the most likely answer span within the context.
 ---
-## 📌 Model Details
-| Property | Description |
-|---------|-------------|
-| Base Model | `bert-base-uncased` |
-| Task Type | Extractive Question Answering |
-| Parameters | ~110M |
-| Framework | Transformers (PyTorch) |
-| Dataset | SQuAD v1.1 (`rajpurkar/squad`) |
-| Language | English |
-| Fine-tuned on | Google Colab GPU |
-| Author | `omarbayoumi2` |
 ---
-## 🧪 Evaluation
-The model was evaluated using standard QA metrics:
 | Metric | Score (approx) |
-|--------|---------------|
-| Exact Match (EM) | 66–80% |
-| F1 Score | 77–88% |
-> Scores depend on number of samples used during training.
 ---
-## 🚀 Usage
 ```python
 from transformers import pipeline
@@ -57,7 +119,9 @@ qa = pipeline(
     model="omarbayoumi2/bert-base-qa-squad-colab",
 )
-context = "BERT was developed by researchers at Google."
 question = "Who developed BERT?"
-print(qa(question=question, context=context))

 tags:
   - bert
   - extractive-question-answering
+  - question-answering
+  - squad
   - fine-tuned
+widget:
+  - example_title: Simple QA example
+    context: >
+      BERT is a language representation model developed by researchers at Google.
+      It achieved strong performance on many natural language understanding benchmarks.
+    question: Who developed BERT?
+---
+# 🌟 BERT-base SQuAD v1.1 Question Answering Model
+> Fine-tuned **bert-base-uncased** on **SQuAD v1.1** for English *extractive question answering*.
+[![Model](https://img.shields.io/badge/Model-bert--base--qa--squad--colab-blue)](https://huggingface.co/omarbayoumi2/bert-base-qa-squad-colab)
+[![Space](https://img.shields.io/badge/Space-bert--qa--demo-yellow)](https://huggingface.co/spaces/omarbayoumi2/bert-qa-demo)
+[![Transformers](https://img.shields.io/badge/🤗-Transformers-black)](https://huggingface.co/docs/transformers/index)
+This model takes a **context paragraph** and a **question** and predicts the most likely **answer span** inside the context.
+It is intended as a compact, educational example of fine-tuning a small encoder model (~110M parameters) for QA on Google Colab and deploying it on Hugging Face.
+---
+## 🔎 Use Cases
+- Educational demos of **extractive question answering**
+- Small QA systems over short English paragraphs
+- Teaching / learning how to:
+  - preprocess SQuAD-style datasets
+  - fine-tune BERT with `Trainer`
+  - publish models and Spaces on Hugging Face
+> ⚠️ Not intended for production-critical use (medical/legal/financial advice, etc.).
 ---
+## 🧠 Model Details
+- **Base model:** [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
+- **Architecture:** Encoder-only Transformer with a span-prediction QA head (a start logit and an end logit per token)
+- **Parameters:** ~110M
+- **Task:** Extractive Question Answering
+- **Language:** English
+- **Author:** `omarbayoumi2`
+- **Training platform:** Google Colab (free GPU)
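The QA head outputs a start logit and an end logit for every token. An answer is read off by picking the best (start, end) pair. The sketch below illustrates that decoding step in plain Python under three assumptions:

- a span's score is `start_logit + end_logit`;
- the end may not precede the start;
- span length is capped by a hypothetical `max_answer_len`.

The Transformers `pipeline` adds further steps, for example excluding question and special tokens and mapping tokens back to characters. Treat this as a simplified illustration, not the exact post-processing.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most max_answer_len
    tokens; its score is start_logits[start] + end_logits[end].
    """
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("need two non-empty logit lists of equal length")
    if max_answer_len < 1:
        raise ValueError("max_answer_len must be at least 1")
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        # Only end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

For example, `best_span([0.1, 2.0, 0.3, -1.0], [0.0, 0.5, 3.0, 0.2])` returns `(1, 2, 5.0)`. Token 1 carries the strongest start logit and token 2 the strongest end logit. These toy values are illustrative, not model output.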
 ---
+## 📚 Training Data
+- **Dataset:** [SQuAD v1.1](https://huggingface.co/datasets/rajpurkar/squad) (`rajpurkar/squad`)
+- **Train split:** ~87k question–answer pairs
+- **Validation split:** ~10k question–answer pairs
+- **Domain:** Wikipedia articles (encyclopedic text)
+Each example provides:
+- `context`: paragraph
+- `question`: question string
+- `answers`: dict with parallel lists `text` and `answer_start` (character offsets into `context`)
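`answer_start` is a character offset, but the model is trained on token positions, so preprocessing has to convert one into the other. The sketch below rests on these assumptions:

- `offsets` lists `(char_start, char_end)` ranges for the context tokens only, like the `offset_mapping` a fast tokenizer returns with `return_offsets_mapping=True`.
- Special and question tokens are given `(0, 0)` so they never match.
- The helper name `char_span_to_token_span` is hypothetical.

```python
from typing import List, Optional, Tuple


def char_span_to_token_span(
    offsets: List[Tuple[int, int]],
    answer_start: int,
    answer_text: str,
) -> Optional[Tuple[int, int]]:
    """Map a character-level answer to inclusive (start_token, end_token) indices.

    offsets[i] is the (char_start, char_end) range of token i in the context,
    with char_end exclusive. Tokens with (0, 0) offsets (e.g. special tokens)
    never match. Returns None if the answer is not fully covered by the tokens.
    """
    answer_end = answer_start + len(answer_text)  # exclusive character end
    start_tok: Optional[int] = None
    end_tok: Optional[int] = None
    for i, (char_start, char_end) in enumerate(offsets):
        # Token containing the first character of the answer.
        if start_tok is None and char_start <= answer_start < char_end:
            start_tok = i
        # Token containing the last character of the answer.
        if char_start < answer_end <= char_end:
            end_tok = i
    if start_tok is None or end_tok is None:
        return None
    return start_tok, end_tok
```

Consider the context `"BERT was developed by researchers at Google."` with hand-written, word-level offsets (real WordPiece offsets would differ) and a leading `(0, 0)` special token. The answer `"researchers at Google"` at character 22 maps to tokens 5 through 7. A `None` result means the answer falls outside the current window; common Transformers QA preprocessing recipes label such windows with the `[CLS]` position instead.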
 ---
+## ⚙️ Training Configuration
+Fine-tuning was performed with the Hugging Face **Trainer** API.
+- **Optimizer:** AdamW (via `Trainer`)
+- **Epochs:** 2
+- **Learning rate:** 3e-5
+- **Batch size:** 8 (train), 16 (eval)
+- **Max sequence length:** 384
+- **Doc stride:** 128
+- **Weight decay:** 0.01
+- **Mixed precision:** FP16 (when the GPU supports it)
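Contexts longer than the 384-token limit are split into overlapping windows. The sketch below computes those windows over context token indices under two assumptions:

- inputs use a `[CLS] question [SEP] context [SEP]` layout, i.e. three special tokens;
- as with the `stride` argument of Hugging Face tokenizers, the doc stride of 128 is the number of tokens shared by consecutive windows.

The function name `context_windows` is hypothetical. The tokenizer's own overflow logic may differ at the edges.

```python
from typing import List, Tuple


def context_windows(
    num_context_tokens: int,
    num_question_tokens: int,
    max_length: int = 384,
    doc_stride: int = 128,
) -> List[Tuple[int, int]]:
    """Split context token indices into overlapping [start, end) windows.

    Assumes 3 special tokens per input and that doc_stride is the overlap
    (in tokens) between consecutive windows.
    """
    if num_context_tokens < 0:
        raise ValueError("num_context_tokens must be non-negative")
    budget = max_length - num_question_tokens - 3  # context tokens per window
    if budget <= doc_stride:
        raise ValueError("window budget must exceed doc_stride")
    step = budget - doc_stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        end = min(start + budget, num_context_tokens)
        windows.append((start, end))
        if end == num_context_tokens:
            break
        start += step
    return windows
```

Take a 600-token context and an 11-token question. Each window holds 384 - 11 - 3 = 370 context tokens, so the windows are `(0, 370)` and `(242, 600)`, which share 128 tokens.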
+The loss is the average of two cross-entropy terms, one over the start token position and one over the end token position.
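The sketch below works the loss through for a single example in plain Python. It assumes the convention of the Transformers `BertForQuestionAnswering` head, `loss = (CE(start_logits, start_position) + CE(end_logits, end_position)) / 2`. During training the same quantity is computed on tensors and averaged over the batch.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def span_loss(
    start_logits: List[float],
    end_logits: List[float],
    start_position: int,
    end_position: int,
) -> float:
    """Average of the start- and end-position cross-entropy losses."""
    return (
        cross_entropy(start_logits, start_position)
        + cross_entropy(end_logits, end_position)
    ) / 2
```

With uniform logits over four tokens, each term equals `log 4`, so `span_loss([0.0] * 4, [0.0] * 4, 1, 2)` is also `log 4` (about 1.386). Confident, correct logits drive the loss toward zero.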
+---
+## 📊 Evaluation
+Evaluation was performed on the SQuAD v1.1 validation split using the standard SQuAD metric:
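+- **Exact Match (EM):** whether the normalized predicted answer (lowercased, with punctuation, articles, and extra whitespace removed) exactly matches any normalized ground-truth answer
+- **F1 score:** harmonic mean of token-level precision and recall between the predicted and ground-truth answer text, taking the best score over the reference answers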
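Both metrics compare normalized answer strings, take the best score over a question's reference answers, and are then averaged over questions. The plain-Python sketch below mirrors the normalization of the official SQuAD v1.1 evaluation script: lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace. The function names are illustrative. The `squad` metric in the Hugging Face `evaluate` library is a ready-made alternative.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts shared tokens, respecting repeats.
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions: List[str], references: List[List[str]]) -> Dict[str, float]:
    """Corpus-level EM and F1 in percent, taking the best match per question."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need one reference list per prediction")
    pairs = list(zip(predictions, references))
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in pairs)
    f1 = sum(max(f1_score(p, r) for r in refs) for p, refs in pairs)
    n = len(predictions)
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}
```

For instance, `normalize_answer("The  Google, researchers!")` gives `"google researchers"`. `f1_score("researchers at Google", "Google")` is 0.5 (precision 1/3, recall 1), while its exact match is 0.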
+Typical results for this setup are:
 | Metric | Score (approx) |
+|--------|----------------|
+| Exact Match | 66–80% |
+| F1         | 77–88% |
+Scores may vary with the number of training examples used, the number of epochs, and the random seed.
 ---
+## 🚀 How to Use
+### 1. With Transformers `pipeline`
 ```python
 from transformers import pipeline
     model="omarbayoumi2/bert-base-qa-squad-colab",
 )
+context = "BERT is a language representation model developed by researchers at Google."
 question = "Who developed BERT?"
+result = qa(question=question, context=context)
+print(result)
+# {'score': ..., 'start': ..., 'end': ..., 'answer': 'researchers at Google'}