---
license: mit
datasets:
  - rajpurkar/squad
language:
  - en
base_model:
  - distilbert/distilbert-base-uncased
pipeline_tag: question-answering
library_name: transformers
tags:
  - extractive-qa
  - span-prediction
---

# DistilBERT Fine-Tuned on SQuAD for Extractive QA

## Model Description

This model is `distilbert/distilbert-base-uncased` fine-tuned on a 5,000-example subset of SQuAD for extractive question answering: given a question and a context passage, it predicts the start and end positions of the answer span within the context.

## Training Details

- **Base model:** `distilbert-base-uncased`
- **Dataset:** SQuAD (5,000 examples, 80/20 train/test split)
- **Epochs:** 3
- **Learning rate:** 2e-5
- **Batch size:** 16
- **Final training loss:** 2.303
- **Device:** Apple Silicon (MPS)
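
The training script itself is not included in this repo. A minimal sketch of how such a run could look with the 🤗 `Trainer` API, using the hyperparameters listed above, is given below. The `HPARAMS` dict, the `preprocess` helper, and `main` are reconstructions for illustration, not the exact code used; the preprocessing follows the standard SQuAD recipe of mapping each answer's character span to start/end token indices.

```python
# Hypothetical reconstruction of the fine-tuning run; the actual script was not published.

HPARAMS = {
    "base_model": "distilbert/distilbert-base-uncased",
    "n_samples": 5_000,     # subset of SQuAD's 87,599 training examples
    "test_fraction": 0.2,   # 80/20 train/test split
    "epochs": 3,
    "learning_rate": 2e-5,
    "batch_size": 16,
}


def main():
    from datasets import load_dataset
    from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                              DefaultDataCollator, Trainer, TrainingArguments)

    raw = load_dataset("rajpurkar/squad", split=f"train[:{HPARAMS['n_samples']}]")
    splits = raw.train_test_split(test_size=HPARAMS["test_fraction"], seed=42)
    tokenizer = AutoTokenizer.from_pretrained(HPARAMS["base_model"])

    def preprocess(batch):
        # Tokenize (question, context) pairs and convert each answer's
        # character span into start/end token indices.
        enc = tokenizer(batch["question"], batch["context"],
                        truncation="only_second", max_length=384,
                        padding="max_length", return_offsets_mapping=True)
        starts, ends = [], []
        for i, offsets in enumerate(enc["offset_mapping"]):
            ans = batch["answers"][i]
            s_char = ans["answer_start"][0]
            e_char = s_char + len(ans["text"][0])
            seq = enc.sequence_ids(i)
            c0 = seq.index(1)                        # first context token
            c1 = len(seq) - 1 - seq[::-1].index(1)   # last context token
            if offsets[c0][0] > s_char or offsets[c1][1] < e_char:
                starts.append(0); ends.append(0)     # answer truncated away -> CLS
            else:
                starts.append(next(t for t in range(c0, c1 + 1)
                                   if offsets[t][1] > s_char))
                ends.append(next(t for t in range(c1, c0 - 1, -1)
                                 if offsets[t][0] < e_char))
        enc["start_positions"] = starts
        enc["end_positions"] = ends
        return enc

    tokenized = splits.map(preprocess, batched=True,
                           remove_columns=raw.column_names)
    model = AutoModelForQuestionAnswering.from_pretrained(HPARAMS["base_model"])
    args = TrainingArguments(output_dir="distilbert-squad-qa",
                             learning_rate=HPARAMS["learning_rate"],
                             per_device_train_batch_size=HPARAMS["batch_size"],
                             num_train_epochs=HPARAMS["epochs"])
    Trainer(model=model, args=args,
            train_dataset=tokenized["train"], eval_dataset=tokenized["test"],
            data_collator=DefaultDataCollator(), tokenizer=tokenizer).train()


# Call main() to launch the run (downloads SQuAD and the base checkpoint).
```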

## Usage

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("John-Machado/distilbert-squad-qa")
model = AutoModelForQuestionAnswering.from_pretrained("John-Machado/distilbert-squad-qa")

question = "How many official league titles has Juventus won?"
context = "The club has won 36 official league titles, 14 Coppa Italia titles and nine Supercoppa Italiana titles."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token indices and decode that span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs.input_ids[0, start : end + 1])
print(answer)  # "36"
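
For quick experiments, the `pipeline` helper wraps the same tokenization, forward pass, and span decoding in one call (the returned dict uses the standard question-answering pipeline keys; the score will vary):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="John-Machado/distilbert-squad-qa")
result = qa(
    question="How many official league titles has Juventus won?",
    context=(
        "The club has won 36 official league titles, 14 Coppa Italia titles "
        "and nine Supercoppa Italiana titles."
    ),
)
print(result["answer"], result["score"])
```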

## Limitations

- Trained on only 5,000 of SQuAD's 87,599 training examples, as a learning exercise
- Purely extractive: the answer must appear as a verbatim substring of the provided context
- Cannot synthesize answers across multiple passages