library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-cased
tags:
- question-answering
- squadv2
- distilbert
- en
- transformer
- pytorch
model-index:
- name: a-question_answerer
results: []
language:
- en
a-question_answerer
This model is a fine-tuned version of the DistilBERT Base Cased model on the SQuAD v2 dataset for the Question Answering task.
It was trained as part of a Google Colab project aimed at adapting a pre-trained language model to answer questions based on a given text context.
It achieves the following result on the evaluation set:
- Loss: 1.3622 (validation loss of the best checkpoint, reached at epoch 2)
Model description
This model is intended for use in Question Answering tasks, where the goal is to extract a concise answer span from a provided text context given a natural language question. It can handle both answerable and unanswerable questions as per the SQuAD v2 dataset format.
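A minimal inference sketch, assuming the model is published on the Hugging Face Hub. The repository id `your-username/a-question_answerer` is a placeholder rather than the real id, and `build_qa_pipeline`, `answer_or_none`, and `demo` are hypothetical helper names. The sketch relies on the `question-answering` pipeline's `handle_impossible_answer=True` option, which allows an empty string to be returned when the model judges the question unanswerable.

```python
from typing import Callable, Optional

# Placeholder repository id (assumption): replace with the model's actual Hub id.
MODEL_ID = "your-username/a-question_answerer"


def build_qa_pipeline(model_id: str = MODEL_ID):
    """Create a question-answering pipeline (downloads the model on first use)."""
    # Imported lazily so that answer_or_none can be used and tested on its own.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def answer_or_none(qa: Callable[..., dict], question: str, context: str) -> Optional[str]:
    """Return the extracted answer span, or None when 'no answer' is predicted."""
    result = qa(question=question, context=context, handle_impossible_answer=True)
    answer = result["answer"].strip()
    return answer or None


def demo() -> None:
    """Example calls; requires network access and the transformers library."""
    qa = build_qa_pipeline()
    context = "SQuAD v2 contains question-answer pairs derived from Wikipedia articles."
    print(answer_or_none(qa, "Where do the SQuAD v2 articles come from?", context))
    print(answer_or_none(qa, "Who wrote the first compiler?", context))
```

Calling `demo()` runs one question the context can answer and one it cannot; the second call is expected to return `None` only if the model assigns the "no answer" option the highest score.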
Intended uses & limitations
Potential use cases include:
- Building a simple document question-answering system.
- Enhancing search functionality to provide direct answers.
As with any model trained on a specific dataset, this model's performance is influenced by the characteristics and potential biases present in the SQuAD v2 dataset. It may perform differently on text from domains significantly different from Wikipedia articles (the source of SQuAD). The model may also inherit biases from the original DistilBERT Base Cased model.
The model's performance on identifying and answering questions depends heavily on the quality and relevance of the provided context.
Training and evaluation data
The model was fine-tuned on the SQuAD v2 dataset, which contains over 130,000 question-answer pairs derived from Wikipedia articles. The dataset includes questions that are unanswerable, requiring the model to determine if no answer exists within the provided text.
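The sketch below shows how unanswerable questions can be recognised in SQuAD v2-style records, where an unanswerable question carries an empty list of answers. The helper names are hypothetical, and the Hub dataset id `rajpurkar/squad_v2` is an assumption about which copy of the dataset to load, not something stated in this card.

```python
from typing import Any, Dict, Iterable


def is_unanswerable(example: Dict[str, Any]) -> bool:
    """A SQuAD v2 example is unanswerable when its list of answer texts is empty."""
    return len(example["answers"]["text"]) == 0


def count_unanswerable(examples: Iterable[Dict[str, Any]]) -> int:
    """Count the unanswerable examples in any iterable of SQuAD v2-style records."""
    return sum(1 for example in examples if is_unanswerable(example))


def load_squad_v2_train():
    """Load the training split (assumed dataset id; requires network access)."""
    from datasets import load_dataset  # imported lazily; only needed for real data

    return load_dataset("rajpurkar/squad_v2", split="train")


# Two toy records in the SQuAD v2 layout: one answerable, one unanswerable.
toy_examples = [
    {"question": "Who wrote the code?", "answers": {"text": ["Ada"], "answer_start": [0]}},
    {"question": "Who tested the code?", "answers": {"text": [], "answer_start": []}},
]
print(count_unanswerable(toy_examples))
```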
For the final reported results, the model was trained on the full SQuAD v2 training dataset.
Training procedure
The model was fine-tuned with the Hugging Face transformers library and its Trainer API. Preprocessing tokenized each question-context pair and computed the start and end token positions of each answer; batches were padded with DataCollatorWithPadding. Early stopping here refers to best-checkpoint selection: with load_best_model_at_end=True and metric_for_best_model="eval_loss", the checkpoint with the lowest validation loss is reloaded when training ends.
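The step of preparing start and end positions can be sketched as follows. This mirrors the approach commonly used in Hugging Face question-answering examples, where the tokenizer is called with `return_offsets_mapping=True` and the character span of the answer is mapped onto token indices; it is an illustration under that assumption, not the exact preprocessing code used for this model. The function name `find_token_span` is hypothetical, and the offsets in the example are written by hand instead of coming from a real tokenizer.

```python
from typing import Optional, Sequence, Tuple


def find_token_span(
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
    answer_start: Optional[int],
    answer_text: str,
    cls_index: int = 0,
) -> Tuple[int, int]:
    """Map a character-level answer span to (start_token, end_token).

    Unanswerable questions, and answers that fall outside the tokenized
    context (for example after truncation), are labelled with the [CLS] index.
    """
    if answer_start is None or not answer_text:
        return cls_index, cls_index
    answer_end = answer_start + len(answer_text)

    # Tokens with sequence id 1 belong to the context (0 is the question).
    context_tokens = [i for i, seq_id in enumerate(sequence_ids) if seq_id == 1]
    if not context_tokens:
        return cls_index, cls_index
    first, last = context_tokens[0], context_tokens[-1]

    # The answer must lie fully inside the tokenized context.
    if offsets[first][0] > answer_start or offsets[last][1] < answer_end:
        return cls_index, cls_index

    # Last context token that starts at or before the answer start.
    idx = first
    while idx <= last and offsets[idx][0] <= answer_start:
        idx += 1
    start_token = idx - 1

    # First context token that ends at or after the answer end.
    idx = last
    while idx >= first and offsets[idx][1] >= answer_end:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


# Hand-written example: [CLS] Who ? [SEP] Ada wrote code [SEP]
offsets = [(0, 0), (0, 3), (3, 4), (0, 0), (0, 3), (4, 9), (10, 14), (0, 0)]
sequence_ids = [None, 0, 0, None, 1, 1, 1, None]
print(find_token_span(offsets, sequence_ids, 4, "wrote code"))  # → (5, 6)
print(find_token_span(offsets, sequence_ids, None, ""))         # → (0, 0)
```

Pointing both labels at the `[CLS]` token for unanswerable questions is what lets a span-extraction model express "no answer" in the SQuAD v2 setting.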
Training Arguments:
- Learning Rate: 2e-5
- Per Device Train Batch Size: 4
- Per Device Eval Batch Size: 4
- Number of Epochs: 3
- Weight Decay: 0.1
- Evaluation Strategy: epoch
- Save Strategy: epoch
- Early Stopping: Enabled (load_best_model_at_end=True, metric_for_best_model="eval_loss")
Training hyperparameters
- Base Model: distilbert/distilbert-base-cased
- Dataset: SQuAD v2
- Early Stopping: Enabled (load_best_model_at_end=True, metric_for_best_model="eval_loss")
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- weight_decay: 0.1
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- eval_strategy: epoch
- save_strategy: epoch
- load_best_model_at_end: True
- metric_for_best_model: eval_loss
- num_epochs: 3
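The hyperparameters listed above can be expressed as keyword arguments for `transformers.TrainingArguments`, as in the sketch below. The `output_dir` value is a hypothetical path, `greater_is_better=False` is written out explicitly here as an assumption (a lower loss is better), and `build_training_arguments` is an illustrative helper, not the author's training script.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
TRAINING_KWARGS = {
    "output_dir": "a-question_answerer",  # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "num_train_epochs": 3,
    "weight_decay": 0.1,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,  # assumption: stated explicitly, lower loss wins
}


def build_training_arguments(**overrides):
    """Build TrainingArguments from the settings above, with optional overrides."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

Because `eval_strategy` and `save_strategy` are both `epoch`, a checkpoint is saved at every evaluation, which `load_best_model_at_end=True` requires in order to reload the best one.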
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.1921 | 1.0 | 32580 | 1.4150 |
| 0.9637 | 2.0 | 65160 | 1.3622* |
| 0.6474 | 3.0 | 97740 | 1.8661 |

\* Lowest validation loss; this checkpoint (epoch 2) was loaded as the final model.
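As a small illustration of how the final checkpoint follows from these results, the sketch below selects the row with the lowest validation loss, which is the criterion implied by metric_for_best_model="eval_loss". The `EpochLog` type and `best_checkpoint` function are hypothetical names; the numbers are copied from the table.

```python
from typing import List, NamedTuple


class EpochLog(NamedTuple):
    epoch: float
    step: int
    train_loss: float
    val_loss: float


# Values copied from the training results table.
LOGS: List[EpochLog] = [
    EpochLog(1.0, 32580, 1.1921, 1.4150),
    EpochLog(2.0, 65160, 0.9637, 1.3622),
    EpochLog(3.0, 97740, 0.6474, 1.8661),
]


def best_checkpoint(logs: List[EpochLog]) -> EpochLog:
    """Return the logged epoch with the lowest validation loss."""
    return min(logs, key=lambda log: log.val_loss)


best = best_checkpoint(LOGS)
print(f"best epoch: {best.epoch}, step: {best.step}, val loss: {best.val_loss}")
# → best epoch: 2.0, step: 65160, val loss: 1.3622
```

The training loss keeps falling in epoch 3 while the validation loss rises, a pattern that usually indicates overfitting and explains why the epoch 2 checkpoint is kept.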
Evaluation Results
The model was evaluated on 2,000 examples from the SQuAD v2 validation set. The following metrics were obtained:
| Metric | Overall | Answerable (HasAns) | Unanswerable (NoAns) |
|---|---|---|---|
| Exact Match (EM) | 64.20 | 60.27 | 67.97 |
| F1 Score | 66.57 | 65.10 | 67.97 |
| Total Examples | 2000 | 979 | 1021 |
Note: The metrics for 'Answerable' and 'Unanswerable' questions provide a more detailed view of the model's performance on each type of question in SQuAD v2.
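For readers unfamiliar with these metrics, the sketch below implements Exact Match and token-level F1 under the standard SQuAD answer normalisation (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace). It is assumed, not confirmed, that the numbers in the table were computed with equivalent definitions. The function `weighted_overall` is a hypothetical helper showing that the Overall column is the example-weighted average of the two subsets.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # If either side is empty ("no answer"), F1 is 1 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def weighted_overall(has_ans: float, n_has: int, no_ans: float, n_no: int) -> float:
    """Example-weighted average of the HasAns and NoAns scores."""
    return (has_ans * n_has + no_ans * n_no) / (n_has + n_no)


print(f1_score("in Paris France", "Paris"))                  # → 0.5
print(round(weighted_overall(60.27, 979, 67.97, 1021), 2))   # → 64.2
print(round(weighted_overall(65.10, 979, 67.97, 1021), 2))   # → 66.57
```

For an unanswerable question the gold answer is empty, so F1 reduces to the same 0-or-1 value as Exact Match, which is why both metrics read 67.97 in the NoAns column.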
Framework versions
- Transformers 4.55.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4