---
library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-cased
tags:
  - question-answering
  - squadv2
  - distilbert
  - en
  - transformer
  - pytorch
model-index:
  - name: a-question_answerer
    results: []
language:
  - en
---

# a-question_answerer

This model is a fine-tuned version of the DistilBERT Base Cased model on the SQuAD v2 dataset for the Question Answering task.

It was trained as part of a Google Colab project aimed at adapting a pre-trained language model to answer questions based on a given text context.

On the SQuAD v2 evaluation set, it achieves:

- Loss: 1.3622 (best checkpoint, epoch 2)

## Model description

This model is intended for use in Question Answering tasks, where the goal is to extract a concise answer span from a provided text context given a natural language question. It can handle both answerable and unanswerable questions as per the SQuAD v2 dataset format.
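To illustrate how an extractive QA model chooses between an answer span and "no answer" under SQuAD v2, here is a minimal sketch that works directly on toy start/end logits. This is an illustration, not this model's actual post-processing code:

```python
# Illustrative sketch of SQuAD v2-style answer extraction from a model's
# start/end logits -- not the exact post-processing used with this model.

def extract_answer(start_logits, end_logits, max_answer_len=30):
    """Pick the best answer span, or None if the 'no answer' score wins.

    Index 0 is assumed to be the [CLS] token, whose start+end logits
    serve as the SQuAD v2 null ("no answer") score.
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    for s in range(1, len(start_logits)):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    # SQuAD v2: predict "no answer" when the null score beats the best span.
    return best_span if best_score > null_score else None

# Toy logits over 6 token positions: the model is confident in span (2, 3).
start = [1.0, -2.0, 5.0, 0.0, -1.0, -3.0]
end   = [1.0, -3.0, 0.0, 4.0, -1.0, -2.0]
print(extract_answer(start, end))  # (2, 3)
```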

## Intended uses & limitations

Potential use cases include:

- Building a simple document Q&A system.
- Enhancing search functionality to provide direct answers.

As with any model trained on a specific dataset, this model's performance is influenced by the characteristics and potential biases present in the SQuAD v2 dataset. It may perform differently on text from domains significantly different from Wikipedia articles (the source of SQuAD). The model may also inherit biases from the original DistilBERT Base Cased model.

The model's performance on identifying and answering questions depends heavily on the quality and relevance of the provided context.

## Training and evaluation data

The model was fine-tuned on the SQuAD v2 dataset, which contains over 130,000 question-answer pairs derived from Wikipedia articles. The dataset includes questions that are unanswerable, requiring the model to determine if no answer exists within the provided text.
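For reference, SQuAD v2 records look roughly like the following. The examples below are hypothetical but follow the dataset's schema, in which unanswerable questions simply carry an empty `answers` field:

```python
# Hypothetical records in the SQuAD v2 schema. Unanswerable questions
# have empty `text` and `answer_start` lists.

answerable = {
    "question": "What does DistilBERT distill?",
    "context": "DistilBERT is a smaller model distilled from BERT.",
    "answers": {"text": ["BERT"], "answer_start": [45]},
}

unanswerable = {
    "question": "When was DistilBERT deprecated?",
    "context": "DistilBERT is a smaller model distilled from BERT.",
    "answers": {"text": [], "answer_start": []},
}

def gold_span(example):
    """Return (start_char, end_char) of the first gold answer, or None."""
    texts = example["answers"]["text"]
    if not texts:
        return None  # unanswerable question
    start = example["answers"]["answer_start"][0]
    return (start, start + len(texts[0]))

start, end = gold_span(answerable)
print(answerable["context"][start:end])  # BERT
print(gold_span(unanswerable))           # None
```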

For the final reported results, the model was trained on the full SQuAD v2 training dataset.

## Training procedure

The model was fine-tuned using the Hugging Face transformers library and Trainer API. The training process involved tokenizing the dataset, preparing input features with start and end positions for answers, and using DataCollatorWithPadding. Early stopping was used to load the model checkpoint with the lowest validation loss.
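The conversion of answer character spans into token-level start/end labels can be sketched as follows. The offset mapping below is hypothetical; in practice it comes from a fast tokenizer called with `return_offsets_mapping=True`:

```python
# Sketch of how an answer's character span is converted into token-level
# start/end positions using an offset mapping, as done when preparing QA
# training features. The offsets here are hypothetical.

def char_to_token_span(offsets, start_char, end_char):
    """Map a character span to (start_token, end_token).

    Returns (0, 0) -- the [CLS] position -- when the answer does not fall
    inside this tokenized window, which is also the label convention used
    for unanswerable SQuAD v2 examples.
    """
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i
    if start_tok is None or end_tok is None:
        return (0, 0)  # answer absent -> point both labels at [CLS]
    return (start_tok, end_tok)

# Hypothetical offsets: [CLS], then tokens covering chars 0-5, 6-10, 11-15.
offsets = [(0, 0), (0, 5), (6, 10), (11, 15)]
print(char_to_token_span(offsets, 6, 10))   # (2, 2)
print(char_to_token_span(offsets, 20, 25))  # (0, 0)
```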

Training arguments:

- Learning rate: 2e-5
- Per-device train batch size: 4
- Per-device eval batch size: 4
- Number of epochs: 3
- Weight decay: 0.1
- Evaluation strategy: epoch
- Save strategy: epoch
- Early stopping: enabled (`load_best_model_at_end=True`, `metric_for_best_model="eval_loss"`)
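The arguments above can be sketched as a `TrainingArguments` configuration. This is a reconstruction, not the original notebook code; the `output_dir` and the early-stopping patience are assumptions:

```python
# Sketch of the training configuration described above. output_dir and
# early_stopping_patience are assumptions; the rest mirrors the listed
# arguments.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="a-question_answerer",  # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.1,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    seed=42,
)

# Passed to Trainer(..., callbacks=[...]) so that training stops once
# eval_loss stops improving and the best checkpoint is restored.
early_stop = EarlyStoppingCallback(early_stopping_patience=1)  # patience assumed
```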

### Training hyperparameters

- Base model: distilbert/distilbert-base-cased
- Dataset: SQuAD v2
- Early stopping: enabled (`load_best_model_at_end=True`, `metric_for_best_model="eval_loss"`)

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- weight_decay: 0.1
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- eval_strategy: epoch
- save_strategy: epoch
- load_best_model_at_end: True
- metric_for_best_model: eval_loss
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.1921        | 1.0   | 32580 | 1.4150          |
| 0.9637        | 2.0   | 65160 | 1.3622*         |
| 0.6474        | 3.0   | 97740 | 1.8661          |

\* Best checkpoint, restored at the end of training via `load_best_model_at_end=True`.

## Evaluation results

The model was evaluated on 2,000 examples from the SQuAD v2 validation set. The following metrics were obtained:

| Metric           | Overall | Answerable (HasAns) | Unanswerable (NoAns) |
|------------------|--------:|--------------------:|---------------------:|
| Exact Match (EM) | 64.20   | 60.27               | 67.97                |
| F1 Score         | 66.57   | 65.10               | 67.97                |
| Total Examples   | 2000    | 979                 | 1021                 |

Note: The metrics for 'Answerable' and 'Unanswerable' questions provide a more detailed view of the model's performance on each type of question in SQuAD v2.
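For clarity, here is a simplified sketch of how per-example EM and token-level F1 are computed. The official SQuAD evaluation script additionally strips articles and punctuation during normalization, which is omitted here:

```python
# Simplified sketch of SQuAD-style Exact Match and token-level F1 for a
# single (prediction, gold) pair. Normalization here is lowercasing only.
from collections import Counter

def exact_match(pred, gold):
    return float(pred.strip().lower() == gold.strip().lower())

def f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)  # both empty (no-answer) -> 1.0, else 0.0
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "The Eiffel Tower"))  # 1.0
print(round(f1("Eiffel Tower", "the Eiffel Tower"), 2))     # 0.8
```

For unanswerable questions, both the prediction and the gold answer are the empty string, so a correct "no answer" scores 1.0 on both metrics.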

### Framework versions

- Transformers 4.55.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4