deberta-v3-large for Extractive QA

This is the deberta-v3-large model fine-tuned on the SQuAD 2.0 dataset for extractive question answering. The training data consists of question-answer pairs, including unanswerable questions.

The model was trained with LoRA, as implemented in the PEFT library.

Overview

Language model: deberta-v3-large
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Infrastructure: 1x NVIDIA 3070

Model Usage

Using Transformers

This uses the merged weights (base model weights + LoRA weights) so the model can be used directly in Transformers pipelines. The merged model performs the same as loading the base and LoRA weights separately with the PEFT library.

import torch
from transformers import (
  AutoModelForQuestionAnswering,
  AutoTokenizer,
  pipeline
)
model_name = "sjrhuschlee/deberta-v3-large-squad2"

# a) Using pipelines
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
qa_input = {
'question': 'Where do I live?',
'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input)
# {'score': 0.984, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = 'Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
start_scores, end_scores = model(
  encoding["input_ids"],
  attention_mask=encoding["attention_mask"],
  return_dict=False
)

# Map the input ids back to tokens so the predicted span can be sliced out
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(start_scores):torch.argmax(end_scores) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
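
Taking the argmax of the start and end scores independently, as in option b) above, can return an end position that comes before the start position, and it does not explicitly handle unanswerable questions. The following is a minimal sketch of a joint span search, not the method used by the Transformers pipeline. It assumes plain Python lists of per-token scores and the common SQuAD 2.0 convention that index 0 (the first special token) stands for "no answer". The function name `best_span` and the parameter `max_answer_len` are illustrative.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best span, or None for no answer."""
    # Assumption: index 0 is the special token used to signal "no answer".
    null_score = start_scores[0] + end_scores[0]

    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    for start in range(1, len(start_scores)):
        # Only consider ends at or after the start, within the length limit.
        last_end = min(start + max_answer_len, len(end_scores))
        for end in range(start, last_end):
            score = start_scores[start] + end_scores[end]
            if score > best_score:
                best_score, best = score, (start, end)

    # Prefer "no answer" when it outscores every real span.
    if best is None or null_score > best_score:
        return None
    return best
```

With the tensors from option b), this could be called as `best_span(start_scores[0].tolist(), end_scores[0].tolist())`. A fuller implementation would also restrict candidate spans to context tokens so that question tokens are never returned.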

Metrics

# SQuAD v2
{
    "eval_HasAns_exact": 84.83468286099865,
    "eval_HasAns_f1": 90.48374860633226,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 91.0681244743482,
    "eval_NoAns_f1": 91.0681244743482,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 87.95586625115808,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 90.77635490089573,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 87.95586625115808,
    "eval_f1": 90.77635490089592,
    "eval_runtime": 623.1333,
    "eval_samples": 11951,
    "eval_samples_per_second": 19.179,
    "eval_steps_per_second": 0.799,
    "eval_total": 11873
}

# SQuAD
{
    "eval_exact_match": 89.29044465468307,
    "eval_f1": 94.9846365606959,
    "eval_runtime": 553.7132,
    "eval_samples": 10618,
    "eval_samples_per_second": 19.176,
    "eval_steps_per_second": 0.8
}
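
The overall SQuAD v2 scores are the example-weighted average of the HasAns and NoAns subsets, which can be checked from the numbers above. A small sketch of that arithmetic (the helper name `weighted_mean` is illustrative):

```python
def weighted_mean(scores, counts):
    """Average per-subset scores, weighting each by its number of examples."""
    total = sum(counts)
    return sum(s * n for s, n in zip(scores, counts)) / total


# Values copied from the SQuAD v2 metrics above.
has_ans_total, no_ans_total = 5928, 5945
overall_exact = weighted_mean(
    [84.83468286099865, 91.0681244743482], [has_ans_total, no_ans_total]
)
overall_f1 = weighted_mean(
    [90.48374860633226, 91.0681244743482], [has_ans_total, no_ans_total]
)

print(round(overall_exact, 3))  # compare with eval_exact
print(round(overall_f1, 3))     # compare with eval_f1
```

The two subset sizes sum to eval_total (11873), and the weighted means agree with eval_exact and eval_f1 up to floating-point rounding.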

Using with PEFT

NOTE: This requires code in the PR https://github.com/huggingface/peft/pull/473 for the PEFT library.

#!pip install peft

from peft import LoraConfig, PeftModelForQuestionAnswering
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
model_name = "sjrhuschlee/deberta-v3-large-squad2"

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 24
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 1
total_train_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4.0
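
To illustrate how lr_scheduler_type: linear and lr_scheduler_warmup_ratio: 0.1 interact, here is a minimal sketch of a linear warmup followed by linear decay. It is a plain-Python approximation for illustration, not the Trainer's own code; the function name `linear_schedule_lr` and the value of `total_steps` are assumptions, since the actual number of optimizer steps is not given here.

```python
def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Then decay linearly from base_lr down to 0 at the final step.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# total_steps is a made-up value for illustration; the real count depends on
# the number of training features, the batch size (24), and num_epochs (4).
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the learning rate peaks at 5e-05 after the first 10% of steps and reaches 0 at the final step.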

LoRA Config

{
  "base_model_name_or_path": "microsoft/deberta-v3-large",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": ["qa_outputs"],
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "query_proj",
    "key_proj",
    "value_proj",
    "dense"
  ],
  "task_type": "QUESTION_ANS"
}
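
With r = 8 and lora_alpha = 32, the low-rank update is scaled by lora_alpha / r = 4 in the standard LoRA formulation. The toy sketch below, in pure Python with tiny made-up matrices, shows why merged weights give the same output as keeping the adapter separate: W_merged = W + (lora_alpha / r) * B A. The matrix sizes and values are illustrative only and are not taken from the model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


lora_alpha, r = 32, 8          # values from the LoRA config above
scaling = lora_alpha / r       # 4.0

# Toy shapes: W is 2x2, B is 2x1, A is 1x2 (rank 1 instead of 8 for brevity).
W = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5], [-1.0]]
A = [[2.0, 0.25]]
x = [[1.0], [2.0]]             # a single input column vector

# Adapter kept separate: base output plus the scaled low-rank update.
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Adapter merged into the base weight once, then a single matmul.
W_merged = add(W, matmul(B, A), scale=scaling)
merged = matmul(W_merged, x)
```

Both `separate` and `merged` hold the same column vector, which is the property the merged checkpoint relies on.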

Framework versions

Transformers 4.30.0.dev0
PyTorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3

Evaluation results

Self-reported scores on each dataset's validation set:

squad_v2: Exact Match 87.956, F1 90.781
squad: Exact Match 89.290, F1 95.008
adversarial_qa: Exact Match 41.400, F1 55.676
squad_adversarial: Exact Match 83.660, F1 89.451