---
library_name: transformers
license: mit
base_model: microsoft/MiniLM-L12-H384-uncased
tags:
- generated_from_trainer
- extractive_QA
model-index:
- name: bert-mini-squadv2
  results: []
datasets:
- hf-tuner/squad_v2.0.1
language:
- en
metrics:
- exact_match
pipeline_tag: question-answering
---


# bert-mini-squadv2

This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.

It achieves the following results on the evaluation set:
- Loss: 1.4653
- Exact Match Accuracy: 62.95%

## Evaluation Notes

#### Issues with Exact Match Evaluation
Several semantically correct predictions were scored as wrong because strict exact-match criteria are sensitive to minor differences in tokenization, formatting, or span boundaries:

- Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
- Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
- Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
- Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)
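Token-level scoring is more forgiving of these near-misses. Below is a minimal sketch of SQuAD-style answer normalization and token-overlap F1 (the helper names are illustrative, not part of this repo):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    s = ''.join(ch for ch in s if ch not in string.punctuation)
    return ' '.join(s.split())

def f1_score(prediction, ground_truth):
    """Token-overlap F1: partial credit when the span is close but not exact."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score('feynman diagrams', 'feynman'))              # ~0.67 instead of 0
print(f1_score('electromagnetic force', 'electromagnetic')) # ~0.67 instead of 0
```

Note that diacritic mismatches such as `schrodinger` vs `schrödinger` still lose credit under this scheme unless Unicode folding is added on top.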

#### Overall Performance
- Exact-match accuracy: **62.95%**
- The model frequently generates high-quality and semantically correct answer spans even when exact-match evaluation penalizes them.
- Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.

#### Recommendations for Best Results
- Use clear, straightforward phrasing in queries to maximize extraction accuracy.

## Model description

MiniLM-L12-H384-uncased (v1): 12 layers, 384 hidden size, 12 attention heads, 33M parameters; roughly 2.7x faster than BERT-Base.

#### Direct Use
- Extractive Question Answering: Given a passage and a question, the model extracts the most likely span of text that answers the question.
- Handles unanswerable questions by predicting "no answer" when appropriate.

#### Downstream Use
Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

#### Out-of-Scope Use
- Generative question answering (the model extracts spans from the provided context; it cannot compose new text).
- Non-English tasks (the model was trained only on English data).
- Open-domain QA over large corpora: the model works best when the relevant context passage is provided.

## How to use

```python
import torch
from transformers import BertForQuestionAnswering, AutoTokenizer

model_id = 'hf-tuner/bert-mini-squadv2'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
if device == 'cuda':
  bert_qa = bert_qa.half()  # fp16 speeds up GPU inference; keep fp32 on CPU
bert_qa.eval()

def get_answers(ctxq):
  # ctxq is a list of [context, question] pairs
  inputs = tokenizer(ctxq, padding=True, truncation=True, return_tensors='pt')
  inputs = {k: v.to(device) for k, v in inputs.items()}

  with torch.no_grad():
    outputs = bert_qa(**inputs)

  start_idxs = outputs.start_logits.argmax(dim=-1)
  end_idxs = outputs.end_logits.argmax(dim=-1)

  predictions = []
  for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
    if start_idx == 0 or end_idx < start_idx:
      # a [CLS]-anchored or inverted span signals an unanswerable question
      predictions.append("<no_answer>")
    else:
      # the end index is inclusive, so extend the slice by one token
      predict_answer_tokens = inputs['input_ids'][i, start_idx : end_idx + 1]
      pred_answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
      predictions.append(pred_answer)
  return predictions


context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion.
"""
question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

get_answers([
    [context, question_1],
    [context, question_2],
])

>>> ['valor equity partners and andreessen horowitz', '<no_answer>']

```
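For quick experiments, the same checkpoint can also be loaded through the `question-answering` pipeline, which wraps tokenization, inference, and span decoding. A sketch (the example context below is abbreviated; scoring behaviour is the pipeline default):

```python
from transformers import pipeline

qa = pipeline('question-answering', model='hf-tuner/bert-mini-squadv2')

context = ("In Q3 2024, xAI raised $6 billion in a Series C round led by "
           "Valor Equity Partners and Andreessen Horowitz.")

# handle_impossible_answer=True lets the pipeline return an empty answer
# when the context does not contain one (SQuAD v2 behaviour).
result = qa(question='Who led the Series C round?', context=context,
            handle_impossible_answer=True)
print(result['answer'], result['score'])
```

Unlike the manual loop above, the pipeline maps the predicted span back to the original context characters, so the answer keeps its original casing.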

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
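The list above corresponds roughly to the following `TrainingArguments` sketch (the `output_dir` value is a placeholder, and dataset/Trainer plumbing is omitted):

```python
from transformers import TrainingArguments

# Placeholder output_dir; the remaining values mirror the hyperparameters above.
args = TrainingArguments(
    output_dir='bert-mini-squadv2',
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim='adamw_torch_fused',
    lr_scheduler_type='linear',
    num_train_epochs=2,
    fp16=True,  # Native AMP mixed-precision training
)
```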

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.3678        | 1.0   | 8134  | 1.4974          |
| 1.1809        | 2.0   | 16268 | 1.4653          |


### Framework versions

- Transformers 4.57.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1