---
library_name: transformers
license: mit
base_model: microsoft/MiniLM-L12-H384-uncased
tags:
- generated_from_trainer
- extractive_QA
model-index:
- name: bert-mini-squadv2
  results: []
datasets:
- hf-tuner/squad_v2.0.1
language:
- en
metrics:
- exact_match
pipeline_tag: question-answering
---

# bert-mini-squadv2

This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.

It achieves the following results on the evaluation set:

- Loss: 1.4653
- Exact Match Accuracy: 62.95%

## Evaluation Notes

#### Issues with Exact Match Evaluation

Several correct predictions were marked as wrong because the strict exact-match criterion is sensitive to minor differences in tokenization, formatting, and span boundaries:

- Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
- Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
- Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
- Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)

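The official SQuAD v2 metric applies light normalization before comparing strings (lowercasing, removing punctuation, English articles, and extra whitespace). A minimal sketch of that comparison shows which of the mismatches above survive normalization:

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, strip punctuation, articles, extra whitespace."""
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(prediction, gold):
    return normalize_answer(prediction) == normalize_answer(gold)

# Normalization rescues case, punctuation, and article differences...
print(exact_match('The Electromagnetic force.', 'electromagnetic force'))  # True
# ...but not diacritics, digit grouping, or span-boundary differences:
print(exact_match('schrodinger equation', 'schrödinger equation'))  # False
print(exact_match('45 000 pounds', '45000 pounds'))                 # False
print(exact_match('feynman diagrams', 'feynman'))                   # False
```
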
#### Overall Performance

- Exact-match accuracy: **~63%**
- The model frequently produces semantically correct answer spans even when exact-match evaluation penalizes them.
- Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's small size and limited parameter capacity.

#### Recommendations for Best Results

- Use clear, straightforward phrasing in queries to maximize extraction accuracy.

## Model description

MiniLMv1-L12-H384-uncased: 12 layers, 384 hidden size, 12 attention heads, 33M parameters; about 2.7x faster than BERT-Base.

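The 33M figure can be sanity-checked with back-of-the-envelope arithmetic, assuming the standard BERT-style configuration for this checkpoint (WordPiece vocabulary of 30522, 512 positions, token-type vocabulary of 2, feed-forward size 4x the hidden size):

```python
# Approximate parameter count for a BERT-style encoder (assumed config values).
hidden, layers, vocab = 384, 12, 30522
max_pos, type_vocab, ffn = 512, 2, 4 * 384

embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden  # + embedding LayerNorm
attention = 4 * (hidden * hidden + hidden)                # Q, K, V, output projections
feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
layer_norms = 2 * (2 * hidden)                            # two LayerNorms per layer
per_layer = attention + feed_forward + layer_norms

total = embeddings + layers * per_layer
print(f"{total / 1e6:.1f}M parameters")  # ~33.2M, consistent with the stated 33M
```
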
#### Direct Use

- Extractive question answering: given a passage and a question, the model extracts the most likely span of text that answers the question.
- Handles unanswerable questions by predicting "no answer" when appropriate.

#### Downstream Use

Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

#### Out-of-Scope Use

- Generative question answering (the model cannot generate new answers).
- Non-English tasks (the model was trained only on English data).
- Open-domain QA across large corpora: works best when the context passage is provided.

## How to use

```python
import torch
from transformers import AutoTokenizer, BertForQuestionAnswering

model_id = 'hf-tuner/bert-mini-squadv2'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(model_id)
bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
if device == 'cuda':
    bert_qa = bert_qa.half()  # fp16 is only a win on GPU; keep fp32 on CPU
bert_qa.eval()


def get_answers(ctxq):
    """Batch of [context, question] pairs -> list of answer strings."""
    inputs = tokenizer(ctxq, padding=True, truncation=True, return_tensors='pt')
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = bert_qa(**inputs)

    start_idxs = outputs.start_logits.argmax(dim=-1)
    end_idxs = outputs.end_logits.argmax(dim=-1)

    predictions = []
    for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
        # A span collapsing onto position 0 ([CLS]) or an inverted span
        # signals an unanswerable question.
        if start_idx == 0 or end_idx < start_idx:
            predictions.append("<no_answer>")
        else:
            # end_idx is inclusive, so extend the slice by one token
            predict_answer_tokens = inputs['input_ids'][i, start_idx : end_idx + 1]
            pred_answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
            predictions.append(pred_answer)
    return predictions


context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion."""
question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

print(get_answers([
    [context, question_1],
    [context, question_2],
]))
# ['valor equity partners and andreessen horowitz', '<no_answer>']
```

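The span-decoding rule inside `get_answers` can be illustrated without loading the model, using hypothetical logit values. The exact no-answer convention depends on how a checkpoint was fine-tuned; this sketch assumes the common SQuAD v2 convention where both logits peaking on position 0, the `[CLS]` token, means "unanswerable":

```python
# Toy demonstration of argmax span decoding with hypothetical logits.
def decode_span(start_logits, end_logits, tokens):
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__)
    if start == 0 or end < start:  # [CLS] span or inverted span -> unanswerable
        return "<no_answer>"
    return " ".join(tokens[start : end + 1])  # end index is inclusive

tokens = ["[CLS]", "valor", "equity", "partners", "[SEP]"]

# Answerable: start/end logits peak on the "valor ... partners" span.
print(decode_span([0.1, 9.0, 0.2, 0.3, 0.1],
                  [0.1, 0.2, 0.3, 9.0, 0.1], tokens))  # valor equity partners

# Unanswerable: both logits peak on [CLS].
print(decode_span([9.0, 0.1, 0.2, 0.3, 0.1],
                  [9.0, 0.2, 0.3, 0.1, 0.1], tokens))  # <no_answer>
```
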
### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: fused AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP

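Warmup steps are not reported; assuming none, the linear schedule decays the learning rate from 2e-05 at step 0 to 0 at the final step (16268 total steps for 2 epochs at 8134 steps per epoch):

```python
def linear_lr(step, total_steps=16268, base_lr=2e-05):
    """Learning rate at `step` under a linear decay schedule with no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

print(linear_lr(0))      # 2e-05 (start of training)
print(linear_lr(8134))   # 1e-05 (halfway, end of epoch 1)
print(linear_lr(16268))  # 0.0   (end of training)
```
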
### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.3678        | 1.0   | 8134  | 1.4974          |
| 1.1809        | 2.0   | 16268 | 1.4653          |

### Framework versions

- Transformers 4.57.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1