	---
	library_name: transformers
	license: mit
	base_model: microsoft/MiniLM-L12-H384-uncased
	tags:
	- generated_from_trainer
	- extractive_QA
	model-index:
	- name: bert-mini-squadv2
	  results: []
	datasets:
	- hf-tuner/squad_v2.0.1
	language:
	- en
	metrics:
	- exact_match
	pipeline_tag: question-answering
	---


	# bert-mini-squadv2

	This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on
	the [hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.

	It achieves the following results on the evaluation set:
	- Loss: 1.4653
	- Exact Match Accuracy: 62.95%

	## Evaluation Notes

	#### Issues with Exact Match Evaluation
	Strict exact-match scoring rejected several predictions that are correct or nearly correct, because it is sensitive to minor differences in tokenization, formatting, or span boundaries:

	- Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
	- Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
	- Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
	- Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)
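
A normalized comparison can absorb some of these mismatches. The sketch below is not the evaluation code used for this model; it follows the usual SQuAD-style normalization (lowercasing, dropping punctuation and articles, collapsing whitespace) and adds accent folding as an extra assumption. The helper names `normalize_answer`, `exact_match`, and `token_f1` are illustrative.

```python
import re
import string
import unicodedata
from collections import Counter


def normalize_answer(text: str) -> str:
    """Fold accents, lowercase, drop punctuation and articles, collapse spaces."""
    # Assumption: accent folding (ö -> o) is added on top of SQuAD-style rules.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, which gives partial credit for span-boundary errors."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers agree (the "no answer" case); otherwise no credit.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these rules `schrodinger equation` matches `schrödinger equation`. `feynman diagrams` against `feynman` still fails exact match but receives a token F1 of about 0.67. The digit grouping in `45 000 pounds` is not reconciled and would need a dedicated rule.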

	#### Overall Performance
	- Exact-match accuracy: about 63% (62.95% on the evaluation set)
	- The model frequently extracts high-quality, semantically correct answer spans even when exact-match evaluation penalizes them.
	- Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.

	#### Recommendations for Best Results
	- Use clear, straightforward phrasing in queries to maximize extraction accuracy.

	## Model description

	The base model, MiniLMv1-L12-H384-uncased, has 12 layers, a hidden size of 384, 12 attention heads, and 33M parameters, and is 2.7x faster than BERT-Base.

	#### Direct Use
	- Extractive Question Answering: Given a passage and a question, the model extracts the most likely span of text that answers the question.
	- Handles unanswerable questions by predicting "no answer" when appropriate.

	#### Downstream Use
	Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

	#### Out-of-Scope Use
	- Generative question answering (the model cannot generate new answers).
	- Non-English tasks (the model was trained only on English data).
	- Open-domain QA across large corpora (the model works best when the context passage is provided).

	## How to use

	```python
	import torch
	from transformers import AutoTokenizer, BertForQuestionAnswering

	model_id = 'hf-tuner/bert-mini-squadv2'
	device = 'cuda' if torch.cuda.is_available() else 'cpu'
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
	if device == 'cuda':
	    # Half precision is only worthwhile on GPU; keep float32 on CPU.
	    bert_qa = bert_qa.half()

	def get_answers(ctxq):
	    """Return one answer string per [context, question] pair."""
	    inputs = tokenizer(ctxq, padding=True, return_tensors='pt').to(device)

	    with torch.no_grad():
	        outputs = bert_qa(**inputs)

	    # Most likely start and end token positions for each pair.
	    start_idxs = outputs.start_logits.argmax(dim=-1)
	    end_idxs = outputs.end_logits.argmax(dim=-1)

	    predictions = []
	    for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
	        if end_idx <= start_idx:
	            # An empty (or inverted) span is read as "unanswerable".
	            predictions.append("<no_answer>")
	        else:
	            predict_answer_tokens = inputs['input_ids'][i, start_idx:end_idx]
	            pred_answer = tokenizer.decode(predict_answer_tokens)
	            predictions.append(pred_answer)
	    return predictions


	context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion.
	"""
	question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
	question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

	print(get_answers([
	    [context, question_1],
	    [context, question_2],
	]))

	# Output:
	# ['valor equity partners and andreessen horowitz', '<no_answer>']
	```
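
The snippet above takes the argmax of the start and end logits independently, which can produce an end position before the start. A common alternative in SQuAD v2 post-processing is to score start/end pairs jointly and compare the best pair with a "no answer" score. The following is a minimal sketch of that idea on plain Python lists, not the decoding used for this model. It assumes that position 0 stands for "no answer" and that end indices are inclusive; both conventions may differ from how this model was trained. The function `best_span` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
    null_threshold: float = 0.0,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) pair, end inclusive, or None for no answer."""
    # Assumption: position 0 is the special token that stands for "no answer".
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    for start in range(1, len(start_logits)):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    # Predict "no answer" unless the best span beats the null score by the margin.
    if best is None or best_score - null_score <= null_threshold:
        return None
    return best


# Toy logits for a four-token input.
print(best_span([0.1, 2.0, 0.3, 0.2], [0.1, 0.2, 0.1, 3.0]))
```

In real use the search should also be restricted to context tokens, so that padding and question tokens cannot be selected.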

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH_FUSED`) with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 2
	- mixed_precision_training: Native AMP
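
To make the `linear` scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (no warmup is listed above) and linear decay to zero over the 16268 total steps reported for this run; `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 2e-05      # learning_rate from the list above
TOTAL_STEPS = 16268  # last step reported for this run (assumed schedule length)

# Learning rate at the start, after epoch 1, and at the end of training.
for step in (0, 8134, 16268):
    print(step, linear_lr(step, TOTAL_STEPS, BASE_LR))
```

Under these assumptions the rate has fallen to half its initial value (1e-05) at the end of the first epoch and reaches zero at the final step.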

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 1.3678        | 1.0   | 8134  | 1.4974          |
	| 1.1809        | 2.0   | 16268 | 1.4653          |


	### Framework versions

	- Transformers 4.57.1
	- Pytorch 2.8.0+cu126
	- Datasets 4.0.0
	- Tokenizers 0.22.1