---
library_name: transformers
license: mit
base_model: microsoft/MiniLM-L12-H384-uncased
tags:
- generated_from_trainer
- extractive_QA
model-index:
- name: bert-mini-squadv2
  results: []
datasets:
- hf-tuner/squad_v2.0.1
language:
- en
metrics:
- exact_match
pipeline_tag: question-answering
---

# bert-mini-squadv2

This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4653
- Exact Match Accuracy: 62.95%

## Evaluation Notes

#### Issues with Exact Match Evaluation

Several correct predictions were scored as incorrect because strict exact-match evaluation is sensitive to minor differences in tokenization, formatting, or span boundaries (a sketch of the exact-match computation appears after the *Out-of-Scope Use* section below):

- Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
- Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
- Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
- Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)

#### Overall Performance

- Exact-match accuracy: **≈63%** (62.95% on the evaluation set).
- The model frequently produces high-quality, semantically correct answer spans even when exact-match evaluation penalizes them.
- Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's small size and limited parameter capacity.

#### Recommendations for Best Results

- Use clear, straightforward phrasing in queries to maximize extraction accuracy.

## Model description

The base model, MiniLM-L12-H384-uncased, is a 12-layer transformer with a hidden size of 384, 12 attention heads, and 33M parameters, and runs about 2.7x faster than BERT-Base.

#### Direct Use

- Extractive question answering: given a passage and a question, the model extracts the most likely span of text that answers the question (see the pipeline sketch after the *Out-of-Scope Use* list below).
- Handles unanswerable questions by predicting "no answer" when appropriate.

#### Downstream Use

Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.

#### Out-of-Scope Use

- Generative question answering (the model cannot generate new answers).
- Non-English tasks (the model was trained only on English data).
- Open-domain QA across large corpora (the model works best when a context passage is provided).
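The rejected near-misses listed under *Evaluation Notes* follow from how SQuAD-style exact match is computed. Below is a minimal sketch of that scoring; the `normalize_answer` helper mirrors the official SQuAD evaluation script, but it is illustrative and may differ in detail from the evaluation code actually used for this card:

```python
import re
import string

def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(prediction: str, gold: str) -> int:
    return int(normalize_answer(prediction) == normalize_answer(gold))

# The near-misses listed under "Evaluation Notes" all score 0:
print(exact_match("schrodinger equation", "schrödinger equation"))  # 0: diacritics survive normalization
print(exact_match("feynman diagrams", "feynman"))                   # 0: extra token
print(exact_match("45 000 pounds", "45000 pounds"))                 # 0: space inside the number
```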
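For quick experiments with the direct-use scenario above, the `question-answering` pipeline is the simplest entry point; the *How to use* section below shows the equivalent manual decoding. A minimal sketch, with two caveats: `handle_impossible_answer=True` is an assumption (it lets the pipeline return an empty answer for unanswerable questions), and the pipeline encodes the question before the context, whereas the manual example below passes `[context, question]` pairs, so outputs may differ:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="hf-tuner/bert-mini-squadv2")

out = qa(
    question="Who released MiniLM?",
    context="MiniLM is a distilled transformer model released by Microsoft Research.",
    handle_impossible_answer=True,  # assumption: allow an empty answer for unanswerable questions
)
print(out)  # a dict of the form {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```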
## How to use

```python
import torch
from transformers import BertForQuestionAnswering, AutoTokenizer

model_id = 'hf-tuner/bert-mini-squadv2'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(model_id)
bert_qa = BertForQuestionAnswering.from_pretrained(model_id).to(device)
if device == 'cuda':
    bert_qa = bert_qa.half()  # fp16 speeds up GPU inference; half precision is poorly supported on CPU
bert_qa.eval()

def get_answers(ctxq):
    # `ctxq` is a list of [context, question] pairs; each pair is encoded as one sequence pair.
    inputs = tokenizer(ctxq, padding=True, truncation=True, return_tensors='pt').to(device)
    with torch.no_grad():
        outputs = bert_qa(**inputs)
    start_idxs = outputs.start_logits.argmax(dim=-1)
    end_idxs = outputs.end_logits.argmax(dim=-1)

    predictions = []
    for i, (start_idx, end_idx) in enumerate(zip(start_idxs, end_idxs)):
        if start_idx == 0 and end_idx == 0:
            # Both indices point at [CLS], the SQuAD v2 convention for "no answer".
            predictions.append("")
        else:
            # end_idx marks the last answer token, so the slice must include it.
            predict_answer_tokens = inputs['input_ids'][i, start_idx : end_idx + 1]
            pred_answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
            predictions.append(pred_answer)
    return predictions

context = """In Q3 2024, xAI raised $6 billion in a Series C round led by Valor Equity Partners and Andreessen Horowitz, with participation from Sequoia Capital, Fidelity, and Saudi Arabia’s Kingdom Holding Company, bringing its post-money valuation to $50 billion."""

question_1 = "Which two investors co-led xAI’s $6 billion Series C round announced in Q3 2024?"
question_2 = "On what exact date in Q3 2024 was xAI’s $6 billion Series C funding round officially closed?"

get_answers([
    [context, question_1],
    [context, question_2],
])
# >>> ['valor equity partners and andreessen horowitz', '']
```

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch of this configuration appears at the end of this card):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (fused torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.3678        | 1.0   | 8134  | 1.4974          |
| 1.1809        | 2.0   | 16268 | 1.4653          |

### Framework versions

- Transformers 4.57.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
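As referenced above, the hyperparameter list maps onto a `TrainingArguments` configuration roughly as sketched below. This is a hypothetical reconstruction from the reported values, not the actual training script; in particular, `output_dir` is a placeholder, and `fp16=True` and the `"adamw_torch_fused"` alias are assumptions inferred from "Native AMP" and the fused AdamW optimizer noted above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-mini-squadv2",  # assumption: placeholder output location
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",       # fused AdamW, as listed above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    fp16=True,                       # Native AMP mixed precision
)
```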