Update README.md
Browse files
README.md
CHANGED
|
@@ -24,13 +24,44 @@ should probably proofread and complete it, then remove this comment. -->
|
|
| 24 |
|
| 25 |
This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on
|
| 26 |
[hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.
|
|
|
|
| 27 |
It achieves the following results on the evaluation set:
|
| 28 |
- Loss: 1.4653
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 29 |
|
| 30 |
## Model description
|
| 31 |
|
| 32 |
MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base
|
| 33 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
## How to use
|
| 35 |
|
| 36 |
```python
|
|
|
|
| 24 |
|
| 25 |
This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on
|
| 26 |
[hf-tuner/squad_v2.0.1](https://huggingface.co/datasets/hf-tuner/squad_v2.0.1) dataset.
|
| 27 |
+
|
| 28 |
It achieves the following results on the evaluation set:
|
| 29 |
- Loss: 1.4653
|
| 30 |
+
- Exact Match Accuracy: 60.94%
|
| 31 |
+
|
| 32 |
+
## Evaluation Notes
|
| 33 |
+
|
| 34 |
+
#### Issues with Exact Match Evaluation
|
| 35 |
+
Several correct predictions were incorrectly marked as false negatives due to strict exact-match criteria being sensitive to minor differences in tokenization, formatting, or span boundaries:
|
| 36 |
+
|
| 37 |
+
- Predicted: `isaac bashevis` → Rejected (expected: `isaac bashevis singer`)
|
| 38 |
+
- Predicted: `newtonian equations` → Rejected (expected: `newtonian`)
|
| 39 |
+
- Predicted: `80,000` → Rejected (expected: `80, 000`)
|
| 40 |
+
|
| 41 |
+
#### Overall Performance
|
| 42 |
+
- Exact-match accuracy: **>60%**
|
| 43 |
+
- The model frequently generates high-quality and semantically correct answer spans even when exact-match evaluation penalizes them.
|
| 44 |
+
- Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.
|
| 45 |
+
|
| 46 |
+
#### Recommendations for Best Results
|
| 47 |
+
- Use clear, straightforward phrasing in queries to maximize extraction accuracy.
|
| 48 |
|
| 49 |
## Model description
|
| 50 |
|
| 51 |
MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base
|
| 52 |
|
| 53 |
+
#### Direct Use
|
| 54 |
+
- Extractive Question Answering: Given a passage and a question, the model extracts the most likely span of text that answers the question.
|
| 55 |
+
- Handles unanswerable questions by predicting "no answer" when appropriate.
|
| 56 |
+
|
| 57 |
+
#### Downstream Use
|
| 58 |
+
Can be integrated into chatbots, virtual assistants, or search systems that require question answering over text.
|
| 59 |
+
|
| 60 |
+
#### Out-of-Scope Use
|
| 61 |
+
- Generative question answering (the model cannot generate new answers).
|
| 62 |
+
- Non-English tasks (the model was trained only on English data).
|
| 63 |
+
- Open-Domain QA across large corpora — works best when the context passage is provided.
|
| 64 |
+
|
| 65 |
## How to use
|
| 66 |
|
| 67 |
```python
|