---
library_name: transformers
license: mit
base_model: microsoft/deberta-v3-base
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: judge_answer___29_deberta_v3_base_msmarco_answerability
  results: []
datasets:
- tom-010/msmarcov2.1-binary-answerability
language:
- en
pipeline_tag: text-classification
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# judge_answer___29_deberta_v3_base_msmarco_answerability

This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on [tom-010/msmarcov2.1-binary-answerability](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability).
The dataset is heavily imbalanced (only 6% positives). The training notebook addressed this by downsampling the negative examples so that the positive-to-negative ratio is 1-to-1.
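The 1-to-1 downsampling can be sketched as follows. This is a hypothetical illustration, not the original notebook's code; the `label` column name and the random seed are assumptions.

```python
import random

def balance_binary(examples, seed=42):
    """Downsample negatives so positives and negatives are 1-to-1.

    `examples` is a list of dicts with a binary "label" key
    (column name assumed; adapt to the actual dataset schema).
    """
    pos = [e for e in examples if e["label"] == 1]
    neg = [e for e in examples if e["label"] == 0]
    random.Random(seed).shuffle(neg)
    balanced = pos + neg[: len(pos)]  # keep as many negatives as positives
    random.Random(seed).shuffle(balanced)
    return balanced

# Toy data mirroring the ~6% positive rate described above.
data = [{"label": 1}] * 6 + [{"label": 0}] * 94
balanced = balance_binary(data)
print(sum(e["label"] for e in balanced), len(balanced))  # 6 12
```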

It achieves the following results on the evaluation set:
- Loss: 0.4194
- Accuracy: 0.8164
- Precision: 0.7814
- Recall: 0.8815
- F1: 0.8284

See the run here: https://wandb.ai/stadeltom-com/huggingface/runs/l5mt601p?nw=nwuserstadeltom

## Model description

The model is a fine-tuned DeBERTa v3 that classifies whether a question/query is answered by a given text (passage).

## Intended uses & limitations

The task is to judge whether a text answers a question.
The [dataset](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability) is built from [MS MARCO v2](https://github.com/zhouyonglong/MSMARCOV2), which pairs each query with 10 search results from the Bing search engine.
An annotator answered each question and marked the passages (search results) used for the answer.
The dataset iterates over each passage of each query and records the query, the passage, and whether the passage was used to answer.
The downside: false negatives are entirely possible, since a passage may answer the question even if the annotator did not use it. The upside: this is a realistic setting, as at inference time we also get 10 search results and need to filter them.
But: the baseline performance on this task is unknown.

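A minimal inference sketch using the transformers `text-classification` pipeline. The hub id below is assumed from the model name in the header and may differ, and the label names (`LABEL_0`/`LABEL_1`) depend on the model config:

```python
from transformers import pipeline

# Hub id assumed from the model name above; adjust to the actual repo.
judge = pipeline(
    "text-classification",
    model="tom-010/judge_answer___29_deberta_v3_base_msmarco_answerability",
)

question = "what is the capital of france"
passage = "Paris is the capital and most populous city of France."

# The model was fine-tuned on (query, passage) pairs, so pass both texts.
result = judge([{"text": question, "text_pair": passage}])[0]
print(result["label"], round(result["score"], 3))
```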
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.5008        | 0.0272 | 2000  | 0.4931          | 0.7864   | 0.7498    | 0.8632 | 0.8025 |
| 0.4832        | 0.0544 | 4000  | 0.4565          | 0.7858   | 0.7422    | 0.8795 | 0.8050 |
| 0.4716        | 0.0816 | 6000  | 0.4758          | 0.7926   | 0.7527    | 0.8751 | 0.8093 |
| 0.4645        | 0.1088 | 8000  | 0.4740          | 0.7878   | 0.7633    | 0.8377 | 0.7988 |
| 0.4697        | 0.1360 | 10000 | 0.4519          | 0.7982   | 0.7720    | 0.8496 | 0.8089 |
| 0.4729        | 0.1632 | 12000 | 0.4471          | 0.7946   | 0.7664    | 0.8508 | 0.8064 |
| 0.4589        | 0.1904 | 14000 | 0.4455          | 0.8002   | 0.7661    | 0.8675 | 0.8137 |
| 0.4513        | 0.2176 | 16000 | 0.4726          | 0.7934   | 0.7472    | 0.8902 | 0.8125 |
| 0.4573        | 0.2448 | 18000 | 0.4357          | 0.8016   | 0.7775    | 0.8481 | 0.8113 |
| 0.4474        | 0.2720 | 20000 | 0.4738          | 0.7932   | 0.7503    | 0.8823 | 0.8110 |
| 0.448         | 0.2992 | 22000 | 0.4360          | 0.7934   | 0.7940    | 0.7955 | 0.7948 |
| 0.449         | 0.3264 | 24000 | 0.4464          | 0.7996   | 0.7708    | 0.8560 | 0.8112 |
| 0.449         | 0.3536 | 26000 | 0.4467          | 0.8048   | 0.7655    | 0.8819 | 0.8196 |
| 0.4483        | 0.3808 | 28000 | 0.4459          | 0.8042   | 0.7603    | 0.8918 | 0.8208 |
| 0.4468        | 0.4080 | 30000 | 0.4400          | 0.8054   | 0.7898    | 0.8353 | 0.8119 |
| 0.4413        | 0.4352 | 32000 | 0.4321          | 0.8048   | 0.7917    | 0.8302 | 0.8105 |
| 0.4444        | 0.4624 | 34000 | 0.4309          | 0.8086   | 0.7691    | 0.8850 | 0.8230 |
| 0.4507        | 0.4896 | 36000 | 0.4301          | 0.8124   | 0.7945    | 0.8457 | 0.8193 |
| 0.4426        | 0.5168 | 38000 | 0.4243          | 0.8052   | 0.7698    | 0.8739 | 0.8186 |
| 0.4321        | 0.5440 | 40000 | 0.4243          | 0.8074   | 0.7681    | 0.8839 | 0.8219 |
| 0.4301        | 0.5712 | 42000 | 0.4380          | 0.806    | 0.7640    | 0.8886 | 0.8216 |
| 0.4418        | 0.5984 | 44000 | 0.4280          | 0.8096   | 0.7857    | 0.8544 | 0.8186 |
| 0.4334        | 0.6256 | 46000 | 0.4326          | 0.809    | 0.7765    | 0.8707 | 0.8209 |
| 0.4385        | 0.6528 | 48000 | 0.4273          | 0.8116   | 0.7844    | 0.8624 | 0.8215 |
| 0.4337        | 0.6800 | 50000 | 0.4306          | 0.8086   | 0.7795    | 0.8636 | 0.8194 |
| 0.4294        | 0.7072 | 52000 | 0.4397          | 0.811    | 0.7706    | 0.8886 | 0.8254 |
| 0.4276        | 0.7344 | 54000 | 0.4344          | 0.8138   | 0.7770    | 0.8831 | 0.8267 |
| 0.4183        | 0.7616 | 56000 | 0.4291          | 0.812    | 0.7650    | 0.9037 | 0.8286 |
| 0.4226        | 0.7888 | 58000 | 0.4342          | 0.8134   | 0.7767    | 0.8827 | 0.8263 |
| 0.4266        | 0.8160 | 60000 | 0.4234          | 0.8132   | 0.7840    | 0.8675 | 0.8236 |
| 0.4285        | 0.8432 | 62000 | 0.4167          | 0.8156   | 0.7882    | 0.8660 | 0.8252 |
| 0.4265        | 0.8704 | 64000 | 0.4206          | 0.8142   | 0.7734    | 0.8918 | 0.8284 |
| 0.429         | 0.8976 | 66000 | 0.4165          | 0.8174   | 0.7910    | 0.8656 | 0.8266 |
| 0.4308        | 0.9248 | 68000 | 0.4192          | 0.814    | 0.7775    | 0.8827 | 0.8268 |
| 0.4248        | 0.9520 | 70000 | 0.4205          | 0.8152   | 0.7807    | 0.8795 | 0.8272 |
| 0.425         | 0.9792 | 72000 | 0.4194          | 0.8164   | 0.7814    | 0.8815 | 0.8284 |


### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1