---
library_name: transformers
license: mit
base_model: microsoft/deberta-v3-base
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: judge_answer___29_deberta_v3_base_msmarco_answerability
  results: []
datasets:
- tom-010/msmarcov2.1-binary-answerability
language:
- en
pipeline_tag: text-classification
---

# judge_answer___29_deberta_v3_base_msmarco_answerability

This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on [tom-010/msmarcov2.1-binary-answerability](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability). The dataset is heavily imbalanced (only 6% positives); the training notebook addressed this by downsampling the negative examples so that the positive-to-negative ratio is 1-to-1.

It achieves the following results on the evaluation set:
- Loss: 0.4194
- Accuracy: 0.8164
- Precision: 0.7814
- Recall: 0.8815
- F1: 0.8284

See the run here: https://wandb.ai/stadeltom-com/huggingface/runs/l5mt601p?nw=nwuserstadeltom

## Model description

The model is a fine-tuned DeBERTa v3 that classifies whether a question/query is answered by a text (passage).

## Intended uses & limitations

The task is to judge whether a text answers a question. The [dataset](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability) is built from [MS MARCO v2](https://github.com/zhouyonglong/MSMARCOV2), which pairs each query with 10 search results from the Bing search engine. An annotator answered the question and marked the passages (search results) used for the answer. The dataset iterates over each passage of each query and records the query, the passage, and whether the passage was used for the answer.

The downside: false negatives are entirely possible, since a passage may answer the query even if the annotator did not use it. The upside: a realistic setting, as in practice we also get 10 search results and need to filter them. But: it is unknown what the baseline on this dataset is.
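A minimal inference sketch follows. The repository id below and the (query, passage) pair order are assumptions, as is the meaning of the two labels; check the model's `config.json` for the actual `id2label` mapping.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repository id is an assumption based on the model name.
model_id = "tom-010/judge_answer___29_deberta_v3_base_msmarco_answerability"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

query = "what is the capital of france"
passage = "Paris is the capital and most populous city of France."

# Encode query and passage as a text pair; the (query, passage) order
# is an assumption about how the model was trained.
inputs = tokenizer(query, passage, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

pred = probs.argmax().item()
print(model.config.id2label[pred], f"p={probs[pred].item():.3f}")
```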
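For illustration, here is a minimal sketch of the 1-to-1 balancing described above, using the `datasets` library. The `label` column name and its 0/1 encoding are assumptions about the dataset schema; the actual training notebook may differ.

```python
from datasets import concatenate_datasets, load_dataset

ds = load_dataset("tom-010/msmarcov2.1-binary-answerability", split="train")

# Keep all positives, then downsample the negatives to the same count
# (the "label" column with 1 = answerable is an assumption).
pos = ds.filter(lambda ex: ex["label"] == 1)
neg = ds.filter(lambda ex: ex["label"] == 0)
neg = neg.shuffle(seed=42).select(range(len(pos)))

balanced = concatenate_datasets([pos, neg]).shuffle(seed=42)
```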
## Training and evaluation data

The model was trained and evaluated on [tom-010/msmarcov2.1-binary-answerability](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability), with the negatives downsampled to a 1-to-1 ratio as described above.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.5008 | 0.0272 | 2000 | 0.4931 | 0.7864 | 0.7498 | 0.8632 | 0.8025 |
| 0.4832 | 0.0544 | 4000 | 0.4565 | 0.7858 | 0.7422 | 0.8795 | 0.8050 |
| 0.4716 | 0.0816 | 6000 | 0.4758 | 0.7926 | 0.7527 | 0.8751 | 0.8093 |
| 0.4645 | 0.1088 | 8000 | 0.4740 | 0.7878 | 0.7633 | 0.8377 | 0.7988 |
| 0.4697 | 0.1360 | 10000 | 0.4519 | 0.7982 | 0.7720 | 0.8496 | 0.8089 |
| 0.4729 | 0.1632 | 12000 | 0.4471 | 0.7946 | 0.7664 | 0.8508 | 0.8064 |
| 0.4589 | 0.1904 | 14000 | 0.4455 | 0.8002 | 0.7661 | 0.8675 | 0.8137 |
| 0.4513 | 0.2176 | 16000 | 0.4726 | 0.7934 | 0.7472 | 0.8902 | 0.8125 |
| 0.4573 | 0.2448 | 18000 | 0.4357 | 0.8016 | 0.7775 | 0.8481 | 0.8113 |
| 0.4474 | 0.2720 | 20000 | 0.4738 | 0.7932 | 0.7503 | 0.8823 | 0.8110 |
| 0.4480 | 0.2992 | 22000 | 0.4360 | 0.7934 | 0.7940 | 0.7955 | 0.7948 |
| 0.4490 | 0.3264 | 24000 | 0.4464 | 0.7996 | 0.7708 | 0.8560 | 0.8112 |
| 0.4490 | 0.3536 | 26000 | 0.4467 | 0.8048 | 0.7655 | 0.8819 | 0.8196 |
| 0.4483 | 0.3808 | 28000 | 0.4459 | 0.8042 | 0.7603 | 0.8918 | 0.8208 |
| 0.4468 | 0.4080 | 30000 | 0.4400 | 0.8054 | 0.7898 | 0.8353 | 0.8119 |
| 0.4413 | 0.4352 | 32000 | 0.4321 | 0.8048 | 0.7917 | 0.8302 | 0.8105 |
| 0.4444 | 0.4624 | 34000 | 0.4309 | 0.8086 | 0.7691 | 0.8850 | 0.8230 |
| 0.4507 | 0.4896 | 36000 | 0.4301 | 0.8124 | 0.7945 | 0.8457 | 0.8193 |
| 0.4426 | 0.5168 | 38000 | 0.4243 | 0.8052 | 0.7698 | 0.8739 | 0.8186 |
| 0.4321 | 0.5440 | 40000 | 0.4243 | 0.8074 | 0.7681 | 0.8839 | 0.8219 |
| 0.4301 | 0.5712 | 42000 | 0.4380 | 0.8060 | 0.7640 | 0.8886 | 0.8216 |
| 0.4418 | 0.5984 | 44000 | 0.4280 | 0.8096 | 0.7857 | 0.8544 | 0.8186 |
| 0.4334 | 0.6256 | 46000 | 0.4326 | 0.8090 | 0.7765 | 0.8707 | 0.8209 |
| 0.4385 | 0.6528 | 48000 | 0.4273 | 0.8116 | 0.7844 | 0.8624 | 0.8215 |
| 0.4337 | 0.6800 | 50000 | 0.4306 | 0.8086 | 0.7795 | 0.8636 | 0.8194 |
| 0.4294 | 0.7072 | 52000 | 0.4397 | 0.8110 | 0.7706 | 0.8886 | 0.8254 |
| 0.4276 | 0.7344 | 54000 | 0.4344 | 0.8138 | 0.7770 | 0.8831 | 0.8267 |
| 0.4183 | 0.7616 | 56000 | 0.4291 | 0.8120 | 0.7650 | 0.9037 | 0.8286 |
| 0.4226 | 0.7888 | 58000 | 0.4342 | 0.8134 | 0.7767 | 0.8827 | 0.8263 |
| 0.4266 | 0.8160 | 60000 | 0.4234 | 0.8132 | 0.7840 | 0.8675 | 0.8236 |
| 0.4285 | 0.8432 | 62000 | 0.4167 | 0.8156 | 0.7882 | 0.8660 | 0.8252 |
| 0.4265 | 0.8704 | 64000 | 0.4206 | 0.8142 | 0.7734 | 0.8918 | 0.8284 |
| 0.4290 | 0.8976 | 66000 | 0.4165 | 0.8174 | 0.7910 | 0.8656 | 0.8266 |
| 0.4308 | 0.9248 | 68000 | 0.4192 | 0.8140 | 0.7775 | 0.8827 | 0.8268 |
| 0.4248 | 0.9520 | 70000 | 0.4205 | 0.8152 | 0.7807 | 0.8795 | 0.8272 |
| 0.4250 | 0.9792 | 72000 | 0.4194 | 0.8164 | 0.7814 | 0.8815 | 0.8284 |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
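For reference, a minimal sketch of how the training hyperparameters listed above map onto `transformers.TrainingArguments`. The output directory is an assumption, the 2000-step eval cadence is read off the results table, and the listed Adam betas/epsilon are the `TrainingArguments` defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

# Sketch only: output_dir and the eval cadence are not taken verbatim
# from the training notebook.
args = TrainingArguments(
    output_dir="judge_answer___29_deberta_v3_base_msmarco_answerability",
    learning_rate=3e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,               # mixed_precision_training: Native AMP
    eval_strategy="steps",   # evaluated every 2000 steps per the table above
    eval_steps=2000,
)
```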