	---
	library_name: transformers
	license: mit
	base_model: microsoft/deberta-v3-base
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- precision
	- recall
	- f1
	model-index:
	- name: judge_answer___29_deberta_v3_base_msmarco_answerability
	  results: []
	datasets:
	- tom-010/msmarcov2.1-binary-answerability
	language:
	- en
	pipeline_tag: text-classification
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# judge_answer___29_deberta_v3_base_msmarco_answerability

	This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on [tom-010/msmarcov2.1-binary-answerability](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability).
	The dataset is heavily imbalanced (only 6% positives). The training notebook compensates by downsampling the negatives so that the ratio of positives to negatives is 1-to-1.
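
The balancing step described above can be sketched roughly as follows. This is not the code from the training notebook, only a minimal illustration: the row layout, the `label` key, and the `seed` value are assumptions made for the example.

```python
import random


def downsample_negatives(rows, label_key="label", seed=42):
    """Return a shuffled copy of rows with as many negatives as positives.

    The `label_key` name and the `seed` default are hypothetical choices
    for this sketch, not taken from the training notebook.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # Keep every positive and draw an equally sized random subset of negatives.
    kept_negatives = rng.sample(negatives, k=min(len(positives), len(negatives)))
    balanced = positives + kept_negatives
    rng.shuffle(balanced)
    return balanced


# Toy data with the imbalance mentioned above: 6 positives in 100 rows.
rows = [
    {"query": f"q{i}", "passage": f"p{i}", "label": 1 if i < 6 else 0}
    for i in range(100)
]
balanced = downsample_negatives(rows)
```

Downsampling discards most of the negatives, so metrics measured on the balanced split will probably not carry over unchanged to the original 6% distribution.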

	It achieves the following results on the evaluation set:
	- Loss: 0.4194
	- Accuracy: 0.8164
	- Precision: 0.7814
	- Recall: 0.8815
	- F1: 0.8284
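
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_from` is only illustrative.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the evaluation set.
f1 = f1_from(0.7814, 0.8815)
print(round(f1, 4))
```

Rounded to four decimals, the result agrees with the F1 listed above.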

	See the run here: https://wandb.ai/stadeltom-com/huggingface/runs/l5mt601p?nw=nwuserstadeltom

	## Model description

	The model is a fine-tuned DeBERTa v3 that classifies whether a text (passage) answers a question/query.
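
A minimal usage sketch with the `transformers` text-classification pipeline might look like the following. Several details are assumptions not confirmed by this card: that the query goes in as `text` and the passage as `text_pair`, that the positive class is named `LABEL_1`, and the `0.5` threshold. The helper names are hypothetical.

```python
def build_inputs(query, passages):
    """Pair the query with each passage as {"text", "text_pair"} dicts.

    Assumption: query first, passage second. The card does not state
    the input order used during training.
    """
    return [{"text": query, "text_pair": passage} for passage in passages]


def load_classifier(model_id):
    """Load a text-classification pipeline for the given Hub model id.

    transformers is imported lazily so the helpers above and below can be
    used without it; calling this downloads the model weights.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def filter_passages(classifier, query, passages,
                    positive_label="LABEL_1", threshold=0.5):
    """Keep the passages that the classifier judges to answer the query.

    `positive_label` and `threshold` are assumptions; check the model's
    `config.id2label` for the actual label names.
    """
    predictions = classifier(build_inputs(query, passages))
    return [
        passage
        for passage, pred in zip(passages, predictions)
        if pred["label"] == positive_label and pred["score"] >= threshold
    ]
```

To try it, pass the full Hub id of this model (owner plus model name) to `load_classifier` and hand the resulting pipeline to `filter_passages` together with a query and its candidate passages.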

	## Intended uses & limitations

	The task is to judge whether a text answers a question.
	The [dataset](https://huggingface.co/datasets/tom-010/msmarcov2.1-binary-answerability) is built from [MS MARCO v2](https://github.com/zhouyonglong/MSMARCOV2), where each query comes with 10 search results from the Bing search engine.
	An annotator answered the question and marked the passages (search results) used for the answer.
	The dataset iterates over each passage of each query and records the query, the passage, and whether the passage was used for the answer.
	The downside: false negatives are possible, since a passage may answer the question without the annotator having used it. The upside: the setup is realistic, because in practice we also receive 10 search results and need to filter them.
	No baseline is known for this task.
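
The construction described above can be sketched as follows. The record layout (a `query` string plus a `passages` list whose entries carry `passage_text` and `is_selected`) is an assumption modelled on the MS MARCO JSON format, and `expand_record` is a hypothetical helper, not the script used to build the dataset.

```python
def expand_record(record):
    """Turn one query record into one (query, passage, label) row per passage."""
    rows = []
    for passage in record["passages"]:
        rows.append({
            "query": record["query"],
            "passage": passage["passage_text"],
            # 1 if the annotator marked this passage as used for the answer, else 0.
            "label": int(passage["is_selected"]),
        })
    return rows


# Toy record with two search results, one of them marked by the annotator.
record = {
    "query": "what is the capital of france",
    "passages": [
        {"passage_text": "Paris is the capital of France.", "is_selected": 1},
        {"passage_text": "France is known for its cheese.", "is_selected": 0},
    ],
}
rows = expand_record(record)
```

A passage with label 0 here only means the annotator did not mark it, which is where the possible false negatives come from.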

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 1
	- mixed_precision_training: Native AMP
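
For reproduction, the hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. The `output_dir` value is a placeholder, and reading "Native AMP" as `fp16=True` is an interpretation; anything not listed above (evaluation schedule, weight decay, warmup) is left at the library defaults, which may differ from the original run.

```python
def training_kwargs(output_dir="out"):
    """Keyword arguments mirroring the hyperparameters listed in this card.

    `output_dir` is a hypothetical placeholder.
    """
    return {
        "output_dir": output_dir,
        "learning_rate": 3e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 8,
        "seed": 42,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "num_train_epochs": 1,
        # Assumed mapping of "Native AMP"; fp16 needs a supported GPU.
        "fp16": True,
    }


def make_training_arguments(**overrides):
    """Build TrainingArguments from the kwargs above, with optional overrides.

    transformers is imported lazily so training_kwargs() works without it.
    """
    from transformers import TrainingArguments

    return TrainingArguments(**{**training_kwargs(), **overrides})
```

On a machine without a GPU, `make_training_arguments(fp16=False)` avoids the mixed-precision requirement.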

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
	|:-------------:|:------:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
	| 0.5008 | 0.0272 | 2000 | 0.4931 | 0.7864 | 0.7498 | 0.8632 | 0.8025 |
	| 0.4832 | 0.0544 | 4000 | 0.4565 | 0.7858 | 0.7422 | 0.8795 | 0.8050 |
	| 0.4716 | 0.0816 | 6000 | 0.4758 | 0.7926 | 0.7527 | 0.8751 | 0.8093 |
	| 0.4645 | 0.1088 | 8000 | 0.4740 | 0.7878 | 0.7633 | 0.8377 | 0.7988 |
	| 0.4697 | 0.1360 | 10000 | 0.4519 | 0.7982 | 0.7720 | 0.8496 | 0.8089 |
	| 0.4729 | 0.1632 | 12000 | 0.4471 | 0.7946 | 0.7664 | 0.8508 | 0.8064 |
	| 0.4589 | 0.1904 | 14000 | 0.4455 | 0.8002 | 0.7661 | 0.8675 | 0.8137 |
	| 0.4513 | 0.2176 | 16000 | 0.4726 | 0.7934 | 0.7472 | 0.8902 | 0.8125 |
	| 0.4573 | 0.2448 | 18000 | 0.4357 | 0.8016 | 0.7775 | 0.8481 | 0.8113 |
	| 0.4474 | 0.2720 | 20000 | 0.4738 | 0.7932 | 0.7503 | 0.8823 | 0.8110 |
	| 0.448 | 0.2992 | 22000 | 0.4360 | 0.7934 | 0.7940 | 0.7955 | 0.7948 |
	| 0.449 | 0.3264 | 24000 | 0.4464 | 0.7996 | 0.7708 | 0.8560 | 0.8112 |
	| 0.449 | 0.3536 | 26000 | 0.4467 | 0.8048 | 0.7655 | 0.8819 | 0.8196 |
	| 0.4483 | 0.3808 | 28000 | 0.4459 | 0.8042 | 0.7603 | 0.8918 | 0.8208 |
	| 0.4468 | 0.4080 | 30000 | 0.4400 | 0.8054 | 0.7898 | 0.8353 | 0.8119 |
	| 0.4413 | 0.4352 | 32000 | 0.4321 | 0.8048 | 0.7917 | 0.8302 | 0.8105 |
	| 0.4444 | 0.4624 | 34000 | 0.4309 | 0.8086 | 0.7691 | 0.8850 | 0.8230 |
	| 0.4507 | 0.4896 | 36000 | 0.4301 | 0.8124 | 0.7945 | 0.8457 | 0.8193 |
	| 0.4426 | 0.5168 | 38000 | 0.4243 | 0.8052 | 0.7698 | 0.8739 | 0.8186 |
	| 0.4321 | 0.5440 | 40000 | 0.4243 | 0.8074 | 0.7681 | 0.8839 | 0.8219 |
	| 0.4301 | 0.5712 | 42000 | 0.4380 | 0.806 | 0.7640 | 0.8886 | 0.8216 |
	| 0.4418 | 0.5984 | 44000 | 0.4280 | 0.8096 | 0.7857 | 0.8544 | 0.8186 |
	| 0.4334 | 0.6256 | 46000 | 0.4326 | 0.809 | 0.7765 | 0.8707 | 0.8209 |
	| 0.4385 | 0.6528 | 48000 | 0.4273 | 0.8116 | 0.7844 | 0.8624 | 0.8215 |
	| 0.4337 | 0.6800 | 50000 | 0.4306 | 0.8086 | 0.7795 | 0.8636 | 0.8194 |
	| 0.4294 | 0.7072 | 52000 | 0.4397 | 0.811 | 0.7706 | 0.8886 | 0.8254 |
	| 0.4276 | 0.7344 | 54000 | 0.4344 | 0.8138 | 0.7770 | 0.8831 | 0.8267 |
	| 0.4183 | 0.7616 | 56000 | 0.4291 | 0.812 | 0.7650 | 0.9037 | 0.8286 |
	| 0.4226 | 0.7888 | 58000 | 0.4342 | 0.8134 | 0.7767 | 0.8827 | 0.8263 |
	| 0.4266 | 0.8160 | 60000 | 0.4234 | 0.8132 | 0.7840 | 0.8675 | 0.8236 |
	| 0.4285 | 0.8432 | 62000 | 0.4167 | 0.8156 | 0.7882 | 0.8660 | 0.8252 |
	| 0.4265 | 0.8704 | 64000 | 0.4206 | 0.8142 | 0.7734 | 0.8918 | 0.8284 |
	| 0.429 | 0.8976 | 66000 | 0.4165 | 0.8174 | 0.7910 | 0.8656 | 0.8266 |
	| 0.4308 | 0.9248 | 68000 | 0.4192 | 0.814 | 0.7775 | 0.8827 | 0.8268 |
	| 0.4248 | 0.9520 | 70000 | 0.4205 | 0.8152 | 0.7807 | 0.8795 | 0.8272 |
	| 0.425 | 0.9792 | 72000 | 0.4194 | 0.8164 | 0.7814 | 0.8815 | 0.8284 |


	### Framework versions

	- Transformers 4.45.2
	- Pytorch 2.4.1+cu124
	- Datasets 3.0.1
	- Tokenizers 0.20.1