	---
	license: mit
	tags:
	- generated_from_trainer
	model-index:
	- name: BERiT_2000_enriched_optimized
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# BERiT_2000_enriched_optimized

	This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
	At the final evaluation (step 25500), it achieves the following results on the evaluation set:
	- Loss: 6.5710
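
For orientation, an evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats; the card does not state the training objective, so treat this as an interpretation rather than a documented metric. `loss_to_perplexity` is a hypothetical helper name.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


# Final evaluation loss reported in this card.
final_eval_loss = 6.5710
print(f"perplexity ~ {loss_to_perplexity(final_eval_loss):.1f}")
```

Under that assumption, a loss of 6.5710 corresponds to a perplexity of roughly 700.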

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 6.732413659252984e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 10
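
As a rough sketch, the values above can be written as keyword arguments under the names that `transformers.TrainingArguments` uses, together with a plain-Python version of a linear learning-rate schedule. Several things here are assumptions not stated in the card: that training ran on a single device (so `train_batch_size` equals the per-device batch size), that no warmup was used, and that the run had about 25,800 optimizer steps (estimated from the step and epoch columns of the results table). `linear_lr` is a hypothetical helper, not a library function.

```python
# Hyperparameters from the list above, keyed by the TrainingArguments
# names they most plausibly correspond to (an assumption).
training_kwargs = {
    "learning_rate": 6.732413659252984e-05,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 25_800  # rough estimate, not stated in the card
for step in (0, total_steps // 2, total_steps):
    lr = linear_lr(step, total_steps, training_kwargs["learning_rate"])
    print(f"step {step:>6}: lr = {lr:.3e}")
```

With no warmup, the learning rate starts at its peak and falls linearly to zero at the last step, so it is at half its initial value midway through training.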

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 6.4676 | 0.19 | 500 | 6.1516 |
	| 6.0191 | 0.39 | 1000 | 5.8660 |
	| 5.9008 | 0.58 | 1500 | 5.9956 |
	| 5.7806 | 0.77 | 2000 | 5.7032 |
	| 5.6932 | 0.97 | 2500 | 5.6910 |
	| 6.4953 | 1.16 | 3000 | 6.6394 |
	| 6.6419 | 1.36 | 3500 | 6.6176 |
	| 6.6462 | 1.55 | 4000 | 6.5961 |
	| 6.6402 | 1.74 | 4500 | 6.6224 |
	| 6.6169 | 1.94 | 5000 | 6.6091 |
	| 6.6396 | 2.13 | 5500 | 6.6443 |
	| 6.6599 | 2.32 | 6000 | 6.6150 |
	| 6.5956 | 2.52 | 6500 | 6.6173 |
	| 6.6397 | 2.71 | 7000 | 6.6038 |
	| 6.6261 | 2.9 | 7500 | 6.6214 |
	| 6.6162 | 3.1 | 8000 | 6.6271 |
	| 6.6102 | 3.29 | 8500 | 6.5843 |
	| 6.6116 | 3.49 | 9000 | 6.6044 |
	| 6.6146 | 3.68 | 9500 | 6.6092 |
	| 6.5922 | 3.87 | 10000 | 6.6182 |
	| 6.6246 | 4.07 | 10500 | 6.5832 |
	| 6.6124 | 4.26 | 11000 | 6.6141 |
	| 6.6002 | 4.45 | 11500 | 6.6385 |
	| 6.6015 | 4.65 | 12000 | 6.5984 |
	| 6.6024 | 4.84 | 12500 | 6.6236 |
	| 6.6097 | 5.03 | 13000 | 6.6254 |
	| 6.5937 | 5.23 | 13500 | 6.6154 |
	| 6.5973 | 5.42 | 14000 | 6.5731 |
	| 6.6141 | 5.62 | 14500 | 6.6308 |
	| 6.5976 | 5.81 | 15000 | 6.5824 |
	| 6.5982 | 6.0 | 15500 | 6.6024 |
	| 6.6032 | 6.2 | 16000 | 6.5891 |
	| 6.603 | 6.39 | 16500 | 6.5926 |
	| 6.6089 | 6.58 | 17000 | 6.6090 |
	| 6.6067 | 6.78 | 17500 | 6.6137 |
	| 6.5718 | 6.97 | 18000 | 6.5817 |
	| 6.6036 | 7.16 | 18500 | 6.6008 |
	| 6.6001 | 7.36 | 19000 | 6.5571 |
	| 6.6203 | 7.55 | 19500 | 6.5778 |
	| 6.6055 | 7.75 | 20000 | 6.5805 |
	| 6.6168 | 7.94 | 20500 | 6.6099 |
	| 6.5874 | 8.13 | 21000 | 6.6125 |
	| 6.5932 | 8.33 | 21500 | 6.5701 |
	| 6.5984 | 8.52 | 22000 | 6.5719 |
	| 6.5753 | 8.71 | 22500 | 6.6199 |
	| 6.599 | 8.91 | 23000 | 6.5756 |
	| 6.579 | 9.1 | 23500 | 6.5926 |
	| 6.5805 | 9.3 | 24000 | 6.5623 |
	| 6.5753 | 9.49 | 24500 | 6.5818 |
	| 6.5645 | 9.68 | 25000 | 6.5726 |
	| 6.6094 | 9.88 | 25500 | 6.5710 |
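
The table is easier to read with two quick checks: which evaluation had the lowest validation loss, and where the loss first jumps. The sketch below uses only the first eight rows of the table, copied by hand; the 0.5 jump threshold is an arbitrary choice for illustration, not something taken from the training setup.

```python
# (step, validation_loss) pairs copied from the first rows of the table above.
eval_log = [
    (500, 6.1516),
    (1000, 5.8660),
    (1500, 5.9956),
    (2000, 5.7032),
    (2500, 5.6910),
    (3000, 6.6394),
    (3500, 6.6176),
    (4000, 6.5961),
]

# Evaluation with the lowest validation loss among these rows.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

# First evaluation whose loss exceeds the previous one by more than
# the (arbitrary) threshold of 0.5.
jump_step = next(
    (step for (_, prev), (step, cur) in zip(eval_log, eval_log[1:]) if cur - prev > 0.5),
    None,
)

print(best_step, best_loss, jump_step)
```

On these rows, `best_step` is 2500 and `jump_step` is 3000. Every later row in the table stays above 6.55, so the step-2500 value of 5.6910 is also the lowest validation loss of the whole run, well below the final 6.5710.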


	### Framework versions

	- Transformers 4.24.0
	- PyTorch 1.12.1+cu113
	- Datasets 2.6.1
	- Tokenizers 0.13.2
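
When trying to reproduce the run, it may help to compare a local environment against the versions listed above. This is a minimal sketch using only the standard library; `compare_versions` is a hypothetical helper, and the mapping assumes the usual PyPI distribution names (`torch` for PyTorch).

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
expected = {
    "transformers": "4.24.0",
    "torch": "1.12.1+cu113",
    "datasets": "2.6.1",
    "tokenizers": "0.13.2",
}


def compare_versions(pins: dict) -> dict:
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(expected).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```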
