---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: SST2_DistilBERT_5E
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# SST2_DistilBERT_5E

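This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased). The Trainer did not record which dataset was used; the model name suggests SST-2, but the card does not confirm this.
At the final logged step (step 2150, epoch 4.97), it achieves the following results on the evaluation set:
- Loss: 0.4125
- Accuracy: 0.8933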
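The card does not include the metric code. One common way to produce an Accuracy figure like the one above is a `compute_metrics` callback passed to `Trainer`, which receives the evaluation logits and labels. The sketch below assumes that setup and is not the author's code; it uses plain Python so it runs without NumPy. With the Trainer, `eval_pred` is an `EvalPrediction` that unpacks into `(predictions, label_ids)`.

```python
from typing import Dict, Sequence, Tuple


def compute_metrics(eval_pred: Tuple[Sequence[Sequence[float]], Sequence[int]]) -> Dict[str, float]:
    """Accuracy: the fraction of examples whose highest-scoring class matches the label."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row (argmax).
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


if __name__ == "__main__":
    # Toy logits for three examples and two classes (made up for illustration).
    toy_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.2]]
    toy_labels = [1, 0, 1]
    print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 0.6666666666666666}
```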

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
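The hyperparameters above correspond to `transformers.TrainingArguments` fields, and `lr_scheduler_type: linear` decays the learning rate linearly to zero over training. The sketch below restates both. It assumes a single training device (so `train_batch_size` equals `per_device_train_batch_size`), no gradient accumulation, and no warmup, since the card lists none. `linear_lr` is a hypothetical helper written to mirror the linear schedule, not a library function, and `total_steps = 2165` is an estimate (about 433 steps per epoch, inferred from Step 1300 being logged at Epoch 3.0) rather than a recorded value.

```python
# Card hyperparameters as TrainingArguments keyword arguments.
# Assumptions: one device, no gradient accumulation, no warmup.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # equals train_batch_size on a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def linear_lr(step: int, base_lr: float, num_training_steps: int, num_warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, num_warmup_steps)
    # Decay: fall linearly from base_lr (end of warmup) to 0 (num_training_steps).
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


if __name__ == "__main__":
    total_steps = 2165  # estimate: ~433 steps per epoch x 5 epochs
    for step in (0, 1000, 2150, total_steps):
        print(step, linear_lr(step, TRAINING_KWARGS["learning_rate"], total_steps))
```

To train with these settings, the dictionary can be unpacked as `TrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `"out"` is a placeholder directory.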

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.6744 | 0.12 | 50 | 0.6094 | 0.66 |
| 0.4942 | 0.23 | 100 | 0.3772 | 0.8667 |
| 0.3857 | 0.35 | 150 | 0.3256 | 0.8867 |
| 0.3483 | 0.46 | 200 | 0.3634 | 0.84 |
| 0.3235 | 0.58 | 250 | 0.3338 | 0.8733 |
| 0.3129 | 0.69 | 300 | 0.3482 | 0.8667 |
| 0.3573 | 0.81 | 350 | 0.3632 | 0.8333 |
| 0.3266 | 0.92 | 400 | 0.3274 | 0.86 |
| 0.2615 | 1.04 | 450 | 0.3400 | 0.8667 |
| 0.2409 | 1.15 | 500 | 0.3541 | 0.8467 |
| 0.2508 | 1.27 | 550 | 0.2997 | 0.88 |
| 0.2442 | 1.39 | 600 | 0.3654 | 0.86 |
| 0.2625 | 1.5 | 650 | 0.3302 | 0.8667 |
| 0.1983 | 1.62 | 700 | 0.3184 | 0.8867 |
| 0.2356 | 1.73 | 750 | 0.3239 | 0.8867 |
| 0.2078 | 1.85 | 800 | 0.2968 | 0.9 |
| 0.2343 | 1.96 | 850 | 0.3148 | 0.8933 |
| 0.1544 | 2.08 | 900 | 0.3535 | 0.9 |
| 0.1407 | 2.19 | 950 | 0.3603 | 0.8733 |
| 0.187 | 2.31 | 1000 | 0.3843 | 0.88 |
| 0.144 | 2.42 | 1050 | 0.4546 | 0.8467 |
| 0.1786 | 2.54 | 1100 | 0.3681 | 0.88 |
| 0.1315 | 2.66 | 1150 | 0.3806 | 0.8867 |
| 0.1399 | 2.77 | 1200 | 0.3880 | 0.8867 |
| 0.1905 | 2.89 | 1250 | 0.3944 | 0.8733 |
| 0.2043 | 3.0 | 1300 | 0.3974 | 0.8733 |
| 0.1081 | 3.12 | 1350 | 0.3731 | 0.9067 |
| 0.1055 | 3.23 | 1400 | 0.3809 | 0.8867 |
| 0.1092 | 3.35 | 1450 | 0.3568 | 0.9 |
| 0.0981 | 3.46 | 1500 | 0.3610 | 0.9133 |
| 0.109 | 3.58 | 1550 | 0.4126 | 0.8867 |
| 0.1001 | 3.7 | 1600 | 0.3831 | 0.9 |
| 0.1027 | 3.81 | 1650 | 0.4064 | 0.9 |
| 0.133 | 3.93 | 1700 | 0.3845 | 0.9 |
| 0.1031 | 4.04 | 1750 | 0.3915 | 0.9 |
| 0.0772 | 4.16 | 1800 | 0.3988 | 0.8867 |
| 0.0785 | 4.27 | 1850 | 0.3962 | 0.9 |
| 0.1059 | 4.39 | 1900 | 0.3969 | 0.9 |
| 0.0668 | 4.5 | 1950 | 0.4095 | 0.8933 |
| 0.0915 | 4.62 | 2000 | 0.4077 | 0.8933 |
| 0.1413 | 4.73 | 2050 | 0.4004 | 0.9067 |
| 0.0727 | 4.85 | 2100 | 0.4100 | 0.8933 |
| 0.0724 | 4.97 | 2150 | 0.4125 | 0.8933 |
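The summary at the top of the card reports the last row of this table, not the best one. The lowest validation loss (0.2968) is logged at step 800 and the highest accuracy (0.9133) at step 1500, while validation loss tends to drift upward after roughly epoch 2 even as training loss keeps falling. The sketch below recovers those evaluation steps from the logged rows, which are copied from the table; `LOG`, `best_by_accuracy`, and `best_by_val_loss` are hypothetical names for illustration. If you retrain, setting `load_best_model_at_end=True` and `metric_for_best_model="accuracy"` in `TrainingArguments` asks the Trainer to restore the best saved checkpoint instead; this requires compatible evaluation and save strategies.

```python
from typing import List, Tuple

Row = Tuple[int, float, float]  # (step, validation_loss, accuracy)

# Rows copied from the "Training results" table (training loss and epoch omitted).
LOG: List[Row] = [
    (50, 0.6094, 0.66), (100, 0.3772, 0.8667), (150, 0.3256, 0.8867), (200, 0.3634, 0.84),
    (250, 0.3338, 0.8733), (300, 0.3482, 0.8667), (350, 0.3632, 0.8333), (400, 0.3274, 0.86),
    (450, 0.3400, 0.8667), (500, 0.3541, 0.8467), (550, 0.2997, 0.88), (600, 0.3654, 0.86),
    (650, 0.3302, 0.8667), (700, 0.3184, 0.8867), (750, 0.3239, 0.8867), (800, 0.2968, 0.9),
    (850, 0.3148, 0.8933), (900, 0.3535, 0.9), (950, 0.3603, 0.8733), (1000, 0.3843, 0.88),
    (1050, 0.4546, 0.8467), (1100, 0.3681, 0.88), (1150, 0.3806, 0.8867), (1200, 0.3880, 0.8867),
    (1250, 0.3944, 0.8733), (1300, 0.3974, 0.8733), (1350, 0.3731, 0.9067), (1400, 0.3809, 0.8867),
    (1450, 0.3568, 0.9), (1500, 0.3610, 0.9133), (1550, 0.4126, 0.8867), (1600, 0.3831, 0.9),
    (1650, 0.4064, 0.9), (1700, 0.3845, 0.9), (1750, 0.3915, 0.9), (1800, 0.3988, 0.8867),
    (1850, 0.3962, 0.9), (1900, 0.3969, 0.9), (1950, 0.4095, 0.8933), (2000, 0.4077, 0.8933),
    (2050, 0.4004, 0.9067), (2100, 0.4100, 0.8933), (2150, 0.4125, 0.8933),
]


def best_by_accuracy(log: List[Row]) -> Row:
    # max() keeps the first row among ties, i.e. the earliest step.
    return max(log, key=lambda row: row[2])


def best_by_val_loss(log: List[Row]) -> Row:
    # min() likewise keeps the earliest step among ties.
    return min(log, key=lambda row: row[1])


if __name__ == "__main__":
    print("final step:      ", LOG[-1])
    print("highest accuracy:", best_by_accuracy(LOG))
    print("lowest val loss: ", best_by_val_loss(LOG))
```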


### Framework versions

- Transformers 4.24.0
- PyTorch 1.12.1+cu113
- Datasets 2.7.0
- Tokenizers 0.13.2
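Before trying to reproduce the results, it can help to confirm that the local environment matches the versions above. The standard-library sketch below assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers` (PyTorch is published as `torch`); `CARD_VERSIONS` and `version_mismatches` are hypothetical names. It compares version strings exactly, so a CPU build reporting `1.12.1` is flagged against `1.12.1+cu113`, whose `+cu113` suffix only marks a CUDA 11.3 build.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (distribution names are assumptions).
CARD_VERSIONS = {
    "transformers": "4.24.0",
    "torch": "1.12.1+cu113",
    "datasets": "2.7.0",
    "tokenizers": "0.13.2",
}


def version_mismatches(expected, lookup=version):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
        print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```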
