NBRZ / bert-trainer-8b


update model card README.md

70a6526 over 2 years ago


	---
	tags:
	- generated_from_trainer
	datasets:
	- generator
	model-index:
	- name: bert-trainer-8b
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# bert-trainer-8b

	This model is a fine-tuned version of an unrecorded base checkpoint (the generated link is empty) on the generator dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.1639
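
The card does not state the training objective. Assuming the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for language-modeling losses, it can be read as a perplexity. A minimal sketch of that conversion:

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 3.1639

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

If the loss was computed differently (for example, averaged per sequence or measured in bits), this conversion does not apply.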

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 32
	- mixed_precision_training: Native AMP
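
To make the `cosine` scheduler and the 1000 warmup steps concrete, the sketch below recomputes the learning rate in plain Python: a linear warmup to the peak rate, then a half-cosine decay to zero. It is a reimplementation of the shape this setting typically produces, not the library's own code. The function name `cosine_lr` is hypothetical, and `TOTAL_STEPS` is an assumption inferred from the results table (step 16000 at epoch 31.87, so about 502 steps per epoch).

```python
import math

# Values taken from the hyperparameter list in this card.
PEAK_LR = 5e-4
WARMUP_STEPS = 1000

# Assumption: total optimizer steps. The results table ends at step 16000
# (epoch 31.87), which suggests about 502 steps per epoch over 32 epochs.
TOTAL_STEPS = 32 * 502


def cosine_lr(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TOTAL_STEPS,
) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Start, mid-warmup, peak, midpoint of the decay, and final step.
for step in (0, 500, 1000, 8532, 16064):
    print(f"step {step:>5}: lr = {cosine_lr(step):.6f}")
```

Under these assumptions the rate peaks at 0.0005 at step 1000 and is back to half of that roughly halfway through the decay.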

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 6.5416        | 1.0   | 500   | 6.5207          |
	| 6.393         | 1.99  | 1000  | 6.3903          |
	| 6.2817        | 2.99  | 1500  | 6.3033          |
	| 6.2274        | 3.98  | 2000  | 6.2671          |
	| 6.179         | 4.98  | 2500  | 6.2431          |
	| 6.1684        | 5.98  | 3000  | 6.2309          |
	| 6.1244        | 6.97  | 3500  | 6.2114          |
	| 6.0879        | 7.97  | 4000  | 6.1932          |
	| 6.0643        | 8.96  | 4500  | 6.1791          |
	| 6.0481        | 9.96  | 5000  | 6.1638          |
	| 6.0231        | 10.96 | 5500  | 6.1581          |
	| 5.9987        | 11.95 | 6000  | 6.1365          |
	| 5.9989        | 12.95 | 6500  | 6.1194          |
	| 5.9535        | 13.94 | 7000  | 6.1095          |
	| 5.9139        | 14.94 | 7500  | 6.0890          |
	| 5.8462        | 15.94 | 8000  | 6.0224          |
	| 5.7689        | 16.93 | 8500  | 5.9266          |
	| 5.6137        | 17.93 | 9000  | 5.7195          |
	| 4.7163        | 18.92 | 9500  | 4.6131          |
	| 4.0877        | 19.92 | 10000 | 4.0903          |
	| 3.7832        | 20.92 | 10500 | 3.8340          |
	| 3.6104        | 21.91 | 11000 | 3.6572          |
	| 3.4615        | 22.91 | 11500 | 3.5278          |
	| 3.3661        | 23.9  | 12000 | 3.4201          |
	| 3.271         | 24.9  | 12500 | 3.3333          |
	| 3.2179        | 25.9  | 13000 | 3.2720          |
	| 3.1759        | 26.89 | 13500 | 3.2317          |
	| 3.1419        | 27.89 | 14000 | 3.2006          |
	| 3.1041        | 28.88 | 14500 | 3.1806          |
	| 3.0836        | 29.88 | 15000 | 3.1693          |
	| 3.0998        | 30.88 | 15500 | 3.1679          |
	| 3.08          | 31.87 | 16000 | 3.1639          |
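
The Epoch and Step columns together hint at the size of the training run, which the card does not otherwise report. The arithmetic below is a rough sketch. It assumes a single device and no gradient accumulation, so that each step consumes one batch of 64 examples; neither is stated in the card.

```python
# Last logged row of the results table in this card.
last_step = 16000
last_epoch = 31.87

# From the hyperparameter list.
train_batch_size = 64
num_epochs = 32

# The logged epoch is rounded to two decimals, so round the ratio.
steps_per_epoch = round(last_step / last_epoch)

# Assumption: one device and no gradient accumulation, so each optimizer
# step consumes exactly one batch of `train_batch_size` examples.
approx_examples_per_epoch = steps_per_epoch * train_batch_size
approx_total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, approx_examples_per_epoch, approx_total_steps)
```

The per-epoch example count is an upper bound, since the last batch of an epoch may be partial.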


	### Framework versions

	- Transformers 4.26.1
	- Pytorch 1.13.1
	- Datasets 2.9.0
	- Tokenizers 0.13.2