fabikru/trainer

Commit 4c33078 (verified, 8 months ago): model_5M_large_ds_masking_0.5_predicted_hparamas


	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: trainer
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# trainer

	This model is a fine-tuned version of an unspecified base model (the auto-generated link is empty) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.3546
	- Accuracy: 0.8807

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.032227
	- train_batch_size: 512
	- eval_batch_size: 512
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 4096
	- optimizer: schedule-free AdamW (`OptimizerNames.SCHEDULE_FREE_ADAMW`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_steps: 1000
	- training_steps: 1000000
	- mixed_precision_training: Native AMP
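The batch-size entries in the list above are related by simple arithmetic. The sketch below checks that relationship, assuming `train_batch_size` is the per-device batch size and that training ran on a single device (`num_devices=1` is an assumption; the card does not state the device count). The helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# Values taken from the hyperparameter list; num_devices=1 is an assumption.
total = effective_batch_size(512, 8, num_devices=1)

# Upper bound on samples processed if all configured training steps were run.
max_samples = total * 1_000_000

print(total)        # 4096
print(max_samples)  # 4096000000
```

With these assumptions the product reproduces the listed `total_train_batch_size` of 4096, which is consistent with single-device training.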

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
	|:-------------:|:------:|:----:|:---------------:|:--------:|
	| No log        | 0      | 0    | 4.3903          | 0.0137   |
	| No log        | 0.0044 | 122  | 1.1251          | 0.6574   |
	| No log        | 0.0087 | 244  | 0.8266          | 0.7365   |
	| No log        | 0.0131 | 366  | 0.7493          | 0.7590   |
	| No log        | 0.0175 | 488  | 0.6913          | 0.7755   |
	| 9.1782        | 0.0218 | 610  | 0.6348          | 0.7927   |
	| 9.1782        | 0.0262 | 732  | 0.5897          | 0.8064   |
	| 9.1782        | 0.0306 | 854  | 0.5569          | 0.8170   |
	| 9.1782        | 0.0349 | 976  | 0.5262          | 0.8266   |
	| 5.0917        | 0.0393 | 1098 | 0.4957          | 0.8360   |
	| 5.0917        | 0.0437 | 1220 | 0.4761          | 0.8424   |
	| 5.0917        | 0.0480 | 1342 | 0.4616          | 0.8464   |
	| 5.0917        | 0.0524 | 1464 | 0.4479          | 0.8510   |
	| 4.0398        | 0.0568 | 1586 | 0.4397          | 0.8536   |
	| 4.0398        | 0.0611 | 1708 | 0.4293          | 0.8564   |
	| 4.0398        | 0.0655 | 1830 | 0.4231          | 0.8592   |
	| 4.0398        | 0.0699 | 1952 | 0.4139          | 0.8614   |
	| 3.5268        | 0.0743 | 2074 | 0.4088          | 0.8635   |
	| 3.5268        | 0.0786 | 2196 | 0.4035          | 0.8649   |
	| 3.5268        | 0.0830 | 2318 | 0.4000          | 0.8666   |
	| 3.5268        | 0.0874 | 2440 | 0.3950          | 0.8678   |
	| 3.3084        | 0.0917 | 2562 | 0.3915          | 0.8688   |
	| 3.3084        | 0.0961 | 2684 | 0.3866          | 0.8705   |
	| 3.3084        | 0.1005 | 2806 | 0.3843          | 0.8712   |
	| 3.3084        | 0.1048 | 2928 | 0.3804          | 0.8726   |
	| 3.1769        | 0.1092 | 3050 | 0.3776          | 0.8733   |
	| 3.1769        | 0.1136 | 3172 | 0.3729          | 0.8749   |
	| 3.1769        | 0.1179 | 3294 | 0.3723          | 0.8751   |
	| 3.1769        | 0.1223 | 3416 | 0.3698          | 0.8759   |
	| 3.0785        | 0.1267 | 3538 | 0.3659          | 0.8772   |
	| 3.0785        | 0.1310 | 3660 | 0.3644          | 0.8775   |
	| 3.0785        | 0.1354 | 3782 | 0.3599          | 0.8788   |
	| 3.0785        | 0.1398 | 3904 | 0.3584          | 0.8794   |
	| 2.9831        | 0.1441 | 4026 | 0.3567          | 0.8800   |
	| 2.9831        | 0.1485 | 4148 | 0.3528          | 0.8817   |
	| 2.9831        | 0.1529 | 4270 | 0.3535          | 0.8811   |
	| 2.9831        | 0.1572 | 4392 | 0.3541          | 0.8809   |
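A few quantities can be read off the results table with simple arithmetic. The sketch below uses a handful of rows copied from the table to find the evaluation with the lowest validation loss and to roughly estimate the number of optimizer steps per epoch. The estimate is approximate because the `Epoch` column is rounded to four decimals, and the sample count assumes the effective batch size of 4096 listed under the hyperparameters. Variable names are illustrative.

```python
# (step, epoch, validation_loss, accuracy): rows copied from the results table.
rows = [
    (0,    0.0,    4.3903, 0.0137),
    (122,  0.0044, 1.1251, 0.6574),
    (4026, 0.1441, 0.3567, 0.8800),
    (4148, 0.1485, 0.3528, 0.8817),
    (4270, 0.1529, 0.3535, 0.8811),
    (4392, 0.1572, 0.3541, 0.8809),
]

# Evaluation with the lowest validation loss among the rows above.
best_step, _, best_loss, best_acc = min(rows, key=lambda r: r[2])
print(best_step, best_loss, best_acc)  # 4148 0.3528 0.8817

# Rough steps-per-epoch estimate from the last logged row (epoch is rounded).
last_step, last_epoch, _, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Rough samples-per-epoch estimate, assuming 4096 samples per optimizer step.
samples_per_epoch = steps_per_epoch * 4096
print(round(steps_per_epoch), round(samples_per_epoch))
```

Under these assumptions one epoch corresponds to roughly 28,000 optimizer steps, so the logged run covers only about 16% of a single pass over the data. The headline evaluation figures (loss 0.3546, accuracy 0.8807) do not match any single row and presumably come from a separate final evaluation.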


	### Framework versions

	- Transformers 4.52.2
	- PyTorch 2.8.0.dev20250521+cu128
	- Datasets 3.6.0
	- Tokenizers 0.21.1
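To compare a local environment against the versions listed above, a minimal sketch using the standard library's `importlib.metadata` is shown below. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are assumed here; the helper `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this model card.
EXPECTED = {
    "transformers": "4.52.2",
    "torch": "2.8.0.dev20250521+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def compare_versions(expected, lookup=version):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed
        report[pkg] = (want, have, have == want)
    return report


for pkg, (want, have, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{pkg}: expected {want}, installed {have} ({status})")
```

The PyTorch entry is a nightly development build, so an exact match is unlikely in a fresh environment; a differing version is not necessarily a problem.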