|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- generated_from_trainer |
|
|
datasets: |
|
|
- arrow |
|
|
model-index: |
|
|
- name: mixtral_5_6gpu |
|
|
results: [] |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
# mixtral_5_6gpu |
|
|
|
|
|
This model is a fine-tuned version of an unspecified base model on the arrow dataset.
|
|
It achieves the following results on the evaluation set: |
|
|
- Loss: 4.3696 |
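
The card does not record the base architecture or a hosted checkpoint path. Assuming the saved model is a standard `transformers` causal language model, it could be loaded roughly as follows (`model_id` is a hypothetical placeholder for wherever the checkpoint lives):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; replace with the actual repo id or local directory
# where this checkpoint is stored.
model_id = "mixtral_5_6gpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```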
|
|
|
|
|
## Model description |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training; a hedged `TrainingArguments` sketch reproducing them follows the list:
|
|
- learning_rate: 0.0001 |
|
|
- train_batch_size: 4 |
|
|
- eval_batch_size: 8 |
|
|
- seed: 42 |
|
|
- distributed_type: multi-GPU |
|
|
- num_devices: 6 |
|
|
- gradient_accumulation_steps: 16 |
|
|
- total_train_batch_size: 384 |
|
|
- total_eval_batch_size: 48 |
|
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
|
|
- lr_scheduler_type: linear |
|
|
- lr_scheduler_warmup_steps: 500 |
|
|
- training_steps: 40746 |
|
|
- mixed_precision_training: Native AMP |
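
A minimal sketch reconstructing these settings as `TrainingArguments`; the `output_dir` and anything not listed above are assumptions, and the 6-GPU launch (via `torchrun` or `accelerate`) is implied rather than configured here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mixtral_5_6gpu",        # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=40746,
    fp16=True,                          # "Native AMP"; could be bf16=True instead
)

# With 6 GPUs, the effective batch sizes match the totals above:
# train: 4 per device * 16 accumulation steps * 6 devices = 384
# eval:  8 per device * 6 devices = 48
```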
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|
|:-------------:|:------:|:-----:|:---------------:| |
|
|
| No log | 0 | 0 | 10.9761 | |
|
|
| 7.1488 | 0.2454 | 1000 | 6.9551 | |
|
|
| 5.9011 | 0.4908 | 2000 | 5.8183 | |
|
|
| 5.4187 | 0.7363 | 3000 | 5.3778 | |
|
|
| 5.1765 | 0.9817 | 4000 | 5.1484 | |
|
|
| 4.983 | 1.2270 | 5000 | 5.0035 | |
|
|
| 4.876 | 1.4724 | 6000 | 4.8925 | |
|
|
| 4.7906 | 1.7179 | 7000 | 4.7991 | |
|
|
| 4.7131 | 1.9633 | 8000 | 4.7258 | |
|
|
| 4.5733 | 2.2086 | 9000 | 4.6749 | |
|
|
| 4.5394 | 2.4540 | 10000 | 4.6248 | |
|
|
| 4.5068 | 2.6995 | 11000 | 4.5808 | |
|
|
| 4.469 | 2.9449 | 12000 | 4.5393 | |
|
|
| 4.3381 | 3.1902 | 13000 | 4.5207 | |
|
|
| 4.3277 | 3.4356 | 14000 | 4.4930 | |
|
|
| 4.3198 | 3.6810 | 15000 | 4.4654 | |
|
|
| 4.2995 | 3.9265 | 16000 | 4.4391 | |
|
|
| 4.1697 | 4.1718 | 17000 | 4.4364 | |
|
|
| 4.1779 | 4.4172 | 18000 | 4.4203 | |
|
|
| 4.1732 | 4.6626 | 19000 | 4.4012 | |
|
|
| 4.1631 | 4.9081 | 20000 | 4.3828 | |
|
|
| 4.0294 | 5.1534 | 21000 | 4.3887 | |
|
|
| 4.0533 | 5.3988 | 22000 | 4.3801 | |
|
|
| 4.0511 | 5.6442 | 23000 | 4.3681 | |
|
|
| 4.0532 | 5.8897 | 24000 | 4.3559 | |
|
|
| 3.9201 | 6.1350 | 25000 | 4.3686 | |
|
|
| 3.9407 | 6.3804 | 26000 | 4.3653 | |
|
|
| 3.9511 | 6.6258 | 27000 | 4.3558 | |
|
|
| 3.9468 | 6.8712 | 28000 | 4.3467 | |
|
|
| 3.8237 | 7.1166 | 29000 | 4.3628 | |
|
|
| 3.8449 | 7.3620 | 30000 | 4.3622 | |
|
|
| 3.8537 | 7.6074 | 31000 | 4.3554 | |
|
|
| 3.8602 | 7.8528 | 32000 | 4.3491 | |
|
|
| 3.7498 | 8.0982 | 33000 | 4.3658 | |
|
|
| 3.7648 | 8.3436 | 34000 | 4.3675 | |
|
|
| 3.7633 | 8.5890 | 35000 | 4.3641 | |
|
|
| 3.7766 | 8.8344 | 36000 | 4.3592 | |
|
|
| 3.6848 | 9.0798 | 37000 | 4.3705 | |
|
|
| 3.6937 | 9.3252 | 38000 | 4.3738 | |
|
|
| 3.6984 | 9.5706 | 39000 | 4.3721 | |
|
|
| 3.7008 | 9.8160 | 40000 | 4.3701 | |
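
Validation loss bottoms out around step 28000 (4.3467, epoch ~6.87) and drifts slightly upward afterwards, so the final checkpoint is marginally past the best one. Assuming the reported losses are mean natural-log cross-entropy (the Trainer default for causal language modeling), validation perplexity is `exp(loss)`; a minimal check:

```python
import math

# exp() converts natural-log cross-entropy to perplexity.
final_eval_loss = 4.3696
best_eval_loss = 4.3467   # step 28000

print(math.exp(final_eval_loss))  # ~79.0
print(math.exp(best_eval_loss))   # ~77.2
```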
|
|
|
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- Transformers 4.53.1 |
|
|
- PyTorch 2.7.0+cu126
|
|
- Datasets 3.6.0 |
|
|
- Tokenizers 0.21.1 |
|
|
|