---
library_name: peft
license: mit
base_model: microsoft/Phi-4-mini-instruct
tags:
- base_model:adapter:microsoft/Phi-4-mini-instruct
- lora
- transformers
model-index:
- name: phi4_instruct_20250902_0749
  results: []
---

# phi4_instruct_20250902_0749

This model is a LoRA fine-tune of [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) trained with PEFT; the training dataset has not been documented.
It achieves the following results on the evaluation set:
- Loss: 0.6032
- Map@3: 0.8772
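
Map@3 (mean average precision at 3) credits each example with the reciprocal rank of the correct answer within the top three predictions. The exact evaluation code is not included in this card; below is a minimal sketch, assuming the common single-label definition:

```python
def map_at_3(predictions, labels):
    """Mean average precision at 3, assuming one correct label per example.

    predictions: per-example lists of labels, ranked best-first
    labels: the single correct label for each example
    """
    total = 0.0
    for preds, label in zip(predictions, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == label:
                total += 1.0 / rank
                break
    return total / len(labels)


# Correct at rank 1 and rank 3 -> (1.0 + 1/3) / 2, i.e. about 0.6667
print(map_at_3([["A", "B", "C"], ["B", "C", "A"]], ["A", "A"]))
```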

## Model description

This repository contains a LoRA adapter for microsoft/Phi-4-mini-instruct, trained with the PEFT library. The downstream task and training data have not been documented; a loading sketch follows.
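
To use the adapter, load the base model and attach the PEFT weights. A minimal sketch, assuming the adapter lives in a local directory or Hub repo named `phi4_instruct_20250902_0749` (substitute the actual path or repo id):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-4-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the LoRA adapter; replace the id below with this adapter's actual location.
model = PeftModel.from_pretrained(base_model, "phi4_instruct_20250902_0749")
model.eval()
```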

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 3
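
The settings above map directly onto Hugging Face `TrainingArguments`. A minimal sketch, assuming the standard `Trainer` was used (the LoRA config and dataset code are not shown in this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi4_instruct_20250902_0749",
    learning_rate=2e-4,
    per_device_train_batch_size=8,   # 8 per device x 8 accumulation steps = 64 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    seed=42,
    optim="adamw_torch_fused",       # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```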

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Map@3  |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 16.4003       | 0.0523 | 20   | 1.3662          | 0.7260 |
| 9.5268        | 0.1046 | 40   | 1.0851          | 0.7793 |
| 8.3438        | 0.1569 | 60   | 0.9705          | 0.7830 |
| 7.495         | 0.2092 | 80   | 0.9053          | 0.8070 |
| 7.4283        | 0.2615 | 100  | 0.8488          | 0.8326 |
| 6.5147        | 0.3138 | 120  | 0.7270          | 0.8505 |
| 5.6739        | 0.3661 | 140  | 0.7694          | 0.8445 |
| 5.8069        | 0.4184 | 160  | 0.6465          | 0.8692 |
| 5.8288        | 0.4707 | 180  | 0.6613          | 0.8537 |
| 4.7902        | 0.5230 | 200  | 0.6032          | 0.8827 |
| 5.0708        | 0.5754 | 220  | 0.5588          | 0.8832 |
| 4.7105        | 0.6277 | 240  | 0.5590          | 0.8880 |
| 4.6516        | 0.6800 | 260  | 0.5281          | 0.8944 |
| 4.2103        | 0.7323 | 280  | 0.5666          | 0.8824 |
| 4.9022        | 0.7846 | 300  | 0.5486          | 0.8936 |
| 3.8328        | 0.8369 | 320  | 0.5402          | 0.8959 |
| 4.2973        | 0.8892 | 340  | 0.5149          | 0.9046 |
| 3.8971        | 0.9415 | 360  | 0.4942          | 0.8943 |
| 3.9711        | 0.9938 | 380  | 0.5736          | 0.8894 |
| 3.6985        | 1.0445 | 400  | 0.5119          | 0.9000 |
| 2.971         | 1.0968 | 420  | 0.6032          | 0.8772 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.56.0
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0
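
A quick way to confirm a local environment matches the versions above (all of these packages expose `__version__`):

```python
# Print the installed versions of the frameworks this adapter was trained with.
import datasets
import peft
import tokenizers
import torch
import transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(mod.__name__, mod.__version__)
```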