mohits01/phi-1-5-SBC-Mohit-ft

	---
	license: mit
	library_name: peft
	tags:
	- generated_from_trainer
	base_model: microsoft/phi-1_5
	model-index:
	- name: working
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# working

	This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4965
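
For a sense of scale, the reported loss can be converted to a perplexity. This is a minimal sketch that assumes the value is a mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card. The function name `perplexity_from_loss` is illustrative.

```python
import math


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_ce_loss)


# Evaluation loss reported in this card
final_eval_loss = 0.4965
print(f"perplexity ~ {perplexity_from_loss(final_eval_loss):.3f}")
```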

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 6
	- eval_batch_size: 6
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- num_epochs: 50
	- mixed_precision_training: Native AMP
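
The batch-size and scheduler entries above are related by simple arithmetic, sketched here in plain Python. Two assumptions are not stated in the list: training ran on a single device (which is what makes 6 × 4 equal the listed total of 24), and the schedule spans 250 optimizer steps, the last step logged in the training results. The function names are illustrative, and the schedule is the standard linear warmup followed by linear decay to zero, written out by hand rather than taken from the training code.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=6, gradient_accumulation_steps=4, one device (assumed)
print("total_train_batch_size:", effective_batch_size(6, 4))

# learning_rate=0.0002, lr_scheduler_warmup_steps=2, 250 total steps (assumed)
for step in (0, 1, 2, 126, 250):
    print(step, linear_warmup_lr(step, 2e-4, 2, 250))
```

With only 2 warmup steps, the learning rate reaches its peak almost immediately and then decays for nearly the whole run.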

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 3.9692 | 0.95 | 5 | 3.7663 |
	| 3.8826 | 1.9 | 10 | 3.6222 |
	| 3.7248 | 2.86 | 15 | 3.4342 |
	| 2.8804 | 4.0 | 21 | 3.1608 |
	| 3.1948 | 4.95 | 26 | 2.8958 |
	| 2.9136 | 5.9 | 31 | 2.6167 |
	| 2.5989 | 6.86 | 36 | 2.2949 |
	| 1.869 | 8.0 | 42 | 1.8694 |
	| 1.8586 | 8.95 | 47 | 1.5201 |
	| 1.5399 | 9.9 | 52 | 1.2544 |
	| 1.3188 | 10.86 | 57 | 1.1105 |
	| 0.9827 | 12.0 | 63 | 0.9700 |
	| 1.0818 | 12.95 | 68 | 0.8830 |
	| 0.9514 | 13.9 | 73 | 0.8180 |
	| 0.903 | 14.86 | 78 | 0.7661 |
	| 0.6992 | 16.0 | 84 | 0.7211 |
	| 0.7744 | 16.95 | 89 | 0.6985 |
	| 0.7585 | 17.9 | 94 | 0.6771 |
	| 0.7381 | 18.86 | 99 | 0.6627 |
	| 0.5829 | 20.0 | 105 | 0.6441 |
	| 0.6846 | 20.95 | 110 | 0.6344 |
	| 0.6616 | 21.9 | 115 | 0.6242 |
	| 0.622 | 22.86 | 120 | 0.6125 |
	| 0.512 | 24.0 | 126 | 0.6008 |
	| 0.5945 | 24.95 | 131 | 0.5926 |
	| 0.5956 | 25.9 | 136 | 0.5843 |
	| 0.5672 | 26.86 | 141 | 0.5782 |
	| 0.4526 | 28.0 | 147 | 0.5681 |
	| 0.5338 | 28.95 | 152 | 0.5603 |
	| 0.5228 | 29.9 | 157 | 0.5548 |
	| 0.5295 | 30.86 | 162 | 0.5474 |
	| 0.4214 | 32.0 | 168 | 0.5435 |
	| 0.4929 | 32.95 | 173 | 0.5363 |
	| 0.4764 | 33.9 | 178 | 0.5330 |
	| 0.4804 | 34.86 | 183 | 0.5274 |
	| 0.3795 | 36.0 | 189 | 0.5230 |
	| 0.4529 | 36.95 | 194 | 0.5176 |
	| 0.4614 | 37.9 | 199 | 0.5139 |
	| 0.4334 | 38.86 | 204 | 0.5110 |
	| 0.3623 | 40.0 | 210 | 0.5072 |
	| 0.4472 | 40.95 | 215 | 0.5059 |
	| 0.4261 | 41.9 | 220 | 0.5024 |
	| 0.4203 | 42.86 | 225 | 0.5017 |
	| 0.3447 | 44.0 | 231 | 0.4982 |
	| 0.4222 | 44.95 | 236 | 0.4977 |
	| 0.4143 | 45.9 | 241 | 0.4970 |
	| 0.4103 | 46.86 | 246 | 0.4966 |
	| 0.3427 | 47.62 | 250 | 0.4965 |
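
A few figures can be read off the table above with basic arithmetic. The sketch below uses four rows copied from it to estimate the number of optimizer steps per epoch, the approximate number of training examples per epoch, and how much the validation loss was still falling per step late in the run. The examples-per-epoch figure is a rough estimate that assumes every update used the full total batch size of 24 listed in the hyperparameters; the helper names are illustrative.

```python
# (epoch, step, validation_loss) rows copied from the training results table
rows = [
    (4.0, 21, 3.1608),
    (8.0, 42, 1.8694),
    (44.0, 231, 0.4982),
    (47.62, 250, 0.4965),
]


def steps_per_epoch(epoch: float, step: int) -> float:
    """Average number of optimizer steps taken per epoch."""
    return step / epoch


def approx_examples_per_epoch(epoch: float, step: int, total_batch_size: int = 24) -> float:
    """Rough training-set size, assuming every update saw a full batch."""
    return steps_per_epoch(epoch, step) * total_batch_size


def loss_drop_per_step(earlier, later) -> float:
    """Average decrease in validation loss per optimizer step between two rows."""
    (_, step0, loss0), (_, step1, loss1) = earlier, later
    return (loss0 - loss1) / (step1 - step0)


print("steps per epoch:", steps_per_epoch(*rows[0][:2]))
print("approx. examples per epoch:", approx_examples_per_epoch(*rows[0][:2]))
print("early drop per step:", loss_drop_per_step(rows[0], rows[1]))
print("late drop per step:", loss_drop_per_step(rows[2], rows[3]))
```

The late-run drop per step is orders of magnitude smaller than the early one, which is consistent with the validation loss flattening out near 0.4965.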


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.38.2
	- PyTorch 2.1.2
	- Datasets 2.1.0
	- Tokenizers 0.15.2
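
To compare a local environment against the versions listed above, something like the following standard-library check may help. It is a sketch, not part of the original training setup: the dictionary keys are the PyPI distribution names assumed to correspond to each listed framework, and `version_mismatches` is an illustrative helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name
CARD_VERSIONS = {
    "peft": "0.10.0",
    "transformers": "4.38.2",
    "torch": "2.1.2",
    "datasets": "2.1.0",
    "tokenizers": "0.15.2",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for packages whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu121" when comparing
        if found is None or found.split("+")[0] != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in version_mismatches(CARD_VERSIONS).items():
        print(f"{name}: card lists {want}, found {found}")
```

A mismatch does not necessarily mean the adapter will fail to load; it only flags where the environment differs from the one recorded here.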
