---
license: llama2
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: codellama/CodeLlama-7b-hf
model-index:
- name: codellama2-finetuned-codex
  results: []
---

# codellama2-finetuned-codex

This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2549

## Model description

This is a PEFT adapter for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf), produced with TRL's supervised fine-tuning (SFT) trainer. Details of the adapter configuration and the fine-tuning data have not been provided.
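
Because the checkpoint is a PEFT adapter rather than a full model, it is loaded on top of the base model. Below is a minimal usage sketch; the adapter repo id is a placeholder, since the hosting location is not stated in this card.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Attach the fine-tuned adapter weights; the repo id below is a placeholder.
model = PeftModel.from_pretrained(base_model, "<your-namespace>/codellama2-finetuned-codex")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```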

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 20
- mixed_precision_training: Native AMP
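
For reference, the list above maps onto `transformers`' `TrainingArguments` roughly as sketched below; this is an illustration rather than the exact training script, and `output_dir` is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="codellama2-finetuned-codex",  # placeholder
    learning_rate=4e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=20,
    fp16=True,  # "Native AMP" mixed precision
    # Trainer's default AdamW already uses betas=(0.9, 0.999) and eps=1e-08,
    # matching the listed optimizer settings.
)
```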

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.876 | 0.43 | 20 | 1.7743 |
| 1.6213 | 0.85 | 40 | 1.4426 |
| 1.0392 | 1.28 | 60 | 0.9086 |
| 0.6962 | 1.7 | 80 | 0.6664 |
| 0.529 | 2.13 | 100 | 0.5439 |
| 0.4614 | 2.55 | 120 | 0.4650 |
| 0.4264 | 2.98 | 140 | 0.4218 |
| 0.376 | 3.4 | 160 | 0.3951 |
| 0.3665 | 3.83 | 180 | 0.3722 |
| 0.3398 | 4.26 | 200 | 0.3559 |
| 0.3198 | 4.68 | 220 | 0.3389 |
| 0.3263 | 5.11 | 240 | 0.3317 |
| 0.2952 | 5.53 | 260 | 0.3223 |
| 0.2871 | 5.96 | 280 | 0.3136 |
| 0.2861 | 6.38 | 300 | 0.3084 |
| 0.2899 | 6.81 | 320 | 0.3021 |
| 0.2769 | 7.23 | 340 | 0.2982 |
| 0.2541 | 7.66 | 360 | 0.2951 |
| 0.2421 | 8.09 | 380 | 0.2914 |
| 0.2275 | 8.51 | 400 | 0.2887 |
| 0.26 | 8.94 | 420 | 0.2799 |
| 0.2275 | 9.36 | 440 | 0.2797 |
| 0.2291 | 9.79 | 460 | 0.2722 |
| 0.2222 | 10.21 | 480 | 0.2744 |
| 0.2391 | 10.64 | 500 | 0.2721 |
| 0.208 | 11.06 | 520 | 0.2671 |
| 0.2012 | 11.49 | 540 | 0.2691 |
| 0.2092 | 11.91 | 560 | 0.2619 |
| 0.1761 | 12.34 | 580 | 0.2636 |
| 0.2248 | 12.77 | 600 | 0.2596 |
| 0.1803 | 13.19 | 620 | 0.2611 |
| 0.2022 | 13.62 | 640 | 0.2597 |
| 0.2006 | 14.04 | 660 | 0.2578 |
| 0.1864 | 14.47 | 680 | 0.2561 |
| 0.1933 | 14.89 | 700 | 0.2560 |
| 0.1892 | 15.32 | 720 | 0.2570 |
| 0.192 | 15.74 | 740 | 0.2562 |
| 0.1883 | 16.17 | 760 | 0.2553 |
| 0.1781 | 16.6 | 780 | 0.2549 |
| 0.1705 | 17.02 | 800 | 0.2560 |
| 0.181 | 17.45 | 820 | 0.2566 |
| 0.1552 | 17.87 | 840 | 0.2551 |
| 0.173 | 18.3 | 860 | 0.2560 |
| 0.1934 | 18.72 | 880 | 0.2557 |
| 0.1754 | 19.15 | 900 | 0.2555 |
| 0.1796 | 19.57 | 920 | 0.2555 |
| 0.1745 | 20.0 | 940 | 0.2555 |
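
Validation loss drops steeply over the first few epochs and plateaus near 0.255 from about epoch 16 onward; the headline loss of 0.2549 matches the evaluation at step 780 (epoch 16.6).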

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2