---
license: mit
datasets:
- Replete-AI/code_bagel
language:
- en
tags:
- code
---

### Base model

[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)
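Given the ~0.2% trainable-parameter figure under Train State below, this appears to be a LoRA/PEFT fine-tune. A minimal loading sketch under that assumption; the `adapter_id` below is a placeholder, not a confirmed repo id, and older `transformers` releases need `trust_remote_code=True` for Phi-3:

```python
# Loading sketch: assumes this repo hosts a LoRA adapter for the base
# model rather than merged weights. "your-username/phi3-code-bagel" is
# a placeholder; substitute the actual repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3-medium-128k-instruct"
adapter_id = "your-username/phi3-code-bagel"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # needed for Phi-3 on older transformers versions
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```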

### Datasets

[Replete-AI/code_bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)
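The dataset can be pulled straight from the Hub for inspection; a minimal sketch, assuming the standard `train` split:

```python
# Quick look at the training data; assumes the dataset exposes a
# "train" split, which is typical for Hugging Face datasets.
from datasets import load_dataset

ds = load_dataset("Replete-AI/code_bagel", split="train")
print(len(ds))  # number of examples
print(ds[0])    # first record
```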

### Train Loss

![Training loss curve](https://cdn-uploads.huggingface.co/production/uploads/653a01bdc2f2f42adeae4c91/C3nwTIBw8EK1_1eSHMF4P.png)

### Train State

Trainable params: 27852800 || all params: 13988090880 || trainable%: 0.1991

Total training duration: 69h 18m 17s

```json
{
  "epoch": 0.9999679800589659,
  "total_flos": 1.446273483573748e+20,
  "train_loss": 0.44412665014957775,
  "train_runtime": 249497.725,
  "train_samples_per_second": 13.018,
  "train_steps_per_second": 0.102
}
```
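The reported figures are self-consistent; a quick arithmetic check of the trainable percentage and the runtime:

```python
# Cross-checking the numbers reported above (pure arithmetic).
trainable, total = 27_852_800, 13_988_090_880
print(f"trainable%: {100 * trainable / total:.4f}")  # -> 0.1991

runtime = 249_497.725  # seconds, from train_runtime
hours, rem = divmod(runtime, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{int(hours)}h{int(minutes)}m{int(seconds)}s")  # -> 69h18m17s
```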

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` reconstruction follows the list):

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1200
- num_epochs: 1.0
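Expressed as `transformers.TrainingArguments`, the list maps roughly onto the sketch below. This is a reconstruction, not the original training script; `output_dir` is a placeholder, and `optim="adamw_torch"` is an assumption (the listed Adam betas and epsilon are its defaults):

```python
# Reconstruction of the hyperparameters above as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi3-medium-code-bagel",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=8,    # 8 per device x 8 GPUs = 64 total
    gradient_accumulation_steps=16,  # 1 x 16 x 8 GPUs = 128 effective
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_steps=1200,
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), eps=1e-08 are defaults
)
```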

### Notes

I personally fine-tuned this model on the largest dataset I have worked with, which also took the most time.