End of training
README.md
CHANGED
@@ -37,6 +37,7 @@ test_datasets:
   - path: DannyAI/African-History-QA-Dataset
     split: validation
     type: alpaca_chat.load_qa
+    # Fixed the missing quote and indentation below
     system_prompt: "You are a helpful AI assistant specialised in African history which gives concise answers to questions asked"
 
 # 2. Output & Chat Configuration

@@ -64,8 +65,10 @@ bf16: true
 fp16: false
 
 # 6. Training Duration & Optimizer
-
-
+max_steps: 650
+# removed
+# num_epochs:
+warmup_steps: 20
 learning_rate: 0.00002
 optimizer: adamw_torch
 lr_scheduler: cosine

@@ -73,10 +76,11 @@ lr_scheduler: cosine
 # 7. Logging & Evaluation
 wandb_project: phi4_african_history
 wandb_name: phi4_lora_axolotl
+
 eval_strategy: steps
-eval_steps:
+eval_steps: 50
 save_strategy: steps
-save_steps:
+save_steps: 100
 logging_steps: 5
 
 # 8. Public Hugging Face Hub Upload

@@ -92,11 +96,11 @@ hub_private_repo: false
 
 This model is a fine-tuned version of [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) on the DannyAI/African-History-QA-Dataset dataset.
 It achieves the following results on the evaluation set:
-- Loss:
-- Ppl:
+- Loss: 1.7479
+- Ppl: 5.7428
 - Memory/max Active (gib): 14.84
 - Memory/max Allocated (gib): 14.84
-- Memory/device Reserved (gib): 31.
+- Memory/device Reserved (gib): 31.79
 
 ## Model description
 

@@ -123,15 +127,27 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 8
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
-- training_steps:
+- lr_scheduler_warmup_steps: 20
+- training_steps: 650
 
 ### Training results
 
-| Training Loss | Epoch
-
-| No log | 0
-
+| Training Loss | Epoch   | Step | Validation Loss | Ppl    | Active (gib) | Allocated (gib) | Reserved (gib) |
+|:-------------:|:-------:|:----:|:---------------:|:------:|:------------:|:---------------:|:--------------:|
+| No log        | 0       | 0    | 2.1184          | 8.3175 | 14.82        | 14.82           | 15.37          |
+| 5.394         | 3.8627  | 50   | 2.1004          | 8.1694 | 14.84        | 14.84           | 31.82          |
+| 4.4484        | 7.7059  | 100  | 2.0367          | 7.6652 | 14.84        | 14.84           | 31.84          |
+| 3.7583        | 11.5490 | 150  | 1.9785          | 7.2316 | 14.84        | 14.84           | 31.84          |
+| 3.363         | 15.3922 | 200  | 1.9299          | 6.8886 | 14.84        | 14.84           | 31.84          |
+| 3.0568        | 19.2353 | 250  | 1.8664          | 6.4652 | 14.84        | 14.84           | 31.84          |
+| 2.8736        | 23.0784 | 300  | 1.8134          | 6.1314 | 14.84        | 14.84           | 31.79          |
+| 2.7646        | 26.9412 | 350  | 1.7851          | 5.9604 | 14.84        | 14.84           | 31.79          |
+| 2.6891        | 30.7843 | 400  | 1.7668          | 5.8523 | 14.84        | 14.84           | 31.79          |
+| 2.6843        | 34.6275 | 450  | 1.7581          | 5.8014 | 14.84        | 14.84           | 31.79          |
+| 2.6048        | 38.4706 | 500  | 1.7534          | 5.7739 | 14.84        | 14.84           | 31.79          |
+| 2.6118        | 42.3137 | 550  | 1.7505          | 5.7573 | 14.84        | 14.84           | 31.79          |
+| 2.6024        | 46.1569 | 600  | 1.7503          | 5.7565 | 14.84        | 14.84           | 31.79          |
+| 2.5727        | 50.0    | 650  | 1.7479          | 5.7428 | 14.84        | 14.84           | 31.79          |
 
 
 ### Framework versions
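The Loss and Ppl columns filled in by this commit are not independent metrics: perplexity is the exponential of the cross-entropy loss. As a minimal sanity check (values copied from the results table above; agreement is only up to rounding of the logged loss):

```python
import math

# Perplexity = exp(cross-entropy loss). Each (loss, ppl) pair from the
# model card's results table should therefore match up to rounding.
eval_points = [
    (2.1184, 8.3175),  # step 0 (before training)
    (1.9299, 6.8886),  # step 200
    (1.7479, 5.7428),  # step 650 (final)
]

for loss, reported_ppl in eval_points:
    assert abs(math.exp(loss) - reported_ppl) < 0.01
```

This is why both columns improve in lockstep across the table: neither can move without the other.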
debug.log
CHANGED
@@ -1846,4 +1846,49 @@ trainable params: 1,572,864 || all params: 3,837,594,624 || trainable%: 0.0410
 
 
 [2026-01-24 15:07:25,861] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out/checkpoint-650
 
 
-
+
+[2026-01-24 15:07:34,109] [INFO] [axolotl.train.save_trained_model:233] [PID:9359] Training completed! Saving trained model to ./phi4_african_history_lora_out.
+[2026-01-24 15:07:34,468] [INFO] [axolotl.train.save_trained_model:351] [PID:9359] Model successfully saved to ./phi4_african_history_lora_out
+[2026-01-24 15:07:34,702] [INFO] [axolotl.core.trainers.base._save:721] [PID:9359] Saving model checkpoint to ./phi4_african_history_lora_out
 ...ora_out/training_args.bin: 100%|█████████████████████████████████| 7.76kB / 7.76kB
 ...adapter_model.safetensors: 100%|█████████████████████████████████| 6.30MB / 6.30MB
 ...y_lora_out/tokenizer.json: 100%|█████████████████████████████████| 15.5MB / 15.5MB
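The config change above replaces an epoch-based run with `max_steps: 650`, `warmup_steps: 20`, and `lr_scheduler: cosine` at `learning_rate: 0.00002`. A sketch of the resulting schedule, assuming the usual linear-warmup-then-cosine-decay shape (the exact curve axolotl/transformers produces may differ slightly):

```python
import math

LR = 2e-5        # learning_rate from the config
WARMUP = 20      # warmup_steps
MAX_STEPS = 650  # max_steps

def lr_at(step: int) -> float:
    """Linear warmup from 0 to LR over WARMUP steps, then cosine decay to 0."""
    if step < WARMUP:
        return LR * step / WARMUP
    progress = (step - WARMUP) / (MAX_STEPS - WARMUP)
    return LR * 0.5 * (1.0 + math.cos(math.pi * progress))

assert lr_at(0) == 0.0          # schedule starts at zero
assert lr_at(WARMUP) == LR      # peak LR right at the end of warmup
assert lr_at(MAX_STEPS) < 1e-9  # decayed to ~0 by the final step
```

Under this shape, every evaluation point in the results table (steps 50 through 650) falls in the decay phase, since warmup ends well before the first `eval_steps: 50` checkpoint.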