DannyAI committed on
Commit 3ac84e6 · verified · 1 Parent(s): 7bde204

End of training

Files changed (2):
  1. README.md +143 -0
  2. debug.log +46 -1
README.md ADDED
@@ -0,0 +1,143 @@
---
library_name: peft
license: mit
base_model: microsoft/Phi-4-mini-instruct
tags:
- axolotl
- base_model:adapter:microsoft/Phi-4-mini-instruct
- lora
- transformers
datasets:
- DannyAI/African-History-QA-Dataset
pipeline_tag: text-generation
model-index:
- name: phi4_lora_axolotl
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.14.0.dev0`
```yaml
base_model: microsoft/Phi-4-mini-instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# 1. Dataset Configuration
datasets:
  - path: DannyAI/African-History-QA-Dataset
    split: train
    type: alpaca_chat.load_qa
    system_prompt: "You are a helpful AI assistant specialised in African history which gives concise answers to questions asked"
test_datasets:
  - path: DannyAI/African-History-QA-Dataset
    split: validation
    type: alpaca_chat.load_qa
    system_prompt: "You are a helpful AI assistant specialised in African history which gives concise answers to questions asked"

# 2. Output & Chat Configuration
output_dir: ./phi4_african_history_lora_out
chat_template: tokenizer_default
train_on_inputs: false

# 3. Batch Size Configuration
micro_batch_size: 2
gradient_accumulation_steps: 4

# 4. LoRA Configuration
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: [q_proj, v_proj, k_proj, o_proj]

# 5. Hardware & Efficiency
sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
bf16: true
fp16: false

# 6. Training Duration & Optimizer
num_epochs: 10
warmup_steps: 10
learning_rate: 0.00002
optimizer: adamw_torch
lr_scheduler: cosine

# 7. Logging & Evaluation
wandb_project: phi4_african_history
wandb_name: phi4_lora_axolotl
eval_strategy: steps
eval_steps: 100
save_strategy: steps
save_steps: 200
logging_steps: 5

# 8. Public Hugging Face Hub Upload
hub_model_id: DannyAI/phi4_lora_axolotl
push_adapter_to_hub: true
hub_private_repo: false
```

</details><br>

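Two quantities implied by this config are worth spelling out: the effective batch size per optimizer step, and the scale factor LoRA applies to its low-rank update before adding it to the frozen weight. A minimal arithmetic sketch, with values copied from the YAML above:

```python
# Values copied from the axolotl config above
micro_batch_size = 2
gradient_accumulation_steps = 4
lora_r, lora_alpha = 8, 16

# One optimizer step consumes micro_batch_size * gradient_accumulation_steps samples
effective_batch_size = micro_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8

# LoRA scales its low-rank update (B @ A) by alpha / r
lora_scaling = lora_alpha / lora_r
print(lora_scaling)  # 2.0
```

The effective batch size of 8 matches the `total_train_batch_size` reported in the hyperparameters section of this card.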
# phi4_lora_axolotl

This model is a fine-tuned version of [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) on the DannyAI/African-History-QA-Dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0938
- Perplexity: 8.1156
- Memory / max active (GiB): 14.84
- Memory / max allocated (GiB): 14.84
- Memory / device reserved (GiB): 31.82

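The reported perplexity is simply the exponential of the evaluation loss, which is easy to verify (a standalone check, not part of the training code):

```python
import math

eval_loss = 2.0938                # evaluation loss reported above
perplexity = math.exp(eval_loss)  # perplexity = e^loss for a per-token cross-entropy loss
print(f"{perplexity:.4f}")        # ~8.1157, matching the reported 8.1156 up to rounding of the loss
```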
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 120

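The learning-rate curve implied by these settings (linear warmup over the first 10 steps, then cosine decay to zero over the remaining 110) can be sketched as follows. This mirrors the standard `cosine` schedule used by transformers/axolotl, but it is an illustrative reimplementation, not the trainer's own code:

```python
import math

def lr_at(step, base_lr=2e-05, warmup_steps=10, total_steps=120):
    """Linear warmup followed by cosine decay to zero (standard 'cosine' schedule)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 0.0   (start of warmup)
print(lr_at(10))   # 2e-05 (peak, end of warmup)
print(lr_at(120))  # ~0.0  (fully decayed at the final step)
```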
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Perplexity | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:----------:|:------------:|:---------------:|:--------------:|
| No log        | 0      | 0    | 2.1335          | 8.4442     | 14.82        | 14.82           | 15.37          |
| 4.9679        | 7.7059 | 100  | 2.0938          | 8.1156     | 14.84        | 14.84           | 31.82          |

### Framework versions

- PEFT 0.18.1
- Transformers 4.57.6
- PyTorch 2.9.1+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2
debug.log CHANGED
@@ -473,4 +473,49 @@ trainable params: 1,572,864 || all params: 3,837,594,624 || trainable%: 0.0410
 
  96%|████████████████████████████████▌ | 115/120 [17:35<00:41,  8.38s/it]
  97%|████████████████████████████████▊ | 116/120 [17:43<00:33,  8.34s/it]
  98%|█████████████████████████████████▏| 117/120 [17:49<00:23,  7.72s/it]
  98%|█████████████████████████████████▍| 118/120 [17:59<00:16,  8.47s/it]
  99%|█████████████████████████████████▋| 119/120 [18:08<00:08,  8.41s/it]
-
+ [2026-01-24 13:11:17,545] [INFO] [axolotl.train.save_trained_model:233] [PID:6937] Training completed! Saving trained model to ./phi4_african_history_lora_out.
+ [2026-01-24 13:11:17,908] [INFO] [axolotl.train.save_trained_model:351] [PID:6937] Model successfully saved to ./phi4_african_history_lora_out
+ [2026-01-24 13:11:18,131] [INFO] [axolotl.core.trainers.base._save:721] [PID:6937] Saving model checkpoint to ./phi4_african_history_lora_out
+ ...ora_out/training_args.bin: 100%|███████| 7.76kB / 7.76kB
+ ...adapter_model.safetensors: 100%|███████| 6.30MB / 6.30MB
+ ...y_lora_out/tokenizer.json: 100%|███████| 15.5MB / 15.5MB
+ (the three upload progress bars above repeat four more times in the log as the files are re-pushed)
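As a final sanity check, the trainable% figure printed at the top of this log follows directly from the two parameter counts it reports:

```python
trainable, total = 1_572_864, 3_837_594_624  # counts printed in the log above
pct = 100 * trainable / total
print(f"{pct:.4f}%")  # 0.0410%, matching the logged trainable% value
```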