kosmylo1992 committed on
Commit d0620d4 · verified · 1 Parent(s): 7d68235

Update README.md

Files changed (1): README.md (+106, −3)
---
language:
  - en
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - instruction-tuning
  - supervised-fine-tuning
  - synthetic-qa
  - lora
  - axolotl
  - deepspeed
  - transformers
  - mistral
  - nemo
  - eu-hpc
datasets:
  - axolotl_deduplicated_synthetic_qa
metrics:
  - loss
library_name: transformers
framework: pytorch
base_model: mistralai/Mistral-Nemo-Instruct-2407
model_name: mistral-12b-sft
pipeline_tag: text-generation
task_categories:
  - text-generation
  - instruction-following
model_type: AutoModelForCausalLM
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
trained_on:
  - Leonardo EuroHPC
description: >-
  Supervised fine-tuning (SFT) of Mistral 12B Nemo Instruct on synthetic QA
  data using LoRA with Axolotl and DeepSpeed. Improves conversational
  reasoning and factual accuracy.
---

# Mistral 12B SFT (Supervised Fine-Tuning on Synthetic QA)

**Model type:** Causal Language Model  
**Base model:** [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)  
**License:** Apache 2.0  
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

---

## Overview

`mistral-12b-sft` is a **supervised fine-tuned** variant of Mistral-12B trained on high-quality synthetic QA data.
This SFT phase improves instruction following, factual reasoning, and conversational ability while keeping training memory-efficient through LoRA adapters trained on top of an 8-bit base model.

Training was conducted on **Leonardo EuroHPC**.

---
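
A minimal inference sketch with Hugging Face Transformers, using the generation parameters declared in the metadata above (`max_new_tokens=512`, `temperature=0.7`, `top_p=0.9`). The repository id `mistral-12b-sft` is a placeholder, not a confirmed Hub path; substitute the actual repository name:

```python
# Sketch of inference with the SFT model via Hugging Face Transformers.
# Generation parameters mirror the card metadata; the repo id below is a
# placeholder. Replace it with the actual Hub path of this model.

GEN_KWARGS = {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9}

def generate(prompt: str, model_id: str = "mistral-12b-sft") -> str:
    # Imported lazily so this file parses without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, do_sample=True, **GEN_KWARGS)
    # Strip the prompt tokens before decoding, returning only the completion.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example call (downloads the model, so it is left commented out):
# print(generate("What is supervised fine-tuning?"))
```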

## Training Setup

**Objective:** Supervised fine-tuning (instruction-following QA)  
**Adapter:** LoRA on an 8-bit base model  
**Precision:** bfloat16  
**Hardware:** 8 × 2 × A100 64 GB  
**Framework:** Axolotl + DeepSpeed + PyTorch 2.5.1 + CUDA 12.1  
**Runtime:** ~6 h  
**Validation split:** 30 %

---
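
A run with this stack is typically launched through Axolotl's CLI under `accelerate` with a DeepSpeed config. A hedged sketch, in which the config filename and the ZeRO stage are assumptions rather than details taken from this card:

```shell
# Sketch: launch an Axolotl SFT run with DeepSpeed across multiple GPUs.
# "sft_config.yml" and the ZeRO-2 config path are illustrative placeholders.
accelerate launch -m axolotl.cli.train sft_config.yml \
  --deepspeed deepspeed_configs/zero2.json
```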

## Dataset

| Dataset | Loader | Description |
|---------|--------|-------------|
| `axolotl_deduplicated_synthetic_qa.jsonl` | `alpaca_chat.load_qa` | Synthetic instruction–response pairs for QA and chat fine-tuning |

---
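
In an Axolotl config, this dataset would be wired up roughly as follows. This is a sketch: the file path and its placement next to the config are assumptions, while the loader type and the 30 % validation split come from this card:

```yaml
# Sketch of the Axolotl dataset stanza for this run; the path is assumed.
datasets:
  - path: axolotl_deduplicated_synthetic_qa.jsonl
    type: alpaca_chat.load_qa
val_set_size: 0.3   # 30 % validation split, per the training setup above
sequence_len: 2048
```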

## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Sequence length | 2048 |
| Micro batch size | 2 |
| Gradient accumulation | 2 |
| Epochs | 1 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Auto-resume | ✅ |
| Loss watchdog | threshold 5.0, patience 3 |

---
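
The table implies the per-step batch budget: each GPU contributes micro batch 2 × gradient accumulation 2 = 4 sequences per optimizer step. Assuming the "8 × 2 × A100" hardware layout means 16 GPUs in total (an inference from the Training Setup section, not stated explicitly), the effective global batch works out as:

```python
# Effective batch size implied by the hyperparameter table.
# world_size = 16 assumes "8 x 2 x A100" means 16 GPUs total (an inference,
# not stated explicitly on the card).
micro_batch_size = 2
gradient_accumulation = 2
world_size = 16
sequence_len = 2048

effective_batch = micro_batch_size * gradient_accumulation * world_size
tokens_per_step = effective_batch * sequence_len

print(effective_batch)   # 64 sequences per optimizer step
print(tokens_per_step)   # 131072 tokens per optimizer step
```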

## Tokenizer

**Tokenizer type:** `AutoTokenizer`  
**Pad token:** `<|end_of_text|>`
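
In an Axolotl config, a pad token like this is usually declared under `special_tokens`; a sketch of that stanza, not taken from the card's actual config:

```yaml
# Sketch: how Axolotl configs typically declare the pad token.
special_tokens:
  pad_token: "<|end_of_text|>"
```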