Commit 598cb28 by nbeerbower · verified · 1 Parent(s): 4661452

Add model card with training configuration

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - merlina
+ - grimoire
+ - text-generation
+ - sft
+ datasets:
+ - Crownelius/Opus-4.6-Reasoning-3300x
+ - schneewolflabs/Luna-DPO
+ - schneewolflabs/i-SFT
+ base_model:
+ - schneewolflabs/A1
+ ---
+
+ # A1.1
+
+ ## Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training Mode | SFT |
+ | Base Model | `schneewolflabs/A1` |
+ | Learning Rate | 2e-05 |
+ | Epochs | 1 |
+ | Batch Size | 2 |
+ | Gradient Accumulation | 16 |
+ | Effective Batch Size | 32 |
+ | Max Sequence Length | 8192 |
+ | Optimizer | paged_adamw_8bit |
+ | LR Scheduler | cosine |
+ | Warmup Ratio | 0.05 |
+ | Weight Decay | 0.01 |
+ | Max Grad Norm | 0.5 |
+ | Seed | 42 |
+ | LoRA Rank (r) | 128 |
+ | LoRA Alpha | 256 |
+ | LoRA Dropout | 0.05 |
+ | Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
+ | Quantization | 4-bit (NF4) |
+ | GPU | NVIDIA GB10 |
+
+ ## Datasets
+
+ Trained on 3 concatenated datasets:
+
+ 1. [`Crownelius/Opus-4.6-Reasoning-3300x`](https://huggingface.co/datasets/Crownelius/Opus-4.6-Reasoning-3300x) (split: `train`)
+ 2. [`schneewolflabs/Luna-DPO`](https://huggingface.co/datasets/schneewolflabs/Luna-DPO) (split: `train`)
+ 3. [`schneewolflabs/i-SFT`](https://huggingface.co/datasets/schneewolflabs/i-SFT) (split: `train`)
+
+ ---
+
+ ![Trained with Merlina](https://raw.githubusercontent.com/Schneewolf-Labs/Merlina/refs/heads/main/frontend/madewithmerlina_smol.png)
+
+ [Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)
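
For readers who want to sanity-check the Training Configuration table added in this commit, the following is a minimal sketch, not Merlina's actual code. It recomputes the derived values (effective batch size, LoRA scaling) and shows one common formulation of a cosine schedule with linear warmup. Several things here are assumptions: the run used a single GPU, LoRA uses the standard `alpha / r` scaling, the dictionary keys follow the parameter names of `peft.LoraConfig`, `transformers.BitsAndBytesConfig`, and `transformers.TrainingArguments`, and the step count in the usage note is hypothetical because the card does not report the number of training steps.

```python
import math

# Values copied from the Training Configuration table in the model card.
PER_DEVICE_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
LEARNING_RATE = 2e-5
WARMUP_RATIO = 0.05
LORA_R = 128
LORA_ALPHA = 256

# Assumption: one GPU, consistent with the single "NVIDIA GB10" table entry.
NUM_GPUS = 1

# Assumption: these keys mirror peft / transformers parameter names.
# The card does not show how Merlina passes them internally.
LORA_KWARGS = {
    "r": LORA_R,
    "lora_alpha": LORA_ALPHA,
    "lora_dropout": 0.05,
    "target_modules": [
        "up_proj", "down_proj", "gate_proj",
        "k_proj", "q_proj", "v_proj", "o_proj",
    ],
}
QUANT_KWARGS = {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
TRAINING_KWARGS = {
    "learning_rate": LEARNING_RATE,
    "num_train_epochs": 1,
    "per_device_train_batch_size": PER_DEVICE_BATCH_SIZE,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": WARMUP_RATIO,
    "weight_decay": 0.01,
    "max_grad_norm": 0.5,
    "seed": 42,
}


def effective_batch_size(per_device, accum_steps, num_gpus=NUM_GPUS):
    """Sequences contributing to each optimizer step."""
    return per_device * accum_steps * num_gpus


def lora_scaling(alpha, r):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def lr_at_step(step, total_steps, base_lr=LEARNING_RATE,
               warmup_ratio=WARMUP_RATIO):
    """Linear warmup, then cosine decay to zero (a common formulation)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

As a usage example, `effective_batch_size(2, 16)` reproduces the table's value of 32, and `lora_scaling(256, 128)` gives a scaling factor of 2. Calling `lr_at_step(step, total_steps)` with a hypothetical `total_steps` such as 1000 traces a curve that rises linearly over roughly the first 5% of steps and then decays along a half cosine.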