Commit 4b6ff3a by nbeerbower · verified · 1 Parent(s): 066ff3d

Add model card with training configuration

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - merlina
+ - grimoire
+ - text-generation
+ - sft
+ datasets:
+ - schneewolflabs/DenkerCode-SFT
+ base_model:
+ - schneewolflabs/A2
+ ---
+
+ # A2-Coder
+
+ ## Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training Mode | SFT |
+ | Base Model | `schneewolflabs/A2` |
+ | Learning Rate | 0.0001 |
+ | Epochs | 1 |
+ | Batch Size | 1 |
+ | Gradient Accumulation | 8 |
+ | Effective Batch Size | 8 |
+ | Max Sequence Length | 2048 |
+ | Optimizer | paged_adamw_8bit |
+ | LR Scheduler | cosine |
+ | Warmup Ratio | 0.05 |
+ | Weight Decay | 0.01 |
+ | Max Grad Norm | 0.5 |
+ | Seed | 42 |
+ | LoRA Rank (r) | 128 |
+ | LoRA Alpha | 128 |
+ | LoRA Dropout | 0.05 |
+ | Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
+ | Quantization | 4-bit (NF4) |
+ | GPU | NVIDIA RTX A6000 |
41
+
42
+ ---
43
+
44
+ ![Trained with Merlina](https://raw.githubusercontent.com/Schneewolf-Labs/Merlina/refs/heads/main/frontend/madewithmerlina_smol.png)
45
+
46
+ [Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)
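
Several values in the Training Configuration table are related to one another, and a short Python sketch can make those relationships explicit. This is not Merlina's code. It assumes a single GPU, the standard LoRA scaling of alpha / r, and a linear warmup followed by cosine decay to zero. The helper names (`effective_batch_size`, `lora_scaling`, `warmup_steps`, `lr_at_step`) and the `total_steps` argument are hypothetical; the commit does not state the number of training steps.

```python
import math

# Values copied from the Training Configuration table in the README.
config = {
    "learning_rate": 1e-4,
    "batch_size": 1,
    "gradient_accumulation": 8,
    "warmup_ratio": 0.05,
    "lora_r": 128,
    "lora_alpha": 128,
}


def effective_batch_size(cfg, num_devices=1):
    """Samples per optimizer step (assumption: one GPU unless stated)."""
    return cfg["batch_size"] * cfg["gradient_accumulation"] * num_devices


def lora_scaling(cfg):
    """Standard LoRA scaling factor alpha / r applied to the adapter update."""
    return cfg["lora_alpha"] / cfg["lora_r"]


def warmup_steps(cfg, total_steps):
    """Warmup length in steps; rounding is a choice made for this sketch."""
    return max(1, round(cfg["warmup_ratio"] * total_steps))


def lr_at_step(cfg, step, total_steps):
    """Linear warmup to the peak learning rate, then cosine decay to zero."""
    warmup = warmup_steps(cfg, total_steps)
    if step < warmup:
        return cfg["learning_rate"] * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return cfg["learning_rate"] * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; not given in the commit
    print("effective batch size:", effective_batch_size(config))
    print("LoRA scaling:", lora_scaling(config))
    print("warmup steps:", warmup_steps(config, total_steps))
    print("peak lr:", lr_at_step(config, warmup_steps(config, total_steps), total_steps))
```

With the table's values, the effective batch size is 1 × 8 = 8, which matches the "Effective Batch Size" row, and the LoRA scaling factor is 128 / 128 = 1.0, so the adapter update is applied without extra rescaling under this convention.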