nbeerbower committed on
Commit e6de931 · verified · 1 Parent(s): 36d8ac6

Add model card with training configuration

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - merlina
+ - grimoire
+ - text-generation
+ - sft
+ datasets:
+ - hemlang/hemlock-codex-SFT
+ base_model:
+ - hemlang/Hemlock2-Coder-7B
+ ---
+
+ # Hemlock-Codex-7B
+
+ ## Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training Mode | SFT |
+ | Base Model | `hemlang/Hemlock2-Coder-7B` |
+ | Learning Rate | 0.0001 |
+ | Epochs | 3 |
+ | Batch Size | 2 |
+ | Gradient Accumulation | 16 |
+ | Effective Batch Size | 32 |
+ | Max Sequence Length | 8192 |
+ | Optimizer | paged_adamw_8bit |
+ | LR Scheduler | cosine |
+ | Warmup Ratio | 0.05 |
+ | Weight Decay | 0.01 |
+ | Max Grad Norm | 0.25 |
+ | Seed | 42 |
+ | LoRA Rank (r) | 128 |
+ | LoRA Alpha | 128 |
+ | LoRA Dropout | 0.05 |
+ | Target Modules | k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj |
+ | Quantization | 4-bit (NF4) |
+ | GPU | NVIDIA RTX A6000 |
+
+ ---
+
+ ![Trained with Merlina](https://raw.githubusercontent.com/Schneewolf-Labs/Merlina/refs/heads/main/frontend/madewithmerlina_smol.png)
+
+ [Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)
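
Several values in the Training Configuration table are derived from others. The sketch below works through that arithmetic. It is not Merlina's code: the `TrainingConfig` class and its method names are hypothetical, and it assumes a single GPU (the card names a GPU model but not a count), the standard LoRA scaling factor `alpha / r`, and steps rounded up with `ceil`, which is one common convention. The dataset size of 10,000 examples is an illustrative placeholder, not a figure from the model card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for a subset of the values in the table."""
    epochs: int = 3
    batch_size: int = 2               # per-device micro-batch
    gradient_accumulation: int = 16
    warmup_ratio: float = 0.05
    lora_rank: int = 128
    lora_alpha: int = 128

    @property
    def effective_batch_size(self) -> int:
        # Assumes one GPU: examples consumed per optimizer step.
        return self.batch_size * self.gradient_accumulation

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the adapter update by alpha / r (assumed here).
        return self.lora_alpha / self.lora_rank

    def optimizer_steps(self, num_examples: int) -> int:
        # Optimizer steps over all epochs, rounding up the last partial batch.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs

    def warmup_steps(self, num_examples: int) -> int:
        # Steps spent ramping up the learning rate before cosine decay.
        return math.ceil(self.warmup_ratio * self.optimizer_steps(num_examples))


cfg = TrainingConfig()
print(cfg.effective_batch_size)  # 2 * 16
print(cfg.lora_scaling)          # 128 / 128
print(cfg.optimizer_steps(10_000), cfg.warmup_steps(10_000))
```

With rank and alpha both set to 128, the assumed scaling factor is 1.0, so the adapter update is applied without extra amplification. The effective batch size of 32 in the table matches `2 * 16` under the single-GPU assumption.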