---
library_name: transformers
tags:
- merlina
- text-generation
- sft
datasets:
- hemlang/Hemlock-SFT
base_model:
- nbeerbower/Hemlock-Qwen2.5-Coder-7B
---

# Hemlock-Coder-7B

A supervised fine-tune (SFT) of `nbeerbower/Hemlock-Qwen2.5-Coder-7B` on the `hemlang/Hemlock-SFT` dataset, trained with [Merlina](https://github.com/Schneewolf-Labs/Merlina).
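A minimal inference sketch with 🤗 Transformers. The repo id below is a placeholder assumption (this card does not state the published Hub id); substitute the actual id before running:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with this model's actual Hub id.
model_id = "hemlang/Hemlock-Coder-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers automatically across available devices
    torch_dtype="auto",  # use the checkpoint's native dtype
)

messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```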

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Training Mode | SFT |
| Base Model | `nbeerbower/Hemlock-Qwen2.5-Coder-7B` |
| Learning Rate | 0.0001 |
| Epochs | 2 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 16 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
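The adapter and quantization settings in the table can be expressed as a PEFT/bitsandbytes configuration. This is a sketch reconstructed from the table values, not the actual training script:

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed in the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA adapter settings matching the table: r=128, alpha=128, dropout=0.05,
# applied to all attention and MLP projection modules.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "up_proj", "down_proj", "gate_proj",
        "k_proj", "q_proj", "v_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Note the effective batch size of 16 comes from a per-device batch size of 1 with 16 gradient accumulation steps.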

---

[Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)