---
library_name: transformers
tags:
- merlina
- text-generation
- sft
datasets:
- hemlang/Hemlock-SFT
base_model:
- nbeerbower/Hemlock-Qwen2.5-Coder-3B
---
# Hemlock-Coder-3B

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Training Mode | SFT |
| Base Model | `nbeerbower/Hemlock-Qwen2.5-Coder-3B` |
| Learning Rate | 0.0001 |
| Epochs | 2 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 16 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
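
The table above can be collected into a plain Python config for reproduction with a PEFT/TRL-style trainer. This is an illustrative sketch, not the actual Merlina training script; the dict keys follow common `LoraConfig`/trainer argument naming conventions and are assumptions, not values read from the repository. It also shows how the effective batch size of 16 follows from the per-device batch size and gradient accumulation steps.

```python
# Hyperparameters from the training configuration table.
# Illustrative sketch only -- key names mirror typical PEFT/TRL
# arguments and are not taken from the Merlina codebase.
config = {
    "learning_rate": 1e-4,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "max_seq_length": 2048,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "weight_decay": 0.01,
    "max_grad_norm": 0.25,
    "seed": 42,
    # LoRA adapter settings
    "lora_r": 128,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "up_proj", "down_proj", "gate_proj",
        "k_proj", "q_proj", "v_proj", "o_proj",
    ],
}

# Effective batch size = per-device batch size x accumulation steps.
effective_batch_size = (
    config["per_device_train_batch_size"]
    * config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 16
```

With alpha equal to the rank (128/128), the LoRA scaling factor alpha/r is 1, so adapter updates are applied at full strength.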
---

[Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)