---
library_name: transformers
tags:
- merlina
- text-generation
- sft
datasets:
- hemlang/Hemlock-SFT
base_model:
- nbeerbower/Hemlock-Qwen2.5-Coder-7B
---

# Hemlock-Coder-7B

A supervised fine-tune (SFT) of `nbeerbower/Hemlock-Qwen2.5-Coder-7B` on the `hemlang/Hemlock-SFT` dataset, trained with [Merlina](https://github.com/Schneewolf-Labs/Merlina).
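A minimal inference sketch with 🤗 Transformers. The repo id below is a placeholder assumption (this card does not state the published Hub id); substitute the actual id before running:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with this model's actual Hub id.
model_id = "hemlang/Hemlock-Coder-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers automatically across available devices
    torch_dtype="auto",  # use the checkpoint's native dtype
)

messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```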

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Training Mode | SFT |
| Base Model | `nbeerbower/Hemlock-Qwen2.5-Coder-7B` |
| Learning Rate | 0.0001 |
| Epochs | 2 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 16 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
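The adapter and quantization settings in the table can be expressed as a PEFT/bitsandbytes configuration. This is a sketch reconstructed from the table values, not the actual training script:

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed in the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA adapter settings matching the table: r=128, alpha=128, dropout=0.05,
# applied to all attention and MLP projection modules.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "up_proj", "down_proj", "gate_proj",
        "k_proj", "q_proj", "v_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Note the effective batch size of 16 comes from a per-device batch size of 1 with 16 gradient accumulation steps.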

---

[Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)