HanForge 47M SFT: Korean Conversational Model
A Korean chat model fine-tuned from drlee1/HanForge-base with knowledge distillation on 24,693 Korean question-answer pairs spanning five everyday domains.
The model produces longer, more naturally phrased Korean responses than a templated baseline, but it is less reliable under greedy decoding, so sampled decoding is recommended.
Highlights
- Longer, more natural Korean responses: averaging 130 characters (2-3 sentences)
- Five everyday domains: greetings & conversation, food & cooking, Korean culture & geography, health & habits, emotional support
- Pure Korean output: 100% Hangul ratio, zero foreign-script leakage
- Compact: 47M parameters
Intended Use
Suitable for:
- Korean chat applications within everyday-conversation domains, where natural-sounding replies matter
- Resource-constrained deployments needing a small Korean model
- Research into small-LM knowledge distillation and instruction tuning
Not suitable for:
- Factual question answering requiring high accuracy (the synthetic data is not fact-checked)
- Multi-step reasoning, coding, or technical tasks
- Open-domain conversation outside the five training domains
- Any safety-critical application
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "drlee1/HanForge-47M-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval()

# Role markers that wrap each turn in the prompt
USER, ASSISTANT = "<|user|>", "<|assistant|>"

def chat(prompt: str, max_new_tokens: int = 200, seed: int = 42) -> str:
    torch.manual_seed(seed)  # fixed seed makes the sampled output reproducible
    text = f"{USER}\n{prompt}\n{ASSISTANT}\n"
    inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)
    # Add BOS manually
    bos = inputs["input_ids"].new_full((1, 1), tokenizer.bos_token_id)
    inputs["input_ids"] = torch.cat([bos, inputs["input_ids"]], dim=1)
    inputs["attention_mask"] = torch.cat(
        [inputs["attention_mask"].new_ones((1, 1)), inputs["attention_mask"]], dim=1
    )
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # Sampled decoding is recommended
        temperature=0.8,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(out[0, inputs["input_ids"].size(1):], skip_special_tokens=True).strip()

# Prompt: "Please recommend some places worth visiting in Korea."
print(chat("한국에서 가 볼 만한 여행지를 추천해 주세요."))
Decoding tips
- Use sampling, not greedy. Greedy decoding is prone to repetition with this model. Recommended settings: temperature=0.8, top_p=0.9.
- Try multiple seeds. Some prompts produce a noticeably better answer on the second or third sampled generation.
- Cap output length. 150-200 new tokens is usually enough; longer generations rarely improve quality.
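The multi-seed tip can be automated with a small helper. The sketch below is only an illustration: `best_of_seeds` and `distinct_bigram_ratio` are hypothetical names, and ranking candidates by the share of distinct character bigrams is an assumed, fairly crude proxy for "less repetitive", not a selection rule taken from this model card.

```python
from typing import Callable, Sequence


def distinct_bigram_ratio(text: str) -> float:
    """Share of character bigrams that are unique; lower means more repetition."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    if not bigrams:
        return 0.0
    return len(set(bigrams)) / len(bigrams)


def best_of_seeds(
    generate: Callable[[str, int], str],
    prompt: str,
    seeds: Sequence[int] = (0, 1, 2),
) -> str:
    """Generate one candidate per seed and keep the least repetitive one.

    `generate` is any callable taking (prompt, seed) and returning text.
    Ties are resolved in favour of the earliest seed.
    """
    candidates = [generate(prompt, seed) for seed in seeds]
    return max(candidates, key=distinct_bigram_ratio)
```

With the `chat` function from How to Use, a call could look like `best_of_seeds(lambda p, s: chat(p, seed=s), prompt)`. Each extra seed costs one more full generation, so a handful of seeds is probably the practical ceiling.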
Training Data
Fine-tuned on 24,693 Korean question-answer pairs prepared through a knowledge-distillation approach. The dataset spans 200 (domain, topic) pairs across five everyday domains, with each pair contributing roughly 100 diverse user-style questions paired with concise polite Korean answers.
The five training domains are:
| Domain | Topics covered |
|---|---|
| Daily greetings & conversation | greetings, thanks, apologies, introductions, mood, comfort, requests |
| Food & cooking basics | Korean dishes, ingredients, simple recipes, recommendations |
| Korean culture & geography | cities, mountains, traditional clothing, holidays, traditions |
| Health & lifestyle habits | exercise, sleep, nutrition, stress, daily routines |
| Emotions & empathy | sadness, loneliness, anxiety, joy, gratitude, comfort |
After filtering for polite-ending and language-purity constraints (about 8.5% drop rate), the final training set carries 100% Hangul ratio, a consistent polite voice, and an average response length of ~134 characters.
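A filter of this kind might look roughly like the sketch below. The exact rules used to build the dataset are not stated here, so everything in it is an assumption: the function names are hypothetical, the Hangul ratio is computed over letters only (digits, punctuation, and whitespace are ignored), and the list of polite endings is a guess at what "polite-ending" could mean.

```python
import re

# Hangul syllables, Hangul Jamo, and Hangul compatibility Jamo.
_HANGUL = re.compile(r"[\uAC00-\uD7A3\u1100-\u11FF\u3130-\u318F]")
# Any letter from any script (no digits, underscore, punctuation, or whitespace).
_LETTER = re.compile(r"[^\W\d_]")
# Assumed polite sentence endings; the real filter may use a different list.
POLITE_ENDINGS = ("요", "니다", "니까")


def hangul_ratio(text: str) -> float:
    """Fraction of letters in `text` that are Hangul (0.0 if there are no letters)."""
    letters = _LETTER.findall(text)
    if not letters:
        return 0.0
    hangul = sum(1 for ch in letters if _HANGUL.match(ch))
    return hangul / len(letters)


def ends_politely(text: str) -> bool:
    """True if the text, minus trailing punctuation, ends with an assumed polite ending."""
    stripped = text.rstrip(" \n\t.!?~")
    return stripped.endswith(POLITE_ENDINGS)


def keep_answer(text: str) -> bool:
    """Keep an answer only if it is pure Hangul and ends politely."""
    return hangul_ratio(text) == 1.0 and ends_politely(text)
```

Under these assumptions, `keep_answer("감사합니다!")` passes, while an answer containing Latin letters or ending in a plain form such as "좋아" is dropped.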
Training Procedure
Fine-tuned on top of drlee1/HanForge-base using full-parameter SFT with response-only loss masking.
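Response-only loss masking generally means that prompt tokens are fed to the model but excluded from the loss. A minimal sketch, assuming the common PyTorch convention of marking ignored positions with -100 (the actual training script is not shown in this card, and `build_labels` is a hypothetical helper):

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(
    prompt_ids: list[int],
    response_ids: list[int],
    max_len: int = 384,  # sequence length used for training
) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) where only response tokens contribute to the loss."""
    input_ids = (prompt_ids + response_ids)[:max_len]
    # Prompt positions are masked out; response positions keep their token ids.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels
```

For example, `build_labels([1, 2, 3], [4, 5])` yields inputs `[1, 2, 3, 4, 5]` and labels `[-100, -100, -100, 4, 5]`, so gradients come only from the two response tokens.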
| Setting | Value |
|---|---|
| Training samples | 24,693 |
| Epochs | 5 |
| Effective batch size | 16 |
| Learning rate | 5e-5 (cosine, 3% warmup) |
| Sequence length | 384 |
| Precision | bf16 mixed |
| Final training loss | 10.4 |
| Validation perplexity | ~25 |
| Wall-clock time | ~19 minutes (Mac MPS) |
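To make the learning-rate row concrete, the sketch below derives a step count from the sample count, batch size, and epochs, then evaluates a cosine schedule with 3% warmup. Several details are assumptions rather than facts from this card: the last partial batch is kept (hence `ceil`), warmup is linear from zero, and the cosine decays all the way to zero. `lr_at` is a hypothetical helper, not the training code.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 5e-5, warmup_frac: float = 0.03) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


steps_per_epoch = math.ceil(24_693 / 16)  # 1544 optimizer steps per epoch
total_steps = steps_per_epoch * 5         # 7720 steps over 5 epochs

for step in (0, 231, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the warmup lasts 231 steps, after which the rate peaks at 5e-5 and decays smoothly for the remainder of training.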
Evaluation
Evaluated on 20 prompts (14 in-distribution, 6 out-of-distribution) under both greedy and sampled decoding.
| Metric (sampled, t=0.8) | Result |
|---|---|
| Korean character ratio | 100% |
| Foreign-script leakage | 0% |
| End-of-sequence within 128 tokens | 90% |
| Average response length | ~120 chars |

| Metric (greedy) | Result |
|---|---|
| Korean character ratio | 100% |
| Foreign-script leakage | 0% |
| End-of-sequence within 128 tokens | 55% |
| Maximum repeated-token run | up to ~200 (collapse risk) |
The model is reliable on in-distribution Korean conversation but not on out-of-distribution topics. For abstract or domain-specific questions, responses are often well-formed Korean but semantically off.
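The repetition and end-of-sequence metrics in the tables above could be computed along the following lines. This is a sketch under stated assumptions: "repeated-token run" is read as the longest stretch of consecutive identical token ids, and the EOS rate as the share of generations containing the EOS id within the first 128 generated tokens. The function names are hypothetical and the evaluation script itself is not part of this card.

```python
from typing import Sequence


def max_repeated_run(token_ids: Sequence[int]) -> int:
    """Length of the longest run of consecutive identical token ids (0 if empty)."""
    longest = current = 0
    previous = object()  # sentinel that never equals a token id
    for tok in token_ids:
        current = current + 1 if tok == previous else 1
        longest = max(longest, current)
        previous = tok
    return longest


def eos_rate(generations: Sequence[Sequence[int]], eos_id: int, limit: int = 128) -> float:
    """Fraction of generations that emit `eos_id` within the first `limit` tokens."""
    if not generations:
        return 0.0
    hits = sum(1 for ids in generations if eos_id in ids[:limit])
    return hits / len(generations)
```

Here `generations` is expected to hold only the newly generated token ids for each prompt, with the prompt tokens already sliced off.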
Limitations and Bias
- Distilled-data origin: Training answers were prepared via knowledge distillation. Facts, recommendations, and explanations may be incorrect, stale, or biased; do not rely on the model for accurate information.
- Domain restriction: The five training domains define the model's reliable scope. Out-of-domain prompts produce responses that may look fluent but are often off-topic.
- Greedy decoding instability: Small-scale models trained on longer responses tend to fall into repetition under greedy decoding. This model is no exception; always use sampling.
- No alignment / safety tuning: Not RLHF'd, no harmful-content filtering. Inputs designed to elicit unsafe content may produce unsafe Korean text.
- Distillation bias: Any biases present in the distillation source are inherited by the model.
License
Released under the Apache License 2.0.
Citation
@misc{hanforge_47m_sft_2026,
  author = {DongRyeol Lee},
  title = {HanForge 47M SFT: A Korean Conversational Model Trained via Knowledge Distillation},
  year = {2026},
  note = {Fine-tuned from drlee1/HanForge-base on 24.7k Korean Q\&A pairs across five everyday domains}
}