⚠️ v2 Model Replacement Notice (2026-03-26)

The v2 GGUF and safetensors files were incorrectly deployed as a **1.2B model (hidden_size=2048, 24 layers)** due to a conversion pipeline error.
On 2026-03-26 they were replaced with the correct **3B ORPO checkpoint (hidden_size=3072, 28 layers, vocab_size=64256, byte-fallback applied)**.
If you downloaded v2 files before this date, please re-download.
A Korean 3B LLM built entirely from scratch, from tokenizer training through pretraining, SFT, and ORPO, on 8× NVIDIA B200 GPUs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "pathcosmos/frankenstallm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "한국의 전통 음식 중 김치에 대해 설명해주세요.",
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.2,  # recommended
        top_p=0.9,
        max_new_tokens=512,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Ollama (GGUF)
```bash
# Download the GGUF file and Modelfile
huggingface-cli download pathcosmos/frankenstallm \
  gguf/frankenstallm-3b-v2-Q4_K_M.gguf \
  gguf/Modelfile.3b-v2-Q4_K_M \
  --local-dir ./frankenstallm

# Edit the FROM path inside the Modelfile, then create the model
ollama create frankenstallm -f ./frankenstallm/gguf/Modelfile.3b-v2-Q4_K_M

# Run
ollama run frankenstallm
```
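After downloading, the only Modelfile line that should need editing is `FROM`. As a rough sketch (the authoritative Modelfile ships in the repo; the exact parameter lines it contains are not shown here), the relevant part looks like this, with `repeat_penalty` set per the warning below:

```
FROM ./frankenstallm/gguf/frankenstallm-3b-v2-Q4_K_M.gguf
PARAMETER repeat_penalty 1.2
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```

`FROM`, `PARAMETER`, `repeat_penalty`, `temperature`, and `top_p` are standard Ollama Modelfile directives and parameter names.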
⚠️ Always use repeat_penalty >= 1.2. With it, repetition drops to 0%; without it, greedy decoding produces ~31% 3-gram repetition.
Limitations

- Limited English performance: MMLU-EN ~23%, HellaSwag-EN ~29%; this is a Korean-focused model.
- Code generation: near-zero capability (little code in the training data).
- Greedy repetition: 30.9% 3-gram repetition without repeat_penalty; always use repeat_penalty >= 1.2.
- Safety: no safety-alignment data was included in training; use with appropriate guardrails.
- Scale gap: trained on ~60B tokens versus the trillions used for commercial 3B models; expect lower overall benchmark scores.
Hardware & Training Environment

| Component | Specification |
| --- | --- |
| GPU | 8× NVIDIA B200 (183GB HBM3e each, ~1.47TB total) |
| FP8 compute | 2,250 TFLOPS/GPU (18,000 TFLOPS total) |
| Interconnect | NVLink 5.0, NVSwitch all-to-all mesh |
| CPU | 2× AMD EPYC 9365 (72 cores, Zen 5) |
| RAM | 2.21 TB DDR5 |
| PyTorch | 2.10.0a0+b4e4ee81d3.nv25.12 (NVIDIA custom) |
| TransformerEngine | 2.10.0 |
| FlashAttention | 2.7.4 |
| NCCL | 2.28.9 |
| CUDA | 13.1 |
| Total training time | ~86 hours (pretraining 63h + SFT 15.5h + ORPO 7h) |
Citation

```bibtex
@misc{frankenstallm2026,
  title={FRANKENSTALLM: A Korean 3B LLM Built From Scratch on B200 GPUs},
  author={pathcosmos},
  year={2026},
  url={https://huggingface.co/pathcosmos/frankenstallm},
  note={3-phase training (Pretrain, SFT, ORPO) with FP8 on 8x NVIDIA B200}
}
```
Thanks to the AI infrastructure support program of the Republic of Korea government, we were able to train a Korean 3B LLM from scratch on 8× NVIDIA B200 GPUs. We are deeply grateful for this national-level AI computing resource support.
🇺🇸 English version below
FRANKENSTALLM 3B
⚠️ v2 Model Replacement Notice (2026-03-26)
The v2 GGUF and safetensors files were incorrectly deployed as a 1.2B model (hidden_size=2048, 24 layers) due to a conversion pipeline error.
On 2026-03-26, they were replaced with the correct 3B ORPO checkpoint (hidden_size=3072, 28 layers, vocab_size=64256, byte-fallback applied).
If you downloaded v2 files before this date, please re-download.
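One way to confirm that a local download is the corrected checkpoint is to inspect its `config.json`. A minimal sketch, assuming a Llama-style config with the standard Hugging Face keys `hidden_size`, `num_hidden_layers`, and `vocab_size` (the path argument is whatever local directory you downloaded into):

```python
# Sanity check: distinguish the corrected 3B checkpoint from the
# mis-converted 1.2B one by reading config.json from a local snapshot.
import json
from pathlib import Path

# Values from the replacement notice above.
EXPECTED = {"hidden_size": 3072, "num_hidden_layers": 28, "vocab_size": 64256}

def is_correct_3b(config_path: str) -> bool:
    """Return True if config.json matches the fixed 3B ORPO checkpoint."""
    cfg = json.loads(Path(config_path).read_text())
    return all(cfg.get(k) == v for k, v in EXPECTED.items())
```

If this returns False on files fetched before 2026-03-26, re-download as instructed above.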
A Korean 3B LLM built entirely from scratch — tokenizer, pretraining, SFT, and ORPO — on 8× NVIDIA B200 GPUs.
⚠️ Always use repeat_penalty >= 1.2. With it, repetition drops to 0%. Without it, greedy decoding produces ~31% 3-gram repetition.
Limitations

- Limited English performance: MMLU-EN ~23%, HellaSwag-EN ~29%; this is a Korean-focused model.
- Code generation: near-zero capability (limited code in the training data).
- Greedy repetition: 30.9% 3-gram repetition without repeat_penalty; always use sampling with repeat_penalty >= 1.2.
- Safety: safety-alignment data was not included in training; use with appropriate guardrails.
- Scale gap: trained on ~60B tokens versus the trillions used for commercial 3B models; expect lower overall benchmark scores.
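The 3-gram repetition figure can be reproduced with a simple metric: the fraction of trigrams in a generated token sequence that duplicate an earlier trigram. This is one common formulation; the exact definition behind the 30.9% number is not specified in this card.

```python
from collections import Counter

def trigram_repetition_rate(tokens: list) -> float:
    """Fraction of 3-grams that repeat an earlier 3-gram in the sequence.

    A common n-gram repetition metric; whether it matches the one used
    for the 30.9% figure above is an assumption.
    """
    trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeated / len(trigrams)
```

A fully looping sequence scores high (e.g. `[1, 2, 3] * 3` yields 4/7), while a sequence with no repeated trigrams scores 0.0.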
Hardware & Training Environment

| Component | Specification |
| --- | --- |
| GPU | 8× NVIDIA B200 (183GB HBM3e each, ~1.47TB total) |
| FP8 compute | 2,250 TFLOPS/GPU (18,000 TFLOPS total) |
| Interconnect | NVLink 5.0, NVSwitch all-to-all mesh |
| CPU | 2× AMD EPYC 9365 (72 cores, Zen 5) |
| RAM | 2.21 TB DDR5 |
| PyTorch | 2.10.0a0+b4e4ee81d3.nv25.12 (NVIDIA custom) |
| TransformerEngine | 2.10.0 |
| FlashAttention | 2.7.4 |
| NCCL | 2.28.9 |
| CUDA | 13.1 |
| Total training time | ~86 hours (Pretrain 63h + SFT 15.5h + ORPO 7h) |
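These numbers admit a back-of-envelope utilization check using the standard 6·N·D approximation for training FLOPs. All inputs are taken from the figures above; the result is an illustration under that approximation, not a number reported by the authors.

```python
# Rough MFU estimate for the pretraining phase via the 6*N*D rule of thumb.
# Inputs come from the hardware table and training summary; the approximation
# ignores sequence-length-dependent attention FLOPs and any downtime.
params = 3.0e9            # ~3B parameters
tokens = 60e9             # ~60B pretraining tokens
peak_flops = 18_000e12    # 18,000 TFLOPS aggregate FP8 peak
pretrain_seconds = 63 * 3600

train_flops = 6 * params * tokens          # ~1.08e21 FLOPs
available = peak_flops * pretrain_seconds  # ~4.08e21 FLOPs
mfu = train_flops / available
print(f"Estimated MFU vs FP8 peak: {mfu:.0%}")
```

Under these assumptions the run sits in the mid-20% range of FP8 peak, a plausible figure for FP8 distributed pretraining at this scale.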
Citation

```bibtex
@misc{frankenstallm2026,
  title={FRANKENSTALLM: A Korean 3B LLM Built From Scratch on B200 GPUs},
  author={pathcosmos},
  year={2026},
  url={https://huggingface.co/pathcosmos/frankenstallm},
  note={3-phase training (Pretrain, SFT, ORPO) with FP8 on 8x NVIDIA B200}
}
```
EVAFRILL-Mo | 🤗 HuggingFace — Hybrid Mamba-2 + Transformer sister project (2.94B params). While FRANKENSTALLM uses a pure Transformer architecture, EVAFRILL-Mo adopts Mamba-2 SSM + sparse Transformer attention. Both share the same tokenizer and training infrastructure.
Acknowledgment
This project was conducted using GPU computing resources provided through the "Advanced GPU Utilization Support Program" (MSIT Notice No. 2025-1068) by the Ministry of Science and ICT (MSIT) of the Republic of Korea.
Organized by: Ministry of Science and ICT (MSIT), National IT Industry Promotion Agency (NIPA)
Operated by: Korea Association of Information & Telecommunication (KAIT)
We are deeply grateful for the national-level AI computing infrastructure support from the Korean government, which made it possible to train a Korean 3B LLM from scratch on 8× NVIDIA B200 GPUs.