Instructions to use pathcosmos/frankenstallm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use pathcosmos/frankenstallm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pathcosmos/frankenstallm")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pathcosmos/frankenstallm")
model = AutoModelForCausalLM.from_pretrained("pathcosmos/frankenstallm")

llama-cpp-python

How to use pathcosmos/frankenstallm with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="pathcosmos/frankenstallm",
	filename="gguf/frankenstallm-3b-Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use pathcosmos/frankenstallm with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf pathcosmos/frankenstallm:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf pathcosmos/frankenstallm:Q4_K_M

Use Docker

docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M

LM Studio
Jan

vLLM

How to use pathcosmos/frankenstallm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pathcosmos/frankenstallm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pathcosmos/frankenstallm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M

SGLang

How to use pathcosmos/frankenstallm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pathcosmos/frankenstallm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pathcosmos/frankenstallm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pathcosmos/frankenstallm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pathcosmos/frankenstallm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use pathcosmos/frankenstallm with Ollama:
```
ollama run hf.co/pathcosmos/frankenstallm:Q4_K_M
```

Unsloth Studio new

How to use pathcosmos/frankenstallm with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pathcosmos/frankenstallm to start chatting

Docker Model Runner
How to use pathcosmos/frankenstallm with Docker Model Runner:
```
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
```

Lemonade

How to use pathcosmos/frankenstallm with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull pathcosmos/frankenstallm:Q4_K_M

Run and chat with the model

lemonade run user.frankenstallm-Q4_K_M

List all available models

lemonade list

somebody-to-love commited on Mar 10

Commit

9eb04a9

verified ·

1 Parent(s): dc788f2

docs: add comprehensive HuggingFace model card with benchmarks and usage guide

Browse files

Files changed (1) hide show

README.md +352 -152

README.md CHANGED Viewed

@@ -1,172 +1,372 @@
 ---
 language:
-- ko
-license: other
 tags:
-- llm
-- korean
-- orpo
-- gguf
 ---
-# FRANKENSTALLM 3B v2 (Byte-Fallback Fixed)
-한국어 중심 **FRANKENSTALLM 3B** ORPO 파인튜닝 체크포인트에 **byte-fallback 토큰 256개**를 추가한 버전입니다.
-llama.cpp/GGUF 추론 시 줄바꿈(`\n`) 등 미등록 문자로 인한 크래시를 방지하기 위해 사용합니다.
-## 모델 상세
-| 항목 | 값 |
-|------|-----|
-| **Architecture** | LlamaForCausalLM |
-| **Params** | ~3B |
-| **Hidden size** | 2048 |
-| **Layers** | 24 |
-| **Attention heads** | 16 |
-| **KV heads** | 4 |
-| **Max position** | 4096 |
-| **Vocab size** | **64,256** (64,000 + 256 byte-fallback) |
-| **Training** | ORPO (SFT → ORPO) |
-## 변경 사항 (v2)
-- 토크나이저: `byte_fallback=True`, `<0x00>`~`<0xFF>` 256개 토큰 추가
-- 임베딩: 64,000 → 64,256 리사이즈, 새 토큰 초기화
-- GGUF 변환·Ollama 배포 시 뉴라인 포함 입력 정상 처리 확인
-## 학습 환경 (Training Hardware)
-### GPU
-| 항목 | 사양 |
-|------|------|
-| **GPU** | 8× NVIDIA B200 |
-| **VRAM (per GPU)** | 183 GB HBM3e |
-| **Total VRAM** | ~1,466 GB (~1.47 TB) |
-| **FP8 Tensor Core** | 2,250 TFLOPS/GPU (총 18,000 TFLOPS) |
-| **BF16 Tensor Core** | 1,125 TFLOPS/GPU |
-| **HBM3e Bandwidth** | ~7.67 TB/s per GPU |
-| **Interconnect** | NVLink 5.0 (NV18, 900 GB/s bidirectional) |
-| **Topology** | NVSwitch — 모든 GPU↔GPU 단일 홉 all-to-all mesh |
-| **SMs per GPU** | 148 |
-| **L2 Cache per GPU** | 126.5 MB |
-### CPU & Memory
-| 항목 | 사양 |
-|------|------|
-| **CPU** | 2× AMD EPYC 9365 (Turin / Zen 5) |
-| **Physical Cores** | 72 (36코어 × 2소켓) |
-| **L3 Cache** | 384 MB (12 CCX × 32 MB) |
-| **System RAM** | 2.21 TB DDR5 (NUMA 2노드 × ~1.1 TB) |
-| **GPU↔NUMA 매핑** | GPU 0–3 → NUMA node 0 / GPU 4–7 → NUMA node 1 |
-### 소프트웨어 스택
-| 패키지 | 버전 |
-|--------|------|
-| **CUDA** | 13.1 |
-| **Driver** | 580.95.05 |
-| **PyTorch** | 2.10.0a0+b4e4ee81d3 (NVIDIA nv25.12 커스텀 빌드, B200 최적화) |
-| **Transformer Engine** | 2.10.0 |
-| **FlashAttention** | 2.7.4.post1+25.12 |
-| **NCCL** | 2.28.9 |
-| **Triton** | 3.5.1 |
-| **TRL** | ORPO fine-tuning |
-> **참고**: PyTorch는 NVIDIA B200 최적화 커스텀 빌드(`nv25.12`)를 사용합니다. FP8 네이티브 연산(`torch.float8_e4m3fn`) 지원 환경입니다.
-## ORPO 평가 요약 (동일 체크포인트 기준)
-- **평가 일시**: 2026-03-09
-- **Preference Accuracy**: 76.02%
-- **Reward Margin**: 0.6100
-- **Eval Loss**: 1.7910 → 1.6250
-- **KoBEST (0-shot) 평균**: 52.75%
-- **생성 품질**: Greedy 3-gram 반복률 30.89%, EOS 종료율 66.67%
-- **PPL Forgetting**: 최대 4.1% (기준 <15%)
-- **종합**: 7/10 차원 통과, 정량 스코어 63.7/100
-상세: 프로젝트 내 `reports/2026-03-09_ORPO_EVALUATION_REPORT.md` 참고.
-## Ollama 배포 벤치마크 (Q4_K_M, 2026-03-09)
-- **모델명**: `frankenstallm-3b-v2`
-- **테스트 수**: 35 (자동 20 + 수동 15)
-- **자동 채점 평균**: 46.7
-- **카테고리**: korean_nlu 100.0, reasoning 50.0, knowledge 75.0, instruction_following 66.7, code 0.0, safety 10.0, repetition_resistance 2.2 등
-- **지���**: Avg TTFT 16.7 ms, Avg TPS 142.5
-상세: `reports/2026-03-09_GGUF_DEPLOYMENT_AND_EVAL_REPORT.md`, `eval/results/frankenstallm-3b-v2/ollama_benchmark_summary.md`
-## 샘플링 파라미터 (Sampling Config)
-ORPO 평가 그리드 실측 최적값 (`t0.7_rep1.2`) 기준입니다.
-### 권장 파라미터
-| 파라미터 | 값 | 비고 |
-|---------|-----|------|
-| `temperature` | **0.7** | 창의성/일관성 균형점 |
-| `repetition_penalty` (PyTorch) | **1.2** | 반복 억제 |
-| `repeat_penalty` (Ollama) | **1.2** | 동일 값 |
-| `top_p` | **0.9** | nucleus sampling |
-| `top_k` | **50** | |
-| `max_new_tokens` | **512** | |
-| `num_ctx` | **4096** | context window |
-### 평가 결과 (ORPO eval grid)
-| 설정 | 3-gram 반복률 | 4-gram 반복률 | EOS 종료율 | 평균 토큰 수 |
-|------|-------------|-------------|----------|------------|
-| **t0.7 / rep1.2 (권장)** | **0.0%** | **0.0%** | **100%** | 189.2 |
-| t0.8 / rep1.05 (기본) | 4.7% | 2.3% | 100% | 221.4 |
-| greedy (temp=0) | 30.89% | — | 66.67% | — |
-> Ollama Q4_K_M 실측: 3-gram 반복 1.8% (자연 어절 반복), EOS 100% 종료 확인.
-> greedy 대비 반복률 **30.89% → 0%** 해소.
-### Transformers 사용 예시
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-model = AutoModelForCausalLM.from_pretrained("pathcosmos/frankenstallm")
-tokenizer = AutoTokenizer.from_pretrained("pathcosmos/frankenstallm")
-inputs = tokenizer("안녕하세요, 오늘 날씨가", return_tensors="pt")
-outputs = model.generate(
-    **inputs,
-    temperature=0.7,
-    repetition_penalty=1.2,
-    top_p=0.9,
-    top_k=50,
-    max_new_tokens=512,
-    do_sample=True,
 )
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-## 사용
-- **Transformers**: 이 체크포인트를 그대로 `from_pretrained(...)` 로 로드 가능.
-- **GGUF / Ollama**:
-  ```bash
-  # Q4_K_M (권장, 757MB)
-  ollama create frankenstallm-3b-v2:Q4_K_M -f gguf/Modelfile.3b-v2-Q4_K_M
-  ollama run frankenstallm-3b-v2:Q4_K_M
-  # Q8_0 (고품질, 1.2GB)
-  ollama create frankenstallm-3b-v2:Q8_0 -f gguf/Modelfile.3b-v2-Q8_0
-  ollama run frankenstallm-3b-v2:Q8_0
-  # f16 (최고품질, 2.3GB)
-  ollama create frankenstallm-3b-v2:f16 -f gguf/Modelfile.3b-v2-f16
-  ollama run frankenstallm-3b-v2:f16
-  ```
-  Modelfile에 검증된 샘플링 파라미터(`temperature=0.7, repeat_penalty=1.2`)가 포함되어 있습니다.
-## 라이선스
-프로젝트(FRANKENSTALLM) 라이선스에 따릅니다.

 ---
+library_name: transformers
+license: apache-2.0
 language:
+  - ko
+  - en
+model_type: llama
 tags:
+  - 3b
+  - korean
+  - from-scratch
+  - orpo
+  - instruction-tuned
+  - preference-aligned
+  - fp8
+  - b200
+  - gguf
+datasets:
+  - cc100
+  - allenai/c4
+  - heegyu/orca-math-korean-preference-cleaned
+  - nayohan/preference-collection-ko-full
+  - maywell/ko_Ultrafeedback_binarized
+  - HuggingFaceTB/cosmopedia
+  - wikimedia/wikipedia
+pipeline_tag: text-generation
+model-index:
+  - name: FRANKENSTALLM-3B
+    results:
+      - task:
+          type: text-generation
+        dataset:
+          type: kobest
+          name: KoBEST (0-shot)
+        metrics:
+          - name: Average
+            type: accuracy
+            value: 52.75
+          - name: COPA
+            type: accuracy
+            value: 63.9
+          - name: HellaSwag-KO
+            type: accuracy
+            value: 38.0
+          - name: SentiNeg
+            type: accuracy
+            value: 62.5
+          - name: BoolQ
+            type: accuracy
+            value: 50.6
+          - name: WiC
+            type: accuracy
+            value: 48.8
+      - task:
+          type: text-generation
+        dataset:
+          type: haerae
+          name: HAE-RAE (0-shot)
+        metrics:
+          - name: Average
+            type: accuracy
+            value: 21.81
+      - task:
+          type: text-generation
+        dataset:
+          type: piqa
+          name: PIQA (0-shot)
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 59.9
+      - task:
+          type: text-generation
+        dataset:
+          type: ai2_arc
+          name: ARC-Easy (0-shot)
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 36.0
 ---
+# FRANKENSTALLM 3B
+> **A Korean 3B LLM built entirely from scratch — tokenizer, pretraining, SFT, and ORPO — on 8× NVIDIA B200 GPUs.**
+| | |
+|---|---|
+| **Developer** | [pathcosmos](https://huggingface.co/pathcosmos) |
+| **Parameters** | ~2.4B (3B-class with weight tying) |
+| **Languages** | Korean (primary), English (secondary) |
+| **License** | Apache 2.0 |
+| **Training** | 3-phase: Pretrain → SFT → ORPO |
+| **Hardware** | 8× NVIDIA B200 (FP8), ~86 hours total |
+---
+## Quick Start
+### Transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
+model_id = "pathcosmos/frankenstallm"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
 )
+inputs = tokenizer(
+    "한국의 전통 음식 중 김치에 대해 설명해주세요.",
+    return_tensors="pt"
+).to(model.device)
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        do_sample=True,
+        temperature=0.7,
+        repetition_penalty=1.2,  # recommended
+        top_p=0.9,
+        max_new_tokens=512,
+        pad_token_id=tokenizer.eos_token_id,
+    )
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
+### Ollama (GGUF)
+```bash
+# Download GGUF + Modelfile
+huggingface-cli download pathcosmos/frankenstallm \
+  gguf/frankenstallm-3b-v2-Q4_K_M.gguf \
+  gguf/Modelfile.3b-v2-Q4_K_M \
+  --local-dir ./frankenstallm
+# Fix FROM path in Modelfile, then create
+ollama create frankenstallm -f ./frankenstallm/gguf/Modelfile.3b-v2-Q4_K_M
+# Run
+ollama run frankenstallm
+```
+---
+## Model Highlights
+- **From-scratch Korean tokenizer**: SentencePiece Unigram, 64K vocab, 99.95% Korean character coverage
+- **3-phase training pipeline**: Pretrain (57K steps, ~60B tokens) → SFT (25.5K steps, 2.4M samples) → ORPO (10K steps, 630K preference pairs)
+- **B200 FP8 native training**: TransformerEngine MXFP8 on NVIDIA B200 — 2× theoretical throughput vs BF16
+- **GGUF deployment ready**: Q4_K_M (757MB), Q8_0 (1.2GB), F16 (2.3GB) with optimized Ollama Modelfiles
+---
+## Architecture
+| Component | Value |
+|-----------|-------|
+| Type | Decoder-only Transformer (LLaMA-style) |
+| Hidden size | 3,072 |
+| Layers | 28 |
+| Attention heads | 24 |
+| KV heads | 8 (GQA 3:1) |
+| FFN dim | 8,192 (SwiGLU) |
+| Vocab size | 64,000 |
+| Context length | 4,096 (trained at 2,048) |
+| Position encoding | RoPE (θ=500,000) |
+| Normalization | Pre-norm RMSNorm |
+| Attention impl | FlashAttention-2 |
+| Precision | FP8 (MXFP8 via TransformerEngine) |
+| Weight tying | Yes (embedding ↔ lm_head) |
+---
+## Training Pipeline
+### Phase 1: Pretraining
+| Detail | Value |
+|--------|-------|
+| Steps | 57,000 |
+| Final loss | 1.466 |
+| Tokens seen | ~60B (38.5B unique × ~1.5 epochs) |
+| Duration | ~63 hours |
+| Data | CC-100 KO, HPLT KO, C4 KO, NamuWiki, Wikipedia KO, Cosmopedia (EN) |
+| Batch size | 5 × 8 GPU × 8 accum × 2,048 seq = ~655K tok/step |
+### Phase 2: Supervised Fine-Tuning (SFT)
+| Detail | Value |
+|--------|-------|
+| Steps | 25,500 (early stop at 77.3%) |
+| Best val_loss | 1.8851 (step 23,000) |
+| Duration | ~15.5 hours |
+| Data | 2,439,397 samples from 24 sources (7.48 GB) |
+| Mix | 70% SFT + 30% pretrain replay (catastrophic forgetting prevention) |
+| Knowledge forgetting | 0.9% (19 datasets) |
+### Phase 3: ORPO (Odds Ratio Preference Optimization)
+| Detail | Value |
+|--------|-------|
+| Steps | 9,997 (early convergence) |
+| Best eval_loss | 1.625 |
+| Preference accuracy | 76.02% |
+| Reward margin | 0.6100 |
+| Duration | ~7 hours |
+| Data | ~630K preference pairs from 7 Korean HF datasets |
+| Hyperparams | beta=0.25, lr=1.2e-5, eff_batch=128 |
+**Total training time: ~86 hours on 8× B200**
+---
+## Benchmarks
+### Training Phase Progression (Base → SFT → ORPO)
+| Benchmark | Base | SFT | ORPO | Δ (Base→ORPO) |
+|-----------|:----:|:---:|:----:|:---:|
+| **KoBEST Avg (0-shot)** | 43.7% | 43.3% | **52.8%** | **+9.1pp** |
+| KoBEST COPA | 49.3% | 48.6% | **63.9%** | +14.6pp |
+| KoBEST HellaSwag-KO | 21.6% | 19.8% | **38.0%** | +16.4pp |
+| KoBEST SentiNeg | 48.6% | 49.1% | **62.5%** | +13.9pp |
+| KoBEST BoolQ | 50.3% | 50.1% | 50.6% | +0.3pp |
+| PIQA | 52.5% | 52.6% | **59.9%** | +7.3pp |
+| ARC-Easy | 25.6% | 25.9% | **36.0%** | +10.4pp |
+| HAE-RAE | 19.7% | 19.9% | 21.8% | +2.1pp |
+| HellaSwag EN | 26.2% | 26.1% | 29.2% | +3.0pp |
+| Greedy 3-gram repetition | 61.0% | 73.0% | **30.9%** | -30.1pp |
+| EOS termination rate | 0% | 60% | **67%** | +67pp |
+| PPL forgetting | — | 0.9% | 4.1% | within 15% ✅ |
+### 3B-class Model Comparison (Ollama, 35 tests)
+| Model | Params | Korean NLU | Knowledge | Instruction | Reasoning | Avg Score |
+|-------|:------:|:----------:|:---------:|:-----------:|:---------:|:---------:|
+| Qwen 2.5 3B | 3B | 100.0 | 20.8 | 55.6 | 62.5 | **63.4** |
+| Phi-4 Mini | 3.8B | 66.7 | 29.2 | 33.3 | **87.5** | 60.6 |
+| **FRANKENSTALLM 3B** | **3B** | **100.0** | **75.0** | **66.7** | 50.0 | 46.7 |
+> FRANKENSTALLM leads in **Korean NLU** (tied with Qwen), **Korean Knowledge** (75 vs 20.8/29.2), and **Instruction Following** (66.7 vs 55.6/33.3).
+### Inference Speed (Ollama, Q4_K_M)
+| Model | Avg TTFT | TPS | Note |
+|-------|:--------:|:---:|------|
+| **FRANKENSTALLM 3B** | **16.7ms** | **142.5** | Fastest |
+| Phi-4 Mini 3.8B | 25.6ms | 100.4 | |
+| Qwen 2.5 3B | 28.2ms | 93.8 | |
+### Perplexity Preservation (ORPO Knowledge Retention)
+| Dataset | Base PPL | ORPO PPL | Forgetting |
+|---------|:--------:|:--------:|:----------:|
+| Korean C4 | 5.72 | 5.87 | +2.7% |
+| Korean Wiki | 11.84 | 12.21 | +3.2% |
+| Max forgetting | — | — | 4.1% ✅ |
+---
+## Training Data
+### Pretraining (~38.5B tokens)
+| Category | Sources | Est. Tokens |
+|----------|---------|:-----------:|
+| Korean Web Crawl | C4 KO, CC-100 KO, HPLT KO | ~17.2B |
+| Korean Encyclopedia | Wikipedia KO, NamuWiki (2 versions) | ~2.8B |
+| English Educational | Cosmopedia (Stories, Web, Stanford, WikiHow, OpenStax, Khan) | ~5.7B |
+| English Math/Science | AutoMathText, OpenWebMath, Proof-Pile-2 | ~8.5B |
+| Code | StarCoder (filtered) | ~4.3B |
+### SFT (2.4M samples, 24 sources)
+| Domain | Share | Key Datasets |
+|--------|:-----:|-------------|
+| Reasoning/CoT | 38% | reasoning_r1_1.4m, magpie_reasoning |
+| Korean Instructions | 23% | korean_instruction_mix, open_korean_instructions, kullm_v2 |
+| English General | 16% | openhermes_2.5, ultrachat_200k |
+| Math | 12% | NuminaMath-CoT, orca-math-ko |
+| Dialog/Code/Other | 11% | smol-koreantalk, Evol-Instruct-Code-80k-ko |
+### ORPO (~630K preference pairs, 7 sources)
+| Dataset | Size | Domain |
+|---------|:----:|--------|
+| nayohan/preference-collection-ko-full | 4.9GB | General preference |
+| heegyu/orca-math-korean-preference-cleaned | 1.6GB | Math reasoning |
+| kuotient/orca-math-korean-dpo-pairs | 750MB | Math DPO |
+| maywell/ko_Ultrafeedback_binarized | 394MB | Feedback alignment |
+| tellang/yeji-preference-ko-v1 | 171MB | General preference |
+| jojo0217/korean_rlhf_dataset | 137MB | RLHF pairs |
+| lemon-mint/korean-realqa-reasoning-v01-preference | 58MB | QA reasoning |
+---
+## GGUF & Ollama
+### Available Quantizations
+| File | Size | Description |
+|------|:----:|-------------|
+| `gguf/frankenstallm-3b-v2-Q4_K_M.gguf` | 757MB | **Recommended** — best size/quality balance |
+| `gguf/frankenstallm-3b-v2-Q8_0.gguf` | 1.2GB | Higher quality |
+| `gguf/frankenstallm-3b-v2-f16.gguf` | 2.3GB | Full precision |
+| `model.safetensors` | 4.76GB | Transformers native (ORPO best, byte-fallback fixed) |
+### Recommended Sampling Parameters
+| Parameter | Value | Notes |
+|-----------|:-----:|-------|
+| `temperature` | 0.7 | Optimal for Korean generation quality |
+| `repeat_penalty` | 1.2 | **Required** — without it, greedy repetition is 30.9% |
+| `top_p` | 0.9 | Nucleus sampling |
+| `top_k` | 50 | Top-k candidates |
+| `max_tokens` | 512 | Max generation length |
+| `num_ctx` | 4096 | Context window (do not exceed) |
+> ⚠️ Always use `repeat_penalty >= 1.2`. With it, repetition drops to **0%**. Without it, greedy decoding produces ~31% 3-gram repetition.
+---
+## Limitations
+- **English performance is limited**: MMLU-EN ~23%, HellaSwag-EN ~29% — this is a Korean-focused model
+- **Code generation**: Near zero capability (limited code in training data)
+- **Greedy repetition**: 30.9% 3-gram repetition without `repeat_penalty` — always use sampling with `repeat_penalty >= 1.2`
+- **Safety**: Safety alignment data was not included in training; use with appropriate guardrails
+- **Scale gap**: Compared to commercial 3B models trained on trillions of tokens, this model was trained on ~60B tokens — expect lower overall benchmark scores
+---
+## Hardware & Training Environment
+| Component | Specification |
+|-----------|---------------|
+| GPU | 8× NVIDIA B200 (183GB HBM3e each, ~1.47TB total) |
+| FP8 Compute | 2,250 TFLOPS/GPU (18,000 TFLOPS total) |
+| Interconnect | NVLink 5.0, NVSwitch all-to-all mesh |
+| CPU | 2× AMD EPYC 9365 (72 cores, Zen 5) |
+| RAM | 2.21 TB DDR5 |
+| PyTorch | 2.10.0a0+b4e4ee81d3.nv25.12 (NVIDIA custom) |
+| TransformerEngine | 2.10.0 |
+| FlashAttention | 2.7.4 |
+| NCCL | 2.28.9 |
+| CUDA | 13.1 |
+| Total training | ~86 hours (Pretrain 63h + SFT 15.5h + ORPO 7h) |
+---
+## Citation
+```bibtex
+@misc{frankenstallm2026,
+  title={FRANKENSTALLM: A Korean 3B LLM Built From Scratch on B200 GPUs},
+  author={pathcosmos},
+  year={2026},
+  url={https://huggingface.co/pathcosmos/frankenstallm},
+  note={3-phase training (Pretrain, SFT, ORPO) with FP8 on 8x NVIDIA B200}
+}
+```
+---
+## Links
+- **GitHub**: [pathcosmos/FRANKENSTALLM](https://github.com/pathcosmos/FRANKENSTALLM) — Full source code, training scripts, and builder's log
+- **HuggingFace**: [pathcosmos/frankenstallm](https://huggingface.co/pathcosmos/frankenstallm)