Upload eval_reports/2026-03-09_GGUF_DEPLOYMENT_AND_EVAL_REPORT.md with huggingface_hub

by somebody-to-love - opened Mar 9

base: refs/heads/main ← from: refs/pr/3

eval_reports/2026-03-09_GGUF_DEPLOYMENT_AND_EVAL_REPORT.md +127 -0

	@@ -0,0 +1,127 @@

+# FRANKENSTALLM 3B v2: GGUF Conversion, Deployment, and Ollama Evaluation Report
+- **Date**: 2026-03-09
+- **Scope**: checkpoint with the byte-fallback fix applied → GGUF conversion → Ollama deployment → benchmark
+---
+## 1. Summary
+| Item | Details |
+|------|------|
+| **Root cause** | The SentencePiece Unigram tokenizer had no `byte_fallback`, so llama.cpp crashed on characters missing from the vocabulary, such as `\n` |
+| **Fix** | Added 256 byte-fallback tokens, resized the embeddings from 64000 to 64256, reconverted to GGUF, quantized to Q4_K_M |
+| **Deployment** | Ollama model `frankenstallm-3b-v2:latest` (792 MB, Q4_K_M) |
+| **Newline check** | ✅ Prompts containing `\n` are processed without crashing |
+| **Ollama benchmark** | 35 tests, auto-scored average 46.7, average TPS 142.5, TTFT 16.7 ms |
+---
+## 2. Pipeline Stages
+### 2.1 Tokenizer and Embedding Fix
+- **Script**: `scripts/fix_tokenizer_byte_fallback.py`
+- **Input**: `outputs/hf_checkpoint-best`
+- **Output**: `outputs/hf_checkpoint-best-fixed`
+- **Changes**:
+  - `tokenizer.json`: `byte_fallback=True`, 256 tokens `<0x00>` through `<0xFF>` added
+  - `config.json`: `vocab_size` 64000 → 64256
+  - Embedding layer resized, new tokens initialized, then saved as safetensors
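The vocabulary change listed above can be checked with a few lines of Python. This is a minimal sketch, not the contents of `scripts/fix_tokenizer_byte_fallback.py`: only the 256 token names and the 64000 → 64256 sizes come from the report, and the helper names `byte_fallback_tokens` and `fallback_pieces` are hypothetical.

```python
# Sketch of the byte-fallback vocabulary extension (helper names are hypothetical).

OLD_VOCAB_SIZE = 64000  # vocab_size before the fix, as stated in the report


def byte_fallback_tokens() -> list[str]:
    """Return the 256 byte tokens <0x00> .. <0xFF> in byte order."""
    return [f"<0x{b:02X}>" for b in range(256)]


def fallback_pieces(text: str) -> list[str]:
    """Spell a string as byte tokens, the way byte fallback covers
    characters that have no piece of their own in the vocabulary."""
    return [f"<0x{b:02X}>" for b in text.encode("utf-8")]


NEW_VOCAB_SIZE = OLD_VOCAB_SIZE + len(byte_fallback_tokens())

print(NEW_VOCAB_SIZE)          # 64256
print(fallback_pieces("\n"))   # ['<0x0A>']
```

With the byte tokens present, a newline maps to `<0x0A>`, the kind of previously unregistered character the report names as the crash trigger.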
+### 2.2 GGUF Conversion and Quantization
+- **F16 GGUF**: `outputs/llama.cpp/convert_hf_to_gguf.py`
+  `outputs/hf_checkpoint-best-fixed` → `outputs/gguf/frankenstallm-3b-v2-f16.gguf`
+- **Q4_K_M quantization**: `outputs/llama.cpp/build/bin/llama-quantize`
+  → `outputs/gguf/frankenstallm-3b-v2-Q4_K_M.gguf` (about 792 MB)
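The two steps above could be driven from Python roughly as follows. The paths are the ones in the report; the exact flags the author passed are not stated, so `--outfile` and `--outtype f16` (both existing options of `convert_hf_to_gguf.py`) and the positional form of `llama-quantize` are assumptions about the invocation.

```python
from pathlib import Path

# Paths taken from the report.
LLAMA_CPP = Path("outputs/llama.cpp")
HF_DIR = Path("outputs/hf_checkpoint-best-fixed")
F16_GGUF = Path("outputs/gguf/frankenstallm-3b-v2-f16.gguf")
Q4_GGUF = Path("outputs/gguf/frankenstallm-3b-v2-Q4_K_M.gguf")


def convert_command() -> list[str]:
    """HF checkpoint -> F16 GGUF (flags are an assumed invocation)."""
    return [
        "python",
        (LLAMA_CPP / "convert_hf_to_gguf.py").as_posix(),
        HF_DIR.as_posix(),
        "--outfile", F16_GGUF.as_posix(),
        "--outtype", "f16",
    ]


def quantize_command() -> list[str]:
    """F16 GGUF -> Q4_K_M GGUF using llama-quantize <in> <out> <type>."""
    return [
        (LLAMA_CPP / "build/bin/llama-quantize").as_posix(),
        F16_GGUF.as_posix(),
        Q4_GGUF.as_posix(),
        "Q4_K_M",
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failed conversion stops the pipeline before quantization.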
+### 2.3 Ollama Deployment
+- **Modelfile**: point `FROM` at the local GGUF path, then run `ollama create`
+- **Model name**: `frankenstallm-3b-v2:latest`
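A minimal Modelfile for this step needs only a `FROM` line. The sketch below generates that file content and the matching `ollama create` command; the report does not show the actual Modelfile, so any extra directives it may contain (template, parameters) are unknown, and the helper names are hypothetical.

```python
def modelfile_text(gguf_path: str) -> str:
    """Smallest possible Modelfile: a FROM line pointing at a local GGUF."""
    return f"FROM {gguf_path}\n"


def create_command(model_name: str, modelfile_path: str = "Modelfile") -> list[str]:
    """Command that registers the model with the local Ollama instance."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


text = modelfile_text("outputs/gguf/frankenstallm-3b-v2-Q4_K_M.gguf")
cmd = create_command("frankenstallm-3b-v2")
print(text, end="")
print(" ".join(cmd))
```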
+### 2.4 Newline Test
+- **Method**: send the prompt `"첫 줄\n두 번째 줄\n세 번째 줄이라고 말해줘."` (roughly "Say: first line\nsecond line\nthird line.") through the Ollama API
+- **Result**: HTTP 200, `done: true`, no crash → byte-fallback fix verified
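The check above can be reproduced with the standard library alone. This sketch assumes Ollama is listening on its default local address and uses the `/api/generate` endpoint with streaming disabled. The report does not say which endpoint or client was used, so both are assumptions, as are the function names.

```python
import json
import urllib.request

# Assumption: default local Ollama address and the non-streaming generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "첫 줄\n두 번째 줄\n세 번째 줄이라고 말해줘."  # prompt from the report


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the POST request; json.dumps escapes the newlines inside the prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def newline_check(model: str, prompt: str = PROMPT) -> bool:
    """Return True when the server answers HTTP 200 with done == true."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=120) as resp:
        body = json.load(resp)
        return resp.status == 200 and body.get("done") is True
```

Calling `newline_check("frankenstallm-3b-v2")` against a running server should return `True` if the behaviour matches the result reported here.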
+---
+## 3. Ollama Benchmark Results (frankenstallm-3b-v2)
+- **Command**: `python eval/ollama_benchmark.py --models frankenstallm-3b-v2 --output-dir eval/results/frankenstallm-3b-v2`
+- **Run at**: 2026-03-09 23:24:22
+- **Total tests**: 35 (20 auto-scored + 15 manual review)
+### 3.1 Overall Auto-Scored Average
+| Model | Auto Avg |
+|------|----------|
+| frankenstallm-3b-v2 | **46.7** |
+### 3.2 Scores by Category (auto/manual)
+| Category | Score | Notes |
+|----------|------|------|
+| korean_nlu | 100.0 | 3 auto / 2 manual |
+| korean_generation | manual | 5 manual |
+| reasoning | 50.0 | 4 auto / 1 manual |
+| knowledge | 75.0 | 4 auto / 1 manual |
+| code | 0.0 | 3 auto |
+| safety | 10.0 | 2 auto / 1 manual |
+| instruction_following | 66.7 | 3 auto |
+| multilingual | manual | 3 manual |
+| repetition_resistance | 2.2 | 3 auto (high repetition rate) |
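The report does not say how the overall Auto Avg is derived from the category scores. One reading that reproduces 46.7 is an average weighted by the number of auto-scored tests per category, sketched below. The weighting is an assumption, not a documented property of `eval/ollama_benchmark.py`.

```python
# (score, number of auto-scored tests) per category, copied from the table.
AUTO_SCORES = {
    "korean_nlu": (100.0, 3),
    "reasoning": (50.0, 4),
    "knowledge": (75.0, 4),
    "code": (0.0, 3),
    "safety": (10.0, 2),
    "instruction_following": (66.7, 3),
    "repetition_resistance": (2.2, 3),
}


def weighted_auto_average(scores: dict[str, tuple[float, int]]) -> float:
    """Average of category scores weighted by auto-scored test count (assumed scheme)."""
    total_tests = sum(n for _, n in scores.values())
    return sum(score * n for score, n in scores.values()) / total_tests


auto_tests = sum(n for _, n in AUTO_SCORES.values())
print(auto_tests)                                     # 22
print(round(weighted_auto_average(AUTO_SCORES), 1))   # 46.7
```

The per-category counts in the table add up to 22 auto-scored and 13 manual tests, which still totals 35 but differs from the 20 + 15 split stated earlier, so one of the two breakdowns may need checking against the raw results.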
+### 3.3 Latency
+| Metric | Value |
+|------|-----|
+| Avg TTFT (ms) | 16.7 |
+| P50 TTFT (ms) | 15.8 |
+| P95 TTFT (ms) | 26.2 |
+| Avg TPS | 142.5 |
+| P50 TPS | 142.7 |
+| P95 TPS | 143.3 |
+### 3.4 Repetition Rate Details (repetition_resistance)
+| Test ID | Rep Rate | Unique/Total N-grams | Score |
+|---------|----------|----------------------|-------|
+| rep_01 | 73.76% | 122/465 | 0.0 |
+| rep_02 | 59.72% | 255/633 | 0.0 |
+| rep_03 | 46.70% | 226/424 | 6.6 |
+- **Original ORPO evaluation** (HF checkpoint, greedy): 3-gram repetition rate 30.89%, EOS 67%.
+  With Ollama Q4_K_M and the benchmark prompts, repetition is more pronounced.
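The Rep Rate column is consistent with `1 - unique / total` over the n-gram counts in the same row, which the sketch below verifies against all three rows. The n-gram order used by the benchmark is not stated; `n=3` is chosen here only because the ORPO evaluation quotes a 3-gram rate, and the function names are hypothetical.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repetition_rate(unique: int, total: int) -> float:
    """Share of n-grams that repeat an earlier one: 1 - unique / total."""
    return 1.0 - unique / total if total else 0.0


def repetition_rate_of(tokens: list[str], n: int = 3) -> float:
    """Repetition rate computed directly from tokens (n=3 is an assumption)."""
    grams = ngrams(tokens, n)
    return repetition_rate(len(set(grams)), len(grams))


# Unique/total counts from the table above.
for test_id, unique, total in [("rep_01", 122, 465), ("rep_02", 255, 633), ("rep_03", 226, 424)]:
    print(test_id, f"{repetition_rate(unique, total) * 100:.2f}%")
# rep_01 73.76%
# rep_02 59.72%
# rep_03 46.70%
```

How a repetition rate is mapped to the Score column is not described in the report, so that step is left out.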
+### 3.5 Result File Locations
+- **JSON**: `eval/results/frankenstallm-3b-v2/ollama_benchmark_results.json`
+- **Summary MD**: `eval/results/frankenstallm-3b-v2/ollama_benchmark_summary.md`
+---
+## 4. Relation to the Earlier ORPO Evaluation
+- **ORPO full report**: `reports/2026-03-09_ORPO_EVALUATION_REPORT.md`
+- **Quantitative score**: 63.7/100, 7 of 10 dimensions passed, final verdict **RETRY**
+- The **v2 release** is the same ORPO checkpoint with only the byte-fallback fix and GGUF conversion applied,
+  so the ORPO metrics (e.g. preference accuracy, reward margin) remain those reported for that checkpoint in the earlier report.
+---
+## 5. Artifact Paths
+| Purpose | Path |
+|------|------|
+| Fixed HF checkpoint | `outputs/hf_checkpoint-best-fixed/` |
+| F16 GGUF | `outputs/gguf/frankenstallm-3b-v2-f16.gguf` |
+| Q4_K_M GGUF (for Ollama deployment) | `outputs/gguf/frankenstallm-3b-v2-Q4_K_M.gguf` |
+| Ollama benchmark results | `eval/results/frankenstallm-3b-v2/` |
+| Byte-fallback fix script | `scripts/fix_tokenizer_byte_fallback.py` |
+---
+*This report summarizes the GGUF conversion, Ollama deployment, and Ollama benchmark results.*