---
license: apache-2.0
language:
  - ko
library_name: transformers
pipeline_tag: text-generation
base_model:
  - google/gemma-4-E2B-it
tags:
  - gemma
  - gemma4
  - korean
  - roleplay
  - mud
  - lore
  - gguf
  - llama.cpp
  - lmstudio
  - transformers
---

# gemma4-e2b-mud

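`gemma4-e2b-mud` is a **Korean-language, space-voyage text MUD style model based on the Gemma 4 E2B family**.  
It bundles a checkpoint tuned for the **game narrative layer** (short command responses, NPC dialogue, room atmosphere, lore explanations, route hints), a **Colab starter package** for reproducing or extending it, and a **GGUF for LM Studio**.

The core aim of this repository is not to have the AI make "game engine rulings" but to have it handle **voice, atmosphere, and short in-world responses**.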

## TL;DR

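- Base model: `google/gemma-4-E2B-it`
- Primary language: Korean
- Primary use: text MUDs, NPC dialogue, room descriptions, lore explanations
- Included formats:
  - Transformers checkpoint
  - `GGUF` for LM Studio / llama.cpp
  - Starter package for Colab / Unsloth reproduction
- Best-fitting inputs:
  - `talk oracle`
  - `look`
  - `rumor`
  - `scan signal`
  - `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")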

## What this repository contains

### 1. Transformers checkpoint

The repository root contains the merged model weights in Hugging Face format.

- `model-00001-of-00005.safetensors`
- `model-00002-of-00005.safetensors`
- `model-00003-of-00005.safetensors`
- `model-00004-of-00005.safetensors`
- `model-00005-of-00005.safetensors`
- `config.json`
- `processor_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `chat_template.jinja`

### 2. GGUF

The following GGUF file is provided so it can be loaded directly into LM Studio / llama.cpp.

- `gemma4-e2b-mud-Q4_K_M.gguf`

This file is a good starting point for local inference, LM Studio testing, and quick deployment.

### 3. Companion starter package

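The [`gemma-mud-colab-starter/`](./gemma-mud-colab-starter) folder in this repository contains:

- Colab notebooks
- Notebook variants for E2B / E4B experiments
- Example datasets
- LM Studio system prompts
- Test prompts
- Run instructions and troubleshooting documents

In other words, this repository is not a "weights dumped and nothing else" release; it aims to be a **packaged repository that carries through from experimentation to deployment**.

The dataset structure and how it was built are documented separately.

- [`DATASET_GUIDE.md`](./DATASET_GUIDE.md)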

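## Model overview

This model is a **derivative of the Gemma 4 E2B instruct family, tuned to a text MUD tone**.

It aims for responses like these:

- Short, crisp NPC lines
- Room and scene descriptions with a living atmosphere
- Lore guidance centered on the world's proper nouns
- Short hints that help play along
- **Responses that stay inside the world**, rather than general-assistant answers

Conversely, it is designed to avoid:

- Answers that drift into real-world companies, databases, or general knowledge
- Meta statements such as `AI로서...` ("As an AI...")
- Long-winded, encyclopedia-style explanations
- Utterances in which the model arbitrarily settles game state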

## Base model and architecture

The base of this repository is [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it).  
Gemma 4 E2B is a small model in Google's Gemma 4 family and, per the official card, uses the **Apache 2.0** license.

Note:

- **The actual weights and GGUF uploaded at the root of this repository are E2B-based.**
- The `E4B` notebook in the starter package is a **companion notebook for E4B experiments and retraining**; it does not mean the root checkpoint is E4B.

The main text settings, according to this repository's local `config.json`, are:

- Architecture: `Gemma4ForConditionalGeneration`
- `model_type`: `gemma4`
- Text hidden size: `1536`
- Text layer count: `35`
- Attention heads: `8`
- Key/value heads: `1`
- Intermediate size: `6144`
- Vocab size: `262144`
- Sliding window: `512`
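As a small worked reading of the values listed above, the sketch below derives two ratios from them. The dictionary key names are an assumption (they follow the usual Hugging Face text-config naming and may differ in the real `config.json`); only the numbers come from this card.

```python
# Values copied from the list above; the key names are assumed, not read from the file.
text_config = {
    "hidden_size": 1536,
    "num_hidden_layers": 35,
    "num_attention_heads": 8,
    "num_key_value_heads": 1,
    "intermediate_size": 6144,
    "vocab_size": 262144,
    "sliding_window": 512,
}

# Number of query heads that share each key/value head.
queries_per_kv = text_config["num_attention_heads"] // text_config["num_key_value_heads"]
# Width of the feed-forward block relative to the hidden size.
ffn_ratio = text_config["intermediate_size"] / text_config["hidden_size"]

print(queries_per_kv, ffn_ratio)
```

With one key/value head serving all eight query heads, the text stack uses a single shared key/value projection per layer, which keeps the KV cache small for local inference.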

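Important:

- **The practical focus of this derivative model is text generation.**
- The Gemma 4 E2B family itself is a small multimodal family, but this repository's starter package and example prompts target **text MUD usage scenarios**.
- The bundled GGUF file is likewise suited to quick, **text-chat-centered** testing in LM Studio.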

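## What it is good at

### Well-suited tasks

- Text MUD NPC dialogue
- Room / scene atmosphere description
- Short rumor / lore / signal responses
- Local storytelling tests
- Korean-language worldbuilding prototyping
- Experiments that separate out a MUD engine's "narrative layer"

### Input styles that work especially well

- One- or two-word commands
  - `talk oracle`
  - `look`
  - `rumor`
  - `scan signal`
- Short explanation requests
  - `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")
  - `Helios Verge의 의미와 중요성을 설명해줘.` ("Explain the meaning and importance of Helios Verge.")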

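## What it is not suited for

It is **safer not to let this model directly handle** the following roles.

- Quest completion rulings
- Reward payouts
- Item / gold settlement
- Confirming state changes such as doors opening or locking
- Combat outcome calculation
- High-trust information such as legal, medical, or financial advice

The recommended structure is therefore:

- **Engine**: movement, combat, state, rewards, quest logic
- **Model**: dialogue, atmosphere, lore, short hints, description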
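The engine/model split above can be sketched as a small command router. This is a minimal illustration, not part of the repository: the `ENGINE_VERBS` set and the `route` function are hypothetical names, and the engine-owned verbs are assumed examples.

```python
# Hypothetical verbs whose outcome changes game state, so the engine must resolve them.
ENGINE_VERBS = {"move", "attack", "open", "take", "buy"}


def route(command: str) -> str:
    """Return "engine" for state-changing commands, "model" for everything else."""
    words = command.strip().split()
    verb = words[0].lower() if words else ""
    if verb in ENGINE_VERBS:
        return "engine"
    # Narrative verbs (talk, look, rumor, scan) and free-form lore questions go to the model.
    return "model"


for cmd in ["talk oracle", "open airlock", "look"]:
    print(cmd, "->", route(cmd))
```

Defaulting unknown input to the model keeps free-form lore questions working; an engine that prefers to be strict could instead reject unknown verbs before they reach the model.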

## Quick start

### Using it directly with Transformers

Following the usage shown in the official Gemma 4 documentation, you can start with the `AutoProcessor` + `AutoModelForImageTextToText` path.

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_ID = "sangwon1472/gemma4-e2b-mud"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype="auto",
    device_map="auto",
)

messages = [
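    {
        "role": "system",
        # Korean: "You are the route guide and lore narrator of a space-voyage text MUD. Answer in Korean."
        "content": "당신은 우주항행 텍스트 MUD의 항로 안내자이자 세계관 해설자다. 답변은 한국어로 한다."
    },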
    {
        "role": "user",
        "content": "talk oracle"
    },
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
    )

response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
print(processor.parse_response(response))
```

## Using LM Studio / llama.cpp

This repository includes a GGUF file that can be imported directly.

- `gemma4-e2b-mud-Q4_K_M.gguf`

In LM Studio, you can import this GGUF and test it right away.  
The accompanying system prompts and test sentences are in the starter package.

- [`lmstudio_system_prompt_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_ko.txt)
- [`lmstudio_system_prompt_npc_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_npc_ko.txt)
- [`lmstudio_system_prompt_lore_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_lore_ko.txt)
- [`lmstudio_test_prompts.md`](./gemma-mud-colab-starter/examples/lmstudio_test_prompts.md)

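### Recommended starting prompts

- `talk oracle`
- `look`
- `rumor`
- `scan signal`
- `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")

### Recommended generation settings

For the Gemma 4 family, the following values are a reasonable starting point in both official and practical use.

- `temperature = 1.0`
- `top_p = 0.95`
- `top_k = 64`

For shorter, more stable NPC replies, you can start a little more conservatively:

- `temperature = 0.7`
- `max tokens = 96 ~ 128`

The "shorter" settings above are practical recommendations tailored to this repository's MUD use.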
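The two presets above can be captured in a small helper that builds the arguments for a chat-completion call. This is a sketch under stated assumptions: `chat_kwargs` is a hypothetical helper, the default `max_tokens` of 128 mirrors the quick-start example, and keeping `top_p` / `top_k` unchanged in the short preset is an assumption because the card only specifies temperature and token count there.

```python
def chat_kwargs(system_prompt: str, command: str, short: bool = False) -> dict:
    """Build keyword arguments for a chat-completion call using this card's presets."""
    kwargs = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": command},
        ],
        # General starting point listed in this card.
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
        "max_tokens": 128,  # assumed default, taken from the quick-start example
    }
    if short:
        # Conservative preset for short NPC lines (lower end of the 96 ~ 128 range).
        kwargs["temperature"] = 0.7
        kwargs["max_tokens"] = 96
    return kwargs


# Usage with llama-cpp-python (not executed here; it needs the GGUF file locally):
# from llama_cpp import Llama
# llm = Llama(model_path="gemma4-e2b-mud-Q4_K_M.gguf")
# reply = llm.create_chat_completion(**chat_kwargs("...", "talk oracle", short=True))
# print(reply["choices"][0]["message"]["content"])
```

In LM Studio the same values can be entered in the sampling settings panel instead of being passed in code.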

## Companion starter package guide

The bundled [`gemma-mud-colab-starter/`](./gemma-mud-colab-starter) folder is for users who want to retrain or modify the model.

Key files:

- [`README.md`](./gemma-mud-colab-starter/README.md)
- [`run_instructions.md`](./gemma-mud-colab-starter/run_instructions.md)
- [`troubleshooting.md`](./gemma-mud-colab-starter/troubleshooting.md)
- [`Gemma4_MUD_QLoRA_Colab_Notebook-E2B.ipynb`](./gemma-mud-colab-starter/notebooks/Gemma4_MUD_QLoRA_Colab_Notebook-E2B.ipynb)
- [`Gemma4_MUD_QLoRA_Colab_Notebook-E4B.ipynb`](./gemma-mud-colab-starter/notebooks/Gemma4_MUD_QLoRA_Colab_Notebook-E4B.ipynb)

Included example datasets:

- `combined_1000.jsonl`: 1000-row original
- `combined_1000.unsloth_chatml_dedup.jsonl`: 963-row cleaned version
- `combined_1000.unsloth_gemma4_messages_dedup.jsonl`: 963-row cleaned version in Gemma 4 messages format
- `gemma4_mud_alpaca_100.jsonl`: 100-row small example

The dataset's structure, distribution, cleaning method, and rules for writing new samples are covered in detail in a separate document.

- [`DATASET_GUIDE.md`](./DATASET_GUIDE.md)
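The row counts above (1000 original, 963 after cleaning) suggest a deduplication pass. The sketch below shows one simple way to drop exact duplicates from JSONL records. It is an assumption for illustration only: the actual cleaning rules are the ones described in `DATASET_GUIDE.md`, and the `instruction` / `output` field names in the sample are hypothetical.

```python
import json


def dedup_jsonl(lines):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Canonical form so key order and whitespace do not hide duplicates.
        key = json.dumps(record, sort_keys=True, ensure_ascii=False)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample rows; the second repeats the first with a different key order.
sample = [
    '{"instruction": "look", "output": "..."}',
    '{"output": "...", "instruction": "look"}',
    '{"instruction": "rumor", "output": "..."}',
]
print(len(dedup_jsonl(sample)))
```

To run it on a file, pass the open file handle directly, for example `dedup_jsonl(open("combined_1000.jsonl", encoding="utf-8"))`.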

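Important:

- The datasets above are included for **starter / retraining / example purposes**.
- They are material that helps in understanding or extending this repository's checkpoint, and are best treated as a starting point for reproduction experiments.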

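## Good criteria for judging response quality

Whether the model reproduces a memorized "correct" sentence is not the point.  
What matters more is the **texture of the response**.

Good signs:

- It answers in Korean
- It stays inside the world
- It is short and atmospheric
- The texture of the response differs by command
- It does not drift into lectures on real-world general knowledge

Warning signs:

- A generic chatbot tone that opens with `Hello!`
- Real-world knowledge responses such as `Oracle Database`
- Meta statements such as `AI로서` ("As an AI")
- The model directly declaring state rulings that belong to the engine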
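Some of the warning signs above are simple enough to check mechanically. The sketch below is a rough heuristic, not an evaluation method from this repository: `warning_signals` is a hypothetical helper, it only covers the three string-level signs, and detecting a model that declares engine state would need game-specific rules.

```python
import re

HANGUL = re.compile(r"[\uac00-\ud7a3]")          # precomposed Hangul syllables
WARNING_MARKERS = ("Oracle Database", "AI로서")   # markers named in this card


def warning_signals(response: str) -> list[str]:
    """Return the warning signs a response trips (an empty list means none)."""
    found = []
    if not HANGUL.search(response):
        found.append("not Korean")
    if response.lstrip().startswith("Hello!"):
        found.append("generic chatbot greeting")
    for marker in WARNING_MARKERS:
        if marker in response:
            found.append(f"contains {marker}")
    return found


print(warning_signals("Hello! How can I help?"))
```

A check like this works best as a smoke test over a batch of test prompts; tone and atmosphere still need a human read.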

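## Limitations and cautions

This model is a **domain-adapted model** tuned to a specific style and use. It is best used with the following limitations in mind.

- The world tone is reinforced, but as is typical of a small domain-adapted model, it may **chain proper nouns together in exaggerated ways or over-generate atmosphere**.
- It suits text MUD narrative well, but accuracy on general assistant-style Q&A was not a goal.
- To curb long-winded output, keep prompts and system messages short and clear.
- This card does not provide a re-evaluation on official benchmarks or a separate safety evaluation.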

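## Recommended usage pattern

The model fits most naturally when wired in as follows.

1. Condense the player input into a short text prompt.
2. The engine handles state and rulings.
3. The model generates dialogue, description, hints, and lore explanations.
4. Rather than using the model output verbatim, the engine applies post-processing rules where needed.

In particular, it is best to keep to the following line.

- **What the engine should decide**: quest success, rewards, combat rulings, door state
- **What the model may express**: atmosphere, voice, lingering mood, clues, culture, myth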
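The four steps above can be sketched as a single turn function. This is a minimal illustration under assumptions: `play_turn`, `postprocess`, and `fake_narrate` are hypothetical names, the turn counter stands in for real engine state, and the sentence-limit rule is just one possible post-processing rule.

```python
import re
from typing import Callable


def postprocess(text: str, max_sentences: int = 3) -> str:
    """Hypothetical rule: keep only the first few sentences of the model output."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def play_turn(command: str, state: dict, narrate: Callable[[str], str]) -> str:
    # 1. Condense the player input into a short prompt.
    prompt = command.strip()
    # 2. The engine owns state changes (here: a simple turn counter).
    state["turn"] = state.get("turn", 0) + 1
    # 3. The model produces the narrative text.
    raw = narrate(prompt)
    # 4. The engine post-processes the text before showing it.
    return postprocess(raw)


def fake_narrate(prompt: str) -> str:
    # Stand-in for a real model call.
    return "One. Two. Three. Four."


state = {}
print(play_turn(" look ", state, fake_narrate))
```

Passing the narrator in as a callable keeps the engine testable without loading the model, and lets the same loop run against Transformers, llama.cpp, or LM Studio backends.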

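## License and sources

- Base model for this derivative: [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it)
- License: `Apache-2.0`
- The GGUF and starter package are included for convenient distribution of this repository.

For the architecture and basic usage of Gemma 4 itself, see the official Gemma 4 card and the Unsloth Gemma 4 documentation.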

- [Google Gemma 4 E2B model card](https://huggingface.co/google/gemma-4-E2B-it)
- [Unsloth Gemma 4 docs](https://unsloth.ai/docs/models/gemma-4)

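## One-line summary

`gemma4-e2b-mud` is a **bundle of a Gemma 4 E2B derivative model, a GGUF, and a Colab starter package, built so that the atmosphere, NPC dialogue, and lore explanations of a Korean space-voyage text MUD can be tried out quickly, even locally**.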