sangwon1472 committed on Apr 10

Commit

49871e2

1 Parent(s): e5a8be6

Expand model card

Files changed (1)

README.md +330 -10

README.md CHANGED

@@ -1,18 +1,338 @@
 ---
 license: apache-2.0
 ---
 # gemma4-e2b-mud
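-A MUD-style fine-tuned model based on Gemma 4 E2B
-## Features
-- Specialized for a text MUD setting
-- Command-based interaction
-- Narrative-focused output
-## Usage
-Usable via Transformers or GGUF (LM Studio)
-## Included materials
-- `gemma-mud-colab-starter/`
-  A starter package that bundles a Colab training notebook, dataset examples, LM Studio prompts, and a run guide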

 ---
 license: apache-2.0
+language:
+  - ko
+library_name: transformers
+pipeline_tag: text-generation
+base_model:
+  - google/gemma-4-E2B-it
+tags:
+  - gemma
+  - gemma4
+  - korean
+  - roleplay
+  - mud
+  - lore
+  - gguf
+  - llama.cpp
+  - lmstudio
+  - transformers
 ---
 # gemma4-e2b-mud
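+`gemma4-e2b-mud` is a **Korean-language spacefaring text MUD style model based on the Gemma 4 E2B family**.
+It bundles a checkpoint tuned for the **game narrative layer** (short command responses, NPC dialogue, room atmosphere descriptions, lore explanations, and route hints), a **Colab starter package** for reproducing or extending it, and a **GGUF for LM Studio**.
+The core direction of this repository is to have the AI handle **voice, atmosphere, and short in-world responses**, not the "game engine's adjudication".
+## TL;DR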
+- Base model: `google/gemma-4-E2B-it`
+- Primary language: Korean
+- Primary use: text MUD, NPC dialogue, room descriptions, lore explanations
+- Included formats:
+  - Transformers checkpoint
+  - `GGUF` for LM Studio / llama.cpp
+  - Starter package for Colab / Unsloth reproduction
+- Best-fitting inputs:
+  - `talk oracle`
+  - `look`
+  - `rumor`
+  - `scan signal`
+  - `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")
+## What this repository contains
+### 1. Transformers checkpoint
+The root holds the merged model weights in Hugging Face format.
+- `model-00001-of-00005.safetensors`
+- `model-00002-of-00005.safetensors`
+- `model-00003-of-00005.safetensors`
+- `model-00004-of-00005.safetensors`
+- `model-00005-of-00005.safetensors`
+- `config.json`
+- `processor_config.json`
+- `tokenizer.json`
+- `tokenizer_config.json`
+- `chat_template.jinja`
+### 2. GGUF
+The following GGUF file is provided so that it can be loaded directly into LM Studio / llama.cpp.
+- `gemma4-e2b-mud-Q4_K_M.gguf`
+This file is a good starting point for local inference, LM Studio testing, and quick deployment.
+### 3. Companion starter package
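+The [`gemma-mud-colab-starter/`](./gemma-mud-colab-starter) folder in the repository contains the following.
+- Colab notebooks
+- Notebook variants for E2B / E4B experiments
+- Example datasets
+- LM Studio system prompts
+- Test prompts
+- Run instructions and troubleshooting documents
+In other words, this repository aims to be a **packaged repository that carries through to experimentation and deployment**, not a bare model dump.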
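+## Model overview
+This model is a **derivative of the Gemma 4 E2B instruct family, tuned to a text MUD tone**.
+It aims for responses like the following.
+- Short, crisp NPC dialogue
+- Room/scene descriptions with a living atmosphere
+- Lore guidance centered on the setting's proper nouns
+- Short hints that help play along
+- **Responses that stay inside the world**, rather than general-assistant answers
+Conversely, it was designed to avoid the following.
+- Answers that drift into explanations of real-world companies, databases, or general knowledge
+- Meta statements such as `AI로서...` ("As an AI...")
+- Long-winded, encyclopedia-style explanations
+- Utterances in which the model arbitrarily finalizes game state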
+## Base model and architecture
+This repository is based on [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it).
+Gemma 4 E2B is a small model in Google's Gemma 4 family and, per the official card, is licensed under **Apache 2.0**.
+Note:
+- **The actual weights and GGUF uploaded to the root of this repository are E2B-based.**
+- The `E4B` notebook in the starter package is a **companion notebook for E4B experiments/retraining**; it does not mean the root checkpoint is E4B.
+Per this repository's local `config.json`, the main text settings are as follows.
+- Architecture: `Gemma4ForConditionalGeneration`
+- `model_type`: `gemma4`
+- Text hidden size: `1536`
+- Text layer count: `35`
+- attention heads: `8`
+- key/value heads: `1`
+- intermediate size: `6144`
+- vocab size: `262144`
+- sliding window: `512`
+Important:
+- **The practical focus of this derivative model is text generation.**
+- The Gemma 4 E2B family itself is a small multimodal family, but this repository's starter package and example prompts are geared to the **text MUD usage scenario**.
+- The bundled GGUF file is likewise suited to quick, **text-chat-centered** testing in LM Studio.
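+## Which tasks it fits well
+### Good fits
+- Text MUD NPC dialogue
+- Room/scene atmosphere descriptions
+- Short rumor / lore / signal responses
+- Local storytelling tests
+- Korean-language worldbuilding prototyping
+- Experiments in separating out a MUD engine's "narrative layer"
+### Input styles that fit especially well
+- One- or two-word commands
+  - `talk oracle`
+  - `look`
+  - `rumor`
+  - `scan signal`
+- Short explanation requests
+  - `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")
+  - `Helios Verge의 의미와 중요성을 설명해줘.` ("Explain the meaning and importance of Helios Verge.")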
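+## Which tasks it does not fit
+It is **safer not to make this model directly responsible** for the roles below.
+- Quest completion rulings
+- Reward payout
+- Item/gold settlement
+- Finalizing state changes such as doors opening or locking
+- Computing combat outcomes
+- Providing high-stakes information such as legal, medical, or financial advice
+In short, the recommended division of labor is as follows.
+- **Engine**: movement, combat, state, rewards, quest logic
+- **Model**: dialogue, atmosphere, lore, short hints, descriptions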
+## Quick start
+### Using it directly with Transformers
+Following the usage shown in the official Gemma 4 documentation, you can start with the `AutoProcessor` + `AutoModelForImageTextToText` path.
+```python
+from transformers import AutoProcessor, AutoModelForImageTextToText
+import torch
+MODEL_ID = "sangwon1472/gemma4-e2b-mud"
+processor = AutoProcessor.from_pretrained(MODEL_ID)
+model = AutoModelForImageTextToText.from_pretrained(
+    MODEL_ID,
+    dtype="auto",
+    device_map="auto",
+)
+messages = [
+    {
+        "role": "system",
+        "content": "당신은 우주항행 텍스트 MUD의 항로 안내자이자 세계관 해설자다. 답변은 한국어로 한다."
+    },
+    {
+        "role": "user",
+        "content": "talk oracle"
+    },
+]
+text = processor.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=False,
+)
+inputs = processor(text=text, return_tensors="pt").to(model.device)
+input_len = inputs["input_ids"].shape[-1]
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=128,
+        temperature=1.0,
+        top_p=0.95,
+        top_k=64,
+    )
+response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
+print(processor.parse_response(response))
+```
+## Using LM Studio / llama.cpp
+This repository includes a GGUF file that can be imported directly.
+- `gemma4-e2b-mud-Q4_K_M.gguf`
+In LM Studio, you can import this GGUF and test it right away.
+The accompanying system prompts and test sentences are in the starter package.
+- [`lmstudio_system_prompt_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_ko.txt)
+- [`lmstudio_system_prompt_npc_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_npc_ko.txt)
+- [`lmstudio_system_prompt_lore_ko.txt`](./gemma-mud-colab-starter/examples/lmstudio_system_prompt_lore_ko.txt)
+- [`lmstudio_test_prompts.md`](./gemma-mud-colab-starter/examples/lmstudio_test_prompts.md)
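+### Recommended starting prompts
+- `talk oracle`
+- `look`
+- `rumor`
+- `scan signal`
+- `First Fire Horizon이 어떤 곳인지 설명해줘.` ("Explain what kind of place First Fire Horizon is.")
+### Recommended generation settings
+For the Gemma 4 family, the values below are a reasonable starting point in both official guidance and practical use.
+- `temperature = 1.0`
+- `top_p = 0.95`
+- `top_k = 64`
+If you need shorter, more stable NPC answers, you can start a little more conservatively, as below.
+- `temperature = 0.7`
+- `max tokens = 96 ~ 128`
+The "shorter" settings above are practical recommendations tailored to this repository's MUD use.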
+## Companion starter package guide
+The accompanying [`gemma-mud-colab-starter/`](./gemma-mud-colab-starter) folder is for users who want to retrain or modify the model.
+Key files:
+- [`README.md`](./gemma-mud-colab-starter/README.md)
+- [`run_instructions.md`](./gemma-mud-colab-starter/run_instructions.md)
+- [`troubleshooting.md`](./gemma-mud-colab-starter/troubleshooting.md)
+- [`Gemma4_MUD_QLoRA_Colab_Notebook-E2B.ipynb`](./gemma-mud-colab-starter/notebooks/Gemma4_MUD_QLoRA_Colab_Notebook-E2B.ipynb)
+- [`Gemma4_MUD_QLoRA_Colab_Notebook-E4B.ipynb`](./gemma-mud-colab-starter/notebooks/Gemma4_MUD_QLoRA_Colab_Notebook-E4B.ipynb)
+Included example datasets:
+- `combined_1000.jsonl` : 1000-row original
+- `combined_1000.unsloth_chatml_dedup.jsonl` : 963-row cleaned version
+- `combined_1000.unsloth_gemma4_messages_dedup.jsonl` : 963-row cleaned version in Gemma 4 messages format
+- `gemma4_mud_alpaca_100.jsonl` : 100-row small example
+Important:
+- The datasets above are included for **starter/retraining/example use**.
+- They help in understanding or extending this repository's checkpoint and are best seen as a starting point for reproduction experiments.
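+## Good criteria for judging response quality
+Whether the model recites a memorized correct sentence is not the point.
+What matters more is the **texture of the response**.
+Good signs:
+- It answers in Korean
+- It stays inside the setting
+- It is short and atmospheric
+- The response texture differs by command
+- It does not drift into lectures on real-world general knowledge
+Warning signs:
+- A generic chatbot tone that opens with `Hello!`
+- Real-world knowledge responses such as `Oracle Database`
+- Meta statements such as `AI로서` ("As an AI")
+- The model directly declaring state rulings that the engine should make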
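+## Limitations and caveats
+This model is a **domain-adapted model** tailored to a specific style and use. It is therefore best used with the limitations below in mind.
+- The setting's tone is strengthened, but as is typical of small domain-adapted models, it may **string proper nouns together in exaggerated ways or over-generate atmosphere**.
+- It suits text MUD narrative well, but accuracy on general assistant-style Q&A was not a goal.
+- To cut down on long-winded explanations, it helps to keep prompts and system messages short and clear.
+- This card does not provide a re-evaluation on official benchmarks or a separate safety evaluation.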
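+## Recommended usage pattern
+This model integrates most naturally as follows.
+1. Condense the player input into a short text prompt.
+2. The engine handles state and rulings.
+3. The model generates dialogue, descriptions, hints, and lore explanations.
+4. The engine applies post-processing rules where needed rather than using the model output verbatim.
+In particular, it is best to hold the following line.
+- **What the engine must decide**: quest success, rewards, combat rulings, door state
+- **What the model may express**: atmosphere, voice, lingering mood, clues, culture, myth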
+## License and sources
+- Derived from base model: [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it)
+- License label: `Apache-2.0`
+- The GGUF and starter package are included for convenience in distributing this repository.
+For the architecture and basic usage of Gemma 4 itself, see the official Gemma 4 card and the Unsloth Gemma 4 docs.
+- [Google Gemma 4 E2B model card](https://huggingface.co/google/gemma-4-E2B-it)
+- [Unsloth Gemma 4 docs](https://unsloth.ai/docs/models/gemma-4)
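+## One-line summary
+`gemma4-e2b-mud` is a **bundle of a Gemma 4 E2B derivative model + GGUF + Colab starter package, built so that the atmosphere, NPC dialogue, and lore explanations of a Korean-language spacefaring text MUD can be tried out quickly, even locally**.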
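
Note on the diff above: the README's recommended usage pattern (the engine owns state and rulings, the model only narrates, and the engine post-processes the output) can be sketched in a few lines. The following is a minimal illustration and not code from the repository. `EngineState`, `handle_command`, `passes_tone_check`, `FALLBACK_LINE`, and the `narrate` callable are hypothetical names introduced here; only the three warning-sign strings come from the README's quality checklist. `narrate` stands in for whatever calls the model (Transformers, llama.cpp, or an LM Studio server).

```python
from dataclasses import dataclass
from typing import Callable

# Warning-sign markers quoted from the README's quality checklist.
WARNING_MARKERS = ("Hello!", "Oracle Database", "AI로서")

# Hypothetical in-world fallback used when a model reply is rejected.
FALLBACK_LINE = "신호가 흐려진다."


@dataclass
class EngineState:
    """Hypothetical engine-owned state; only engine code may change it."""
    room: str = "bridge"
    quest_done: bool = False


def passes_tone_check(text: str) -> bool:
    """Return True when the text contains none of the warning-sign markers."""
    return not any(marker in text for marker in WARNING_MARKERS)


def handle_command(
    command: str,
    state: EngineState,
    narrate: Callable[[str], str],
    max_chars: int = 400,
) -> str:
    """Run one turn: the engine decides, the model narrates, the engine filters."""
    # Step 1: condense the player input into a short text prompt.
    prompt = command.strip()
    # Step 2: engine-side rulings would run here and update `state`.
    # The narrator never receives a handle that lets it mutate `state`.
    # Step 3: the model produces dialogue, description, or lore text.
    raw = narrate(prompt).strip()
    # Step 4: post-process instead of trusting the output verbatim.
    if not passes_tone_check(raw):
        return FALLBACK_LINE
    return raw[:max_chars]


if __name__ == "__main__":
    # A stub narrator stands in for the real model call.
    state = EngineState()
    print(handle_command("look", state, lambda p: f"[{p}] 별빛이 선체를 스친다."))
```

A substring filter like this is deliberately crude. It only demonstrates where a post-processing rule would sit; a real integration would likely need richer checks, for example rejecting replies that claim a quest is complete.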