---
base_model: meta-llama/Llama-3.2-3B-Instruct
license: llama3.2
language:
- en
library_name: transformers
tags:
- llama
- gguf
- mud
- game-ai
- decision-making
- fine-tuned
- unsloth
- trl
- sft
model_name: mud-judgment
pipeline_tag: text-generation
quantized_by: llama.cpp
---
# mud-judgment — MUD Game Decision Engine (GGUF)

A fine-tuned Llama 3.2 3B Instruct model that makes real-time judgment calls for a bot playing [Apocalypse VI: Reborn](http://apocalypse-vi.com), a CircleMUD text game. The model handles decisions that scripted logic cannot: flee or fight, which path to take, whether to enter a dangerous area.

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | `meta-llama/Llama-3.2-3B-Instruct` |
| **Fine-tuning method** | QLoRA via Unsloth (rank=16, alpha=32) |
| **Training framework** | TRL SFTTrainer, completion-only loss |
| **Training data** | ~594 hand-crafted JSONL examples across 4 decision categories |
| **Quantization** | Q4_K_M (1.9 GB) and Q8_0 (3.2 GB) via llama.cpp |
| **VRAM requirement** | ~3 GB (Q4_K_M), ~4.5 GB (Q8_0) |
| **Output format** | Single command + one-line reasoning |

## Files

| File | Size | Description |
|------|------|-------------|
| `mud-judgment-q4km.gguf` | 1.9 GB | Q4_K_M quantization (recommended for ≤6 GB VRAM) |
| `mud-judgment-q8.gguf` | 3.2 GB | Q8_0 quantization (higher quality, needs ~5 GB VRAM) |
| `Modelfile` | — | Ollama Modelfile with Llama 3.2 chat template |
| `system_prompt.txt` | — | Required system prompt (must be included in every call) |

## Quick Start — Ollama

```bash
# Download the GGUF and Modelfile, then:
ollama create mud-judgment -f Modelfile

# Call via API (system prompt is required):
curl -s http://localhost:11434/api/chat -d '{
  "model": "mud-judgment",
  "stream": false,
  "messages": [
    {"role": "system", "content": "<contents of system_prompt.txt>"},
    {"role": "user", "content": "[SITUATION]\nDecision: COMBAT | Trigger: HP critical | State: 28hp 100mn 35mv | Level 7 | Buffs: none\n[/SITUATION]\n\nA forest wraith slashes YOU extremely hard.\nThat really did HURT!\nYour blood freezes as you hear a wraith'\''s death shriek."}
  ]
}'
```

Expected response:

```
flee
> HP critical at 28, wraith hitting extremely hard — cannot sustain this fight
```

## Quick Start — llama.cpp / Python

```bash
# llama.cpp CLI
llama-cli -m mud-judgment-q4km.gguf --temp 0.3 --top-p 0.9 \
  -p "<|start_header_id|>system<|end_header_id|>\n\n<system prompt><|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<situation><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

```python
# Python with llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="mud-judgment-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        # situation_text is the [SITUATION] block described under Input Format below
        {"role": "user", "content": situation_text},
    ],
    temperature=0.3,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])
```
## Decision Types

The model handles four categories of judgment calls:
| Type | When Called | Example Commands |
|------|-------------|------------------|
| **COMBAT** | HP critical, losing fight, buffs expired | `flee`, `recall`, `rebuff` |
| **NAVIGATION** | Stuck, maze, forced movement, no exits | `north`, `extract`, `maze`, `forced` |
| **RISK** | Unexplored exit, dangerous mob, death room | `continue`, `avoid`, `unavailable`, `hostile` |
| **RECOVERY** | Post-death, stuck, resource depletion | `urgent`, `rebuff`, `abandon`, `extract` |

## Input Format

Every user message must contain a `[SITUATION]` block:

```
[SITUATION]
Decision: RISK | Trigger: Unexplored exit | State: 94hp 177mn 68mv | Level 5 | Buffs: invis, sanc
[/SITUATION]

Standing at the edge of a deep crevasse...
One false step and you'd plunge into the darkness below.
There appears to be no chance of surviving the deadly fall.

[EXITS: North East *Down*]
```
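The block is plain text, so it is easy to assemble from bot state. A minimal sketch (the `format_situation` helper and its parameter names are illustrative, not part of the released code):

```python
# Hypothetical helper that assembles a [SITUATION] block from bot state.
def format_situation(decision, trigger, hp, mana, moves, level, buffs,
                     game_text, exits=None):
    buffs_str = ", ".join(buffs) if buffs else "none"
    header = (
        f"Decision: {decision} | Trigger: {trigger} | "
        f"State: {hp}hp {mana}mn {moves}mv | Level {level} | Buffs: {buffs_str}"
    )
    parts = [f"[SITUATION]\n{header}\n[/SITUATION]", game_text]
    if exits:
        parts.append(f"[EXITS: {' '.join(exits)}]")
    return "\n\n".join(parts)

situation_text = format_situation(
    decision="RISK", trigger="Unexplored exit",
    hp=94, mana=177, moves=68, level=5, buffs=["invis", "sanc"],
    game_text="Standing at the edge of a deep crevasse...",
    exits=["North", "East", "*Down*"],
)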
## Output Format

Exactly two lines:

1. A single command (game command or script command)
2. A reasoning line prefixed with `>`

```
avoid
> Death room — crevasse with "no chance of surviving" language, flagging for safe exploration later
```
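Because the shape is fixed, a caller can parse it defensively. A sketch, assuming anything that does not match the two-line format should fall back to scripted logic:

```python
def parse_judgment(text):
    """Split model output into (command, reasoning); return None if malformed."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 2 or not lines[1].startswith(">"):
        return None  # malformed output: let the bot's scripted logic take over
    command = lines[0]
    reasoning = lines[1].lstrip("> ").strip()
    return command, reasoning
```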
## Important Usage Notes

- **System prompt is mandatory.** The model was trained with the system prompt in every example. Without it, output quality degrades significantly.
- **Temperature 0.3** is recommended. Higher temperatures produce inconsistent formatting.
- **Do not use `ollama run` without setting the system prompt first** (`/set system <prompt>`). Use the chat API instead (see the sketch after this list).
- **Modelfile must include the full Llama 3.2 chat template** — see the included `Modelfile` for the correct template.
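Putting those notes together, a minimal calling pattern against the Ollama API; this sketch assumes the `ollama` Python package is installed (`pip install ollama`), and the raw HTTP API shown above works identically:

```python
import ollama

# System prompt is mandatory; temperature 0.3 keeps the two-line format stable.
response = ollama.chat(
    model="mud-judgment",
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        {"role": "user", "content": situation_text},
    ],
    options={"temperature": 0.3, "top_p": 0.9},
)
print(response["message"]["content"])
```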
## Training Details

- **Method:** QLoRA with Unsloth on WSL2 Ubuntu 24.04
- **GPU:** NVIDIA RTX 1000 Ada (6 GB VRAM) — training fits in ~4 GB
- **Epochs:** 2 (with 594 examples)
- **Learning rate:** 5e-5 with cosine scheduler
- **Effective batch size:** 8 (batch=1, grad_accum=8)
- **Eval loss:** 1.86 (steadily declining, no overfitting)
- **Loss type:** Completion-only (only trains on assistant response tokens)
- **LoRA targets:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
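For reference, these hyperparameters map onto an Unsloth + TRL setup roughly as follows. This is a reconstruction from the bullets above, not the released training script; the dataset loading, `max_seq_length`, and chat-template marker strings are assumptions:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import train_on_responses_only
from trl import SFTConfig, SFTTrainer

# 4-bit base model (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=2048,  # assumption: context length is not stated above
    load_in_4bit=True,
)

# LoRA adapter: rank 16, alpha 32, all seven projection targets
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # assumption: the ~594 JSONL examples in chat format
    args=SFTConfig(
        num_train_epochs=2,
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch size 8
    ),
)

# Completion-only loss: mask everything except assistant response tokens
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
trainer.train()
```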
## Limitations

- Trained specifically for Apocalypse VI: Reborn game mechanics. May not generalize to other MUDs without additional training data.
- The 594-example training set covers common scenarios well, but edge cases (ITEM, UNEXPECTED types) have minimal coverage.
- Quantization to Q4_K_M introduces slight quality loss vs. the full-precision LoRA adapter.
## Source Code

Training scripts, data generation, and the crawler that consumes this model are at:
[github.com/ninjarob/Apocalypse-VI-Projects](https://github.com/ninjarob/Apocalypse-VI-Projects)

## Citation

```bibtex
@misc{mud-judgment-2026,
  title={mud-judgment: Fine-tuned Llama 3.2 3B for MUD Game Decision Making},
  author={Robert Kevan},
  year={2026},
  url={https://huggingface.co/rkevan/mud-judgment}
}
```