---
base_model: meta-llama/Llama-3.2-3B-Instruct
license: llama3.2
language:
- en
library_name: transformers
tags:
- llama
- gguf
- mud
- game-ai
- decision-making
- fine-tuned
- unsloth
- trl
- sft
model_name: mud-judgment
pipeline_tag: text-generation
quantized_by: llama.cpp
---
# mud-judgment — MUD Game Decision Engine (GGUF)
A fine-tuned Llama 3.2 3B Instruct model that makes real-time judgment calls for a bot playing [Apocalypse VI: Reborn](http://apocalypse-vi.com), a CircleMUD text game. The model handles decisions that scripted logic cannot: flee or fight, which path to take, whether to enter a dangerous area.
## Model Details
| Property | Value |
|----------|-------|
| **Base model** | `meta-llama/Llama-3.2-3B-Instruct` |
| **Fine-tuning method** | QLoRA via Unsloth (rank=16, alpha=32) |
| **Training framework** | TRL SFTTrainer, completion-only loss |
| **Training data** | ~594 hand-crafted JSONL examples across 4 decision categories |
| **Quantization** | Q4_K_M (1.9 GB) and Q8_0 (3.2 GB) via llama.cpp |
| **VRAM requirement** | ~3 GB (Q4_K_M), ~4.5 GB (Q8_0) |
| **Output format** | Single command + one-line reasoning |
## Files
| File | Size | Description |
|------|------|-------------|
| `mud-judgment-q4km.gguf` | 1.9 GB | Q4_K_M quantization (recommended for ≤6 GB VRAM) |
| `mud-judgment-q8.gguf` | 3.2 GB | Q8_0 quantization (higher quality, needs ~5 GB VRAM) |
| `Modelfile` | — | Ollama Modelfile with Llama 3.2 chat template |
| `system_prompt.txt` | — | Required system prompt (must be included in every call) |
## Quick Start — Ollama
```bash
# Download the GGUF and Modelfile, then:
ollama create mud-judgment -f Modelfile
# Call via API (system prompt is required):
curl -s http://localhost:11434/api/chat -d '{
  "model": "mud-judgment",
  "stream": false,
  "messages": [
    {"role": "system", "content": "<contents of system_prompt.txt>"},
    {"role": "user", "content": "[SITUATION]\nDecision: COMBAT | Trigger: HP critical | State: 28hp 100mn 35mv | Level 7 | Buffs: none\n[/SITUATION]\n\nA forest wraith slashes YOU extremely hard.\nThat really did HURT!\nYour blood freezes as you hear a wraith'\''s death shriek."}
  ]
}'
```
Expected response:
```
flee
> HP critical at 28, wraith hitting extremely hard — cannot sustain this fight
```
## Quick Start — llama.cpp / Python
```bash
# llama.cpp CLI (-e expands the \n escapes in the prompt string)
llama-cli -m mud-judgment-q4km.gguf --temp 0.3 --top-p 0.9 -e \
  -p "<|start_header_id|>system<|end_header_id|>\n\n<system prompt><|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<situation><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```
```python
# Python with llama-cpp-python
from llama_cpp import Llama
llm = Llama(model_path="mud-judgment-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        {"role": "user", "content": situation_text},
    ],
    temperature=0.3,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])
```
## Decision Types
The model handles four categories of judgment calls:
| Type | When Called | Example Commands |
|------|------------|-----------------|
| **COMBAT** | HP critical, losing fight, buffs expired | `flee`, `recall`, `rebuff` |
| **NAVIGATION** | Stuck, maze, forced movement, no exits | `north`, `extract`, `maze`, `forced` |
| **RISK** | Unexplored exit, dangerous mob, death room | `continue`, `avoid`, `unavailable`, `hostile` |
| **RECOVERY** | Post-death, stuck, resource depletion | `urgent`, `rebuff`, `abandon`, `extract` |
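In a bot loop, these categories map naturally onto a trigger-to-type table. A minimal sketch of such routing (the trigger names and the `decision_type_for` helper are illustrative, not part of this model's API):

```python
# Illustrative routing table: which decision category to request for a given
# trigger. Trigger names here are hypothetical examples, not defined by the model.
DECISION_TYPES = {
    "hp_critical": "COMBAT",
    "buffs_expired": "COMBAT",
    "no_exits": "NAVIGATION",
    "forced_movement": "NAVIGATION",
    "unexplored_exit": "RISK",
    "dangerous_mob": "RISK",
    "post_death": "RECOVERY",
    "resources_depleted": "RECOVERY",
}

def decision_type_for(trigger: str) -> str:
    """Return the decision category for a trigger, defaulting to RISK."""
    return DECISION_TYPES.get(trigger, "RISK")
```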
## Input Format
Every user message must contain a `[SITUATION]` block:
```
[SITUATION]
Decision: RISK | Trigger: Unexplored exit | State: 94hp 177mn 68mv | Level 5 | Buffs: invis, sanc
[/SITUATION]
Standing at the edge of a deep crevasse...
One false step and you'd plunge into the darkness below.
There appears to be no chance of surviving the deadly fall.
[EXITS: North East *Down*]
```
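The block above can be assembled programmatically. A sketch of a builder function, assuming the field layout shown in the example (the function name and signature are illustrative):

```python
def build_situation(decision: str, trigger: str, hp: int, mana: int, moves: int,
                    level: int, buffs: list[str], room_text: str) -> str:
    """Assemble the [SITUATION] block the model expects as its user message.

    Field names and ordering follow the README's example; the helper itself
    is an illustration, not part of the model's API.
    """
    state = f"{hp}hp {mana}mn {moves}mv"
    buff_str = ", ".join(buffs) if buffs else "none"
    header = (f"Decision: {decision} | Trigger: {trigger} | "
              f"State: {state} | Level {level} | Buffs: {buff_str}")
    return f"[SITUATION]\n{header}\n[/SITUATION]\n\n{room_text}"
```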
## Output Format
Exactly two lines:
1. A single command (game command or script command)
2. A reasoning line prefixed with `>`
```
avoid
> Death room — crevasse with "no chance of surviving" language, flagging for safe exploration later
```
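Because the format is fixed, the reply is easy to parse and validate before handing the command to the game client. A minimal sketch (the `parse_decision` helper is illustrative):

```python
def parse_decision(output: str) -> tuple[str, str]:
    """Split the model's two-line reply into (command, reasoning).

    Raises ValueError if the reply does not match the expected format,
    so the caller can fall back to a scripted default.
    """
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[1].startswith(">"):
        raise ValueError(f"unexpected model output: {output!r}")
    command = lines[0]
    reasoning = lines[1].lstrip("> ").strip()
    return command, reasoning
```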
## Important Usage Notes
- **System prompt is mandatory.** The model was trained with the system prompt in every example. Without it, output quality degrades significantly.
- **Temperature 0.3** is recommended. Higher temperatures produce inconsistent formatting.
- **Do not use `ollama run` without setting the system prompt first** (`/set system <prompt>`). Use the chat API instead.
- **Modelfile must include the full Llama 3.2 chat template** — see the included `Modelfile` for the correct template.
## Training Details
- **Method:** QLoRA with Unsloth on WSL2 Ubuntu 24.04
- **GPU:** NVIDIA RTX 1000 Ada (6 GB VRAM) — training fits in ~4 GB
- **Epochs:** 2 (with 594 examples)
- **Learning rate:** 5e-5 with cosine scheduler
- **Effective batch size:** 8 (batch=1, grad_accum=8)
- **Eval loss:** 1.86 (steadily declining, no overfitting)
- **Loss type:** Completion-only (only trains on assistant response tokens)
- **LoRA targets:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
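The hyperparameters above can be collected into a single config for reproduction. Key names below loosely follow TRL/PEFT conventions and are illustrative; the actual training script may differ:

```python
# Illustrative consolidation of the hyperparameters listed above.
# Key names loosely follow TRL/PEFT conventions, not the exact script.
train_config = {
    "base_model": "meta-llama/Llama-3.2-3B-Instruct",
    "lora_r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "num_train_epochs": 2,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
```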
## Limitations
- Trained specifically for Apocalypse VI: Reborn game mechanics. May not generalize to other MUDs without additional training data.
- The 594-example training set covers common scenarios well but edge cases (ITEM, UNEXPECTED types) have minimal coverage.
- Quantization to Q4_K_M introduces slight quality loss vs. the full-precision LoRA adapter.
## Source Code
Training scripts, data generation, and the crawler that consumes this model are at:
[github.com/ninjarob/Apocalypse-VI-Projects](https://github.com/ninjarob/Apocalypse-VI-Projects)
## Citation
```bibtex
@misc{mud-judgment-2026,
title={mud-judgment: Fine-tuned Llama 3.2 3B for MUD Game Decision Making},
author={Robert Kevan},
year={2026},
url={https://huggingface.co/rkevan/mud-judgment}
}
```