---
base_model: meta-llama/Llama-3.2-3B-Instruct
license: llama3.2
language:
- en
library_name: transformers
tags:
- llama
- gguf
- mud
- game-ai
- decision-making
- fine-tuned
- unsloth
- trl
- sft
model_name: mud-judgment
pipeline_tag: text-generation
quantized_by: llama.cpp
---

# mud-judgment: MUD Game Decision Engine (GGUF)

A fine-tuned Llama 3.2 3B Instruct model that makes real-time judgment calls for a bot playing [Apocalypse VI: Reborn](http://apocalypse-vi.com), a CircleMUD text game. The model handles decisions that scripted logic cannot: flee or fight, which path to take, whether to enter a dangerous area.

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | `meta-llama/Llama-3.2-3B-Instruct` |
| **Fine-tuning method** | QLoRA via Unsloth (rank=16, alpha=32) |
| **Training framework** | TRL SFTTrainer, completion-only loss |
| **Training data** | ~594 hand-crafted JSONL examples across 4 decision categories |
| **Quantization** | Q4_K_M (1.9 GB) and Q8_0 (3.2 GB) via llama.cpp |
| **VRAM requirement** | ~3 GB (Q4_K_M), ~4.5 GB (Q8_0) |
| **Output format** | Single command + one-line reasoning |

## Files

| File | Size | Description |
|------|------|-------------|
| `mud-judgment-q4km.gguf` | 1.9 GB | Q4_K_M quantization (recommended for ≤6 GB VRAM) |
| `mud-judgment-q8.gguf` | 3.2 GB | Q8_0 quantization (higher quality, needs ~5 GB VRAM) |
| `Modelfile` | - | Ollama Modelfile with Llama 3.2 chat template |
| `system_prompt.txt` | - | Required system prompt (must be included in every call) |

## Quick Start: Ollama

```bash
# Download the GGUF and Modelfile, then:
ollama create mud-judgment -f Modelfile

# Call via API (system prompt is required):
curl -s http://localhost:11434/api/chat -d '{
  "model": "mud-judgment",
  "stream": false,
  "messages": [
    {"role": "system", "content": "<contents of system_prompt.txt>"},
    {"role": "user", "content": "[SITUATION]\nDecision: COMBAT | Trigger: HP critical | State: 28hp 100mn 35mv | Level 7 | Buffs: none\n[/SITUATION]\n\nA forest wraith slashes YOU extremely hard.\nThat really did HURT!\nYour blood freezes as you hear a wraith'\''s death shriek."}
  ]
}'
```

Expected response:
```
flee
> HP critical at 28, wraith hitting extremely hard - cannot sustain this fight
```
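
The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call above; the helper names `build_chat_payload` and `ask_model` are illustrative and not part of this repository, and the `options` block simply applies the sampling settings recommended in this card.

```python
import json
import urllib.request

# Default local Ollama endpoint, as used in the curl example.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(system_prompt: str, situation: str) -> dict:
    """Build the JSON body for a non-streaming /api/chat request."""
    return {
        "model": "mud-judgment",
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": situation},
        ],
        # Sampling settings recommended in this card.
        "options": {"temperature": 0.3, "top_p": 0.9},
    }


def ask_model(system_prompt: str, situation: str, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the assistant's text (hypothetical helper)."""
    body = json.dumps(build_chat_payload(system_prompt, situation)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        reply = json.load(resp)
    # With "stream": false, Ollama returns a single JSON object with a "message" field.
    return reply["message"]["content"]
```

Calling `ask_model(open("system_prompt.txt").read(), situation)` requires a running Ollama server with the `mud-judgment` model already created.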

## Quick Start: llama.cpp / Python

```bash
# llama.cpp CLI
llama-cli -m mud-judgment-q4km.gguf --temp 0.3 --top-p 0.9 \
  -p "<|start_header_id|>system<|end_header_id|>\n\n<system prompt><|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<situation><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

```python
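# Python with llama-cpp-python
from pathlib import Path

from llama_cpp import Llama

# The system prompt is required on every call.
system_prompt = Path("system_prompt.txt").read_text(encoding="utf-8")

# Example situation; in the bot this text is built from live game output.
situation_text = (
    "[SITUATION]\n"
    "Decision: COMBAT | Trigger: HP critical | State: 28hp 100mn 35mv | Level 7 | Buffs: none\n"
    "[/SITUATION]\n\n"
    "A forest wraith slashes YOU extremely hard.\n"
    "That really did HURT!"
)

llm = Llama(model_path="mud-judgment-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": situation_text},
    ],
    temperature=0.3,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])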
```

## Decision Types

The model handles 4 categories of judgment call:

| Type | When Called | Example Commands |
|------|------------|-----------------|
| **COMBAT** | HP critical, losing fight, buffs expired | `flee`, `recall`, `rebuff` |
| **NAVIGATION** | Stuck, maze, forced movement, no exits | `north`, `extract`, `maze`, `forced` |
| **RISK** | Unexplored exit, dangerous mob, death room | `continue`, `avoid`, `unavailable`, `hostile` |
| **RECOVERY** | Post-death, stuck, resource depletion | `urgent`, `rebuff`, `abandon`, `extract` |

## Input Format

Every user message must contain a `[SITUATION]` block:

```
[SITUATION]
Decision: RISK | Trigger: Unexplored exit | State: 94hp 177mn 68mv | Level 5 | Buffs: invis, sanc
[/SITUATION]

Standing at the edge of a deep crevasse...
One false step and you'd plunge into the darkness below.
There appears to be no chance of surviving the deadly fall.
[EXITS: North East *Down*]
```
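
A caller might assemble this message from structured bot state rather than by hand. The following is a minimal sketch under that assumption: the `Situation` dataclass and `format_user_message` function are hypothetical names, and the field layout is inferred only from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Situation:
    """Bot state that goes into the [SITUATION] header line (hypothetical type)."""
    decision: str  # COMBAT, NAVIGATION, RISK, or RECOVERY
    trigger: str
    hp: int
    mana: int
    moves: int
    level: int
    buffs: list[str] = field(default_factory=list)

    def header(self) -> str:
        # An empty buff list is written as "none", matching the examples in this card.
        buffs = ", ".join(self.buffs) if self.buffs else "none"
        return (
            f"Decision: {self.decision} | Trigger: {self.trigger} | "
            f"State: {self.hp}hp {self.mana}mn {self.moves}mv | "
            f"Level {self.level} | Buffs: {buffs}"
        )


def format_user_message(situation: Situation, game_text: str) -> str:
    """Wrap the header in [SITUATION] tags and append the raw game output."""
    return f"[SITUATION]\n{situation.header()}\n[/SITUATION]\n\n{game_text.strip()}"
```

Keeping the header in one place makes it harder for the bot to drift from the format the model saw during training.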

## Output Format

Exactly two lines:
1. A single command (game command or script command)
2. A reasoning line prefixed with `>`

```
avoid
> Death room - crevasse with "no chance of surviving" language, flagging for safe exploration later
```
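
Because the format is fixed, a consumer can validate a response before acting on it. This is a minimal sketch of such a check; `Decision` and `parse_decision` are illustrative names, and rejecting anything other than exactly two non-empty lines is an assumption about how strict a caller wants to be.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    command: str
    reasoning: str


def parse_decision(raw: str) -> Decision:
    """Split a model response into its command and reasoning lines.

    Raises ValueError if the response is not exactly two non-empty lines
    or if the second line lacks the '>' prefix.
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, got {len(lines)}")
    command, reasoning = lines
    if not reasoning.startswith(">"):
        raise ValueError("reasoning line must start with '>'")
    # Drop the '>' marker and surrounding whitespace from the reasoning.
    return Decision(command=command, reasoning=reasoning[1:].strip())
```

A bot could treat a `ValueError` here as a signal to retry the call or fall back to scripted behavior.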

## Important Usage Notes

- **System prompt is mandatory.** The model was trained with the system prompt in every example. Without it, output quality degrades significantly.
- **Temperature 0.3** is recommended. Higher temperatures produce inconsistent formatting.
- **Do not use `ollama run` without setting the system prompt first** (`/set system <prompt>`). Use the chat API instead.
- **Modelfile must include the full Llama 3.2 chat template.** See the included `Modelfile` for the correct template.

## Training Details

- **Method:** QLoRA with Unsloth on WSL2 Ubuntu 24.04
- **GPU:** NVIDIA RTX 1000 Ada (6 GB VRAM); training fits in ~4 GB
- **Epochs:** 2 (with 594 examples)
- **Learning rate:** 5e-5 with cosine scheduler
- **Effective batch size:** 8 (batch=1, grad_accum=8)
- **Eval loss:** 1.86 (declined steadily; no overfitting observed)
- **Loss type:** Completion-only (only trains on assistant response tokens)
- **LoRA targets:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
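
To give a sense of scale for these settings, the arithmetic can be sketched as follows. This is only an approximation: it assumes all 594 examples are in the training split (the card does not state the train/eval split), rounds a partial final batch up, and uses a plain cosine decay with no warmup, which may differ from what the actual trainer did.

```python
import math


def optimizer_steps(num_examples: int, effective_batch: int, epochs: int) -> int:
    """Approximate optimizer updates, counting a partial last batch as one step."""
    return math.ceil(num_examples / effective_batch) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-5) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Effective batch size is per-device batch (1) times gradient accumulation (8).
total = optimizer_steps(num_examples=594, effective_batch=1 * 8, epochs=2)
```

Under these assumptions the run amounts to roughly 150 optimizer updates, with the learning rate at about half its peak midway through.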

## Limitations

- Trained specifically for Apocalypse VI: Reborn game mechanics. May not generalize to other MUDs without additional training data.
- The 594-example training set covers common scenarios well but edge cases (ITEM, UNEXPECTED types) have minimal coverage.
- Quantization to Q4_K_M introduces slight quality loss vs. the full-precision LoRA adapter.

## Source Code

Training scripts, data generation, and the crawler that consumes this model are at:
[github.com/ninjarob/Apocalypse-VI-Projects](https://github.com/ninjarob/Apocalypse-VI-Projects)

## Citation

```bibtex
@misc{mud-judgment-2026,
  title={mud-judgment: Fine-tuned Llama 3.2 3B for MUD Game Decision Making},
  author={Robert Kevan},
  year={2026},
  url={https://huggingface.co/rkevan/mud-judgment}
}
```