---
base_model: meta-llama/Llama-3.2-3B-Instruct
license: llama3.2
language:
- en
library_name: transformers
tags:
- llama
- gguf
- mud
- game-ai
- decision-making
- fine-tuned
- unsloth
- trl
- sft
model_name: mud-judgment
pipeline_tag: text-generation
quantized_by: llama.cpp
---

# mud-judgment — MUD Game Decision Engine (GGUF)

A fine-tuned Llama 3.2 3B Instruct model that makes real-time judgment calls for a bot playing [Apocalypse VI: Reborn](http://apocalypse-vi.com), a CircleMUD text game. The model handles decisions that scripted logic cannot: flee or fight, which path to take, whether to enter a dangerous area.

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | `meta-llama/Llama-3.2-3B-Instruct` |
| **Fine-tuning method** | QLoRA via Unsloth (rank=16, alpha=32) |
| **Training framework** | TRL SFTTrainer, completion-only loss |
| **Training data** | ~594 hand-crafted JSONL examples across 4 decision categories |
| **Quantization** | Q4_K_M (1.9 GB) and Q8_0 (3.2 GB) via llama.cpp |
| **VRAM requirement** | ~3 GB (Q4_K_M), ~4.5 GB (Q8_0) |
| **Output format** | Single command + one-line reasoning |

## Files

| File | Size | Description |
|------|------|-------------|
| `mud-judgment-q4km.gguf` | 1.9 GB | Q4_K_M quantization (recommended for ≤6 GB VRAM) |
| `mud-judgment-q8.gguf` | 3.2 GB | Q8_0 quantization (higher quality, needs ~5 GB VRAM) |
| `Modelfile` | — | Ollama Modelfile with Llama 3.2 chat template |
| `system_prompt.txt` | — | Required system prompt (must be included in every call) |

## Quick Start — Ollama

```bash
# Download the GGUF and Modelfile, then:
ollama create mud-judgment -f Modelfile

# Call via API. The system prompt is required: paste the contents of
# system_prompt.txt into the system message's "content" field.
curl -s http://localhost:11434/api/chat -d '{
  "model": "mud-judgment",
  "stream": false,
  "messages": [
    {"role": "system", "content": ""},
    {"role": "user", "content": "[SITUATION]\nDecision: COMBAT | Trigger: HP critical | State: 28hp 100mn 35mv | Level 7 | Buffs: none\n[/SITUATION]\n\nA forest wraith slashes YOU extremely hard.\nThat really did HURT!\nYour blood freezes as you hear a wraith'\''s death shriek."}
  ]
}'
```

Expected response:

```
flee
> HP critical at 28, wraith hitting extremely hard — cannot sustain this fight
```

## Quick Start — llama.cpp / Python

```bash
# llama.cpp CLI: paste the contents of system_prompt.txt and your situation
# text into the empty system and user slots of the prompt template.
llama-cli -m mud-judgment-q4km.gguf --temp 0.3 --top-p 0.9 \
  -p "<|start_header_id|>system<|end_header_id|>\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```

```python
# Python with llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="mud-judgment-q4km.gguf", n_ctx=2048, n_gpu_layers=-1)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": open("system_prompt.txt").read()},
        {"role": "user", "content": situation_text},
    ],
    temperature=0.3,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])
```

## Decision Types

The model handles four categories of judgment calls:

| Type | When Called | Example Commands |
|------|-------------|------------------|
| **COMBAT** | HP critical, losing fight, buffs expired | `flee`, `recall`, `rebuff` |
| **NAVIGATION** | Stuck, maze, forced movement, no exits | `north`, `extract`, `maze`, `forced` |
| **RISK** | Unexplored exit, dangerous mob, death room | `continue`, `avoid`, `unavailable`, `hostile` |
| **RECOVERY** | Post-death, stuck, resource depletion | `urgent`, `rebuff`, `abandon`, `extract` |

## Input Format

Every user message must contain a `[SITUATION]` block:

```
[SITUATION]
Decision: RISK | Trigger: Unexplored exit | State: 94hp 177mn 68mv | Level 5 | Buffs: invis, sanc
[/SITUATION]

Standing at the edge of a deep crevasse...
One false step and you'd plunge into the darkness below.
There appears to be no chance of surviving the deadly fall.
[EXITS: North East *Down*]
```

## Output Format

Exactly two lines:

1. A single command (game command or script command)
2. A reasoning line prefixed with `>`

```
avoid
> Death room — crevasse with "no chance of surviving" language, flagging for safe exploration later
```
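Because the contract is exactly two lines, a thin parser on the consuming side is enough. The sketch below is illustrative only: the function name `parse_judgment` and the empty-string fallbacks are assumptions, not part of the shipped crawler.

```python
def parse_judgment(raw: str) -> tuple[str, str]:
    """Split the model's two-line reply into (command, reasoning)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    command = lines[0] if lines else ""
    # The reasoning line is prefixed with ">"; drop the prefix if present.
    reasoning = lines[1].lstrip("> ").strip() if len(lines) > 1 else ""
    return command, reasoning

command, reasoning = parse_judgment("flee\n> HP critical at 28, cannot sustain this fight")
assert command == "flee"
```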
## Important Usage Notes

- **System prompt is mandatory.** The model was trained with the system prompt in every example. Without it, output quality degrades significantly.
- **Temperature 0.3** is recommended. Higher temperatures produce inconsistent formatting.
- **Do not use `ollama run` without setting the system prompt first** (`/set system <prompt>`). Use the chat API instead.
- **The Modelfile must include the full Llama 3.2 chat template** — see the included `Modelfile` for the correct template.

## Training Details

- **Method:** QLoRA with Unsloth on WSL2 Ubuntu 24.04
- **GPU:** NVIDIA RTX 1000 Ada (6 GB VRAM) — training fits in ~4 GB
- **Epochs:** 2 (with 594 examples)
- **Learning rate:** 5e-5 with cosine scheduler
- **Effective batch size:** 8 (batch=1, grad_accum=8)
- **Eval loss:** 1.86 (steadily declining, no overfitting)
- **Loss type:** Completion-only (trains only on assistant response tokens)
- **LoRA targets:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

## Limitations

- Trained specifically for Apocalypse VI: Reborn game mechanics. May not generalize to other MUDs without additional training data.
- The 594-example training set covers common scenarios well, but edge cases (ITEM, UNEXPECTED types) have minimal coverage.
- Quantization to Q4_K_M introduces slight quality loss vs. the full-precision LoRA adapter.

## Source Code

Training scripts, data generation, and the crawler that consumes this model are at:
[github.com/ninjarob/Apocalypse-VI-Projects](https://github.com/ninjarob/Apocalypse-VI-Projects)

## Citation

```bibtex
@misc{mud-judgment-2026,
  title={mud-judgment: Fine-tuned Llama 3.2 3B for MUD Game Decision Making},
  author={Robert Kevan},
  year={2026},
  url={https://huggingface.co/rkevan/mud-judgment}
}
```
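## Appendix: Training Configuration Sketch

A minimal, hypothetical reconstruction of the setup described in Training Details, not the repository's actual training script. The dataset path `train.jsonl`, the `text` field, `max_seq_length`, and the 8-bit optimizer are assumptions, and exact `SFTTrainer` argument names shift between TRL versions (newer releases move `dataset_text_field` and `max_seq_length` into `SFTConfig`).

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Load the base model in 4-bit (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=2048,  # assumption: not stated on this card
    load_in_4bit=True,
)

# Attach LoRA adapters: rank=16, alpha=32, all seven projection targets.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# JSONL examples, assumed here to carry a pre-templated "text" field.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Completion-only loss: mask every token before the assistant header.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|start_header_id|>assistant<|end_header_id|>\n\n",
    tokenizer=tokenizer,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    data_collator=collator,
    args=TrainingArguments(
        output_dir="outputs",
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch size 8
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        optim="adamw_8bit",  # assumption: helps stay in the ~4 GB budget
        logging_steps=10,
    ),
)
trainer.train()
```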