# DutyBot GGUF

A domain-adapted language model for UK policing – offences, points to prove, PACE powers, and operational guidance. Built for the DutyBot Docker application. For training and educational purposes only.
## Model Details

| Property | Value |
|---|---|
| Base model | unsloth/gpt-oss-20b |
| Architecture | Mixture of Experts – 21B total parameters, 3.6B active per token |
| Training method | Continued pretraining with QLoRA (rank 64, bf16) |
| Corpus | 10,511 chunks (~10.7M tokens) of UK criminal law |
| Training loss | 3.90 → 1.73 |
| Context length | 131,072 (native), trained at 1,024, tested at 4,096 |
| Quantisation | Q4_K_M |
| File size | ~14.7 GB |
| Chat template | ChatML (<\|im_start\|> / <\|im_end\|>) |
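The ChatML template wraps every turn in `<|im_start|>` / `<|im_end|>` markers. As a minimal sketch of how a conversation is rendered (the `render_chatml` helper is hypothetical, for illustration only – llama.cpp and llama-cpp-python apply this template for you):

```python
# Illustrative sketch of ChatML prompt rendering. `render_chatml` is a
# hypothetical helper, not part of DutyBot or llama.cpp.
def render_chatml(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>ROLE\nCONTENT<|im_end|>
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are DutyBot."},
    {"role": "user", "content": "Define theft."},
])
```

This is also why `<|im_end|>` and `<|im_start|>` are used as stop strings in the examples below: they mark the turn boundaries.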
## How to Use

### With DutyBot (recommended)

The easiest way to use this model is with the DutyBot Docker app, which provides a full chat UI, conversation history, memory, and automatic legislation verification:

```bash
git clone https://github.com/dwain-barnes/dutybot.git
cd dutybot
docker compose up
# Open http://localhost:5000
```
### With llama.cpp directly

```bash
# Download the GGUF
huggingface-cli download EryriLabs/dutybot-GGUF domain_adapted-Q4_K_M.gguf --local-dir ./models

# Run the server
llama-server \
  --model ./models/domain_adapted-Q4_K_M.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 4096 \
  --n-gpu-layers 999 \
  --chat-template chatml
```
Then query the OpenAI-compatible API:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dutybot",
    "messages": [
      {"role": "system", "content": "You are DutyBot, a UK Police Duty Assistant for training purposes."},
      {"role": "user", "content": "What are the points to prove for Section 18 GBH?"}
    ],
    "max_tokens": 512,
    "temperature": 0.3,
    "stop": ["<|im_end|>", "<|im_start|>"],
    "frequency_penalty": 0.6
  }'
```
### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/domain_adapted-Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
    chat_format="chatml",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are DutyBot, a UK Police Duty Assistant for training purposes."},
        {"role": "user", "content": "Explain the difference between ABH and GBH"},
    ],
    max_tokens=512,
    temperature=0.3,
    stop=["<|im_end|>", "<|im_start|>"],
    frequency_penalty=0.6,
)
print(response["choices"][0]["message"]["content"])
```
## Training Details

### Corpus

The training corpus covers UK criminal law across these domains:

- Criminal offences – definitions, elements, and points to prove for offences under major UK statutes (Theft Act 1968, Offences Against the Person Act 1861, Criminal Damage Act 1971, Sexual Offences Act 2003, Misuse of Drugs Act 1971, and others)
- PACE – Police and Criminal Evidence Act 1984 codes of practice (stop and search, arrest, detention, investigation, identification)
- Sentencing – Sentencing Council guidelines and magistrates' court sentencing guidelines
- CPS guidance – Crown Prosecution Service charging standards and legal guidance
- Operational policing – powers, procedures, and general policing knowledge

The corpus was structured as 10,511 text chunks, totalling approximately 10.7 million tokens.
### Method

- Continued pretraining (CPT) – the model was exposed to the full corpus to inject domain knowledge, rather than instruction-tuned for a specific format
- QLoRA – 4-bit quantised base weights with rank-64 LoRA adapters in bf16, reducing GPU memory requirements
- Hyperparameters:
  - Learning rate: 5e-5 with cosine schedule
  - Batch size: 1 with 16 gradient accumulation steps (effective batch 16)
  - Sequence length: 1,024 tokens
  - Epochs: 3
  - Total steps: 1,971
- Hardware: 2x NVIDIA RTX 3090 (24GB each)
- Software: Unsloth + Hugging Face Transformers + TRL
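The step count follows directly from the corpus size and batch settings; a quick arithmetic check (assuming one pass over all 10,511 chunks per epoch, with the final partial batch kept):

```python
import math

# Sanity check: optimiser steps implied by corpus size, effective batch, and epochs.
chunks = 10_511
effective_batch = 16   # batch size 1 x 16 gradient accumulation steps
epochs = 3

steps_per_epoch = math.ceil(chunks / effective_batch)  # 657
total_steps = steps_per_epoch * epochs
print(total_steps)  # 1971 — matches the reported total
```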
### Loss Curve

| Step | Training Loss |
|---|---|
| 0 | 3.90 |
| 100 | 1.94 |
| 500 | 1.80 |
| 670 | 1.73 |
| 1000 | ~1.65 |

The loss showed a healthy, monotonic decline, indicating successful knowledge injection without catastrophic forgetting.
## Intended Use

### In scope

- Police training exercises and scenario planning
- Educational materials about UK criminal law
- Studying offence definitions, points to prove, and powers
- Building training tools for police forces and law enforcement academies

### Out of scope

- Live operational policing decisions – this model is not a substitute for professional legal advice, force policy, or the judgement of trained officers
- Legal advice – the model may produce inaccurate or incomplete legal information
- Jurisdictions outside England & Wales – the training data is primarily based on English and Welsh law; Scottish and Northern Irish law differ significantly
## Limitations

- May fabricate legal definitions – like all language models, DutyBot can generate plausible-sounding but incorrect legal information. Always verify against official sources.
- Training data currency – the corpus reflects the law as of the training date. Legislation changes frequently.
- Repetition – the model can sometimes repeat itself, especially on longer generations. Setting `frequency_penalty: 0.6` and `max_tokens: 512` helps mitigate this.
- No case law – the training data focuses on statute law and guidance rather than case law precedents.
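The `frequency_penalty` mitigation works by subtracting a count-scaled term from the logits of tokens that have already been generated. A minimal sketch of the OpenAI-style formula (illustrative only; llama.cpp's internal implementation may differ in detail):

```python
from collections import Counter

def apply_penalties(logits, generated, freq_penalty=0.6, presence_penalty=0.3):
    """OpenAI-style repetition penalties: each already-generated token's logit
    is reduced by freq_penalty * (occurrence count) plus a flat
    presence_penalty if it has appeared at all. Illustrative sketch."""
    counts = Counter(generated)
    out = dict(logits)
    for tok, count in counts.items():
        if tok in out:
            out[tok] -= freq_penalty * count + presence_penalty
    return out

penalised = apply_penalties({"the": 2.0, "law": 1.5}, ["the", "the", "act"])
# "the" appeared twice: 2.0 - 0.6*2 - 0.3 = 0.5; "law" never appeared, so 1.5
```

This is why repeated phrases become progressively less likely as the output grows, which curbs the repetition loops noted above.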
## System Prompt

For best results, use this system prompt:

```text
You are DutyBot, a UK Police Duty Assistant. You help police officers with
operational guidance, definitions of offences, points to prove, and general
policing knowledge based on UK law.

IMPORTANT CONSTRAINTS:
- You are for TRAINING AND EDUCATIONAL PURPOSES ONLY – never for live operational use
- Always encourage officers to verify guidance against local force policy and official sources
- Be professional, precise, and cite legislation where possible
- If unsure, say so clearly – never fabricate legal definitions
- When legislation lookup results are provided, use them to ground your answer
```
## Recommended Inference Parameters

| Parameter | Value | Notes |
|---|---|---|
| `temperature` | 0.3 | Low temperature for factual responses |
| `max_tokens` | 512 | Prevents repetition on long outputs |
| `frequency_penalty` | 0.6 | Reduces repetitive phrasing |
| `presence_penalty` | 0.3 | Encourages topic diversity |
| `stop` | ["<\|im_end\|>", "<\|im_start\|>"] | Proper turn boundaries |
| `ctx_size` | 4096 | Good balance of context and speed |
## Hardware Requirements

| Setup | VRAM | Speed |
|---|---|---|
| 2x RTX 3090 (full offload) | ~16GB total | Fast |
| 1x RTX 3090/4090 (partial offload) | 24GB | Moderate |
| CPU only | 0 (uses RAM) | Slow (~1-2 tok/s) |

A minimum of 16GB system RAM is recommended. The GGUF file itself is 14.7GB.
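As a rough cross-check of the file size: Q4_K_M is a mixed-precision format (mostly 4-bit blocks with some higher-precision tensors), so the implied average bits per weight should land in the vicinity of 4.5–6:

```python
# Rough sanity check: average bits per weight implied by the 14.7 GB file
# and the 21B total parameters. Ignores GGUF metadata overhead.
file_bytes = 14.7e9
params = 21e9

bits_per_weight = file_bytes * 8 / params
print(round(bits_per_weight, 1))  # 5.6
```

About 5.6 bits per weight, consistent with a Q4_K_M quantisation of this model.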
## Citation

```bibtex
@misc{dutybot2026,
  title={DutyBot: A Domain-Adapted Language Model for UK Police Training},
  author={EryriLabs},
  year={2026},
  url={https://huggingface.co/EryriLabs/dutybot-GGUF}
}
```
## Disclaimer

This model and associated software are provided strictly for research and educational purposes. They are not intended for production use, operational deployment, or commercial purposes.
- No warranty: This model is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement.
- No liability: The author(s) accept no responsibility or liability for any errors, omissions, or outcomes arising from the use of this model or its outputs.
- Not legal advice: Nothing produced by this model constitutes legal, professional, or operational advice. Outputs may be inaccurate, incomplete, or outdated. Always consult qualified professionals and official sources.
- Not for operational policing: This model must not be used for live operational decision-making. It is not a substitute for professional judgement, force policy, or official legal guidance.
- Non-commercial use only: The model weights are licensed under CC-BY-NC-ND-4.0 and must not be used for commercial purposes.
- Use at your own risk: You are solely responsible for how you use this model and any decisions made based on its output.
## License

CC-BY-NC-ND-4.0 – Non-commercial use only. No derivatives without permission. The training corpus contains Crown copyright material used under the Open Government Licence.
## Acknowledgements
- GPT-OSS 20B base model
- Unsloth for efficient QLoRA training
- llama.cpp for GGUF inference
- UK legislation sourced from legislation.gov.uk