# PIT-MXFP4 - Police Interview Trainer
**FOR TRAINING AND RESEARCH PURPOSES ONLY.** Not for operational policing, legal advice, or use as evidence in any proceedings. The creator accepts no responsibility or liability for any use or misuse of this model. Model outputs may be inaccurate or incomplete.
## Model Description
This is the MXFP4 quantised version of EryriLabs/PIT, reduced from ~39 GB to 13.8 GB with no noticeable quality degradation. MoE expert weights are quantised to MXFP4 while attention layers, router weights, embeddings, and the language model head remain in full precision.
PIT (Police Interview Trainer) is a domain-adapted language model for UK police interview roleplay training. It simulates realistic suspect behaviour across multiple scenario types, enabling trainee officers to practise the PEACE interview framework in a safe environment.
Base model: unsloth/gpt-oss-20b — a 21B parameter Mixture-of-Experts model with 3.6B active parameters per forward pass.
## Training Pipeline

The model was created through three training stages, followed by a quantised export step:
1. Continued Pre-Training (CPT) — UK Criminal Law
- Corpus: ~10.7 million tokens of UK criminal law material
- Coverage: Legislation, case law, PACE codes, CPS guidance, sentencing guidelines
- Adapter: LoRA r=64, 3 epochs, 1,971 steps
2. Continued Pre-Training (CPT) — Police Interview Technique
- Corpus: ~53,000 tokens of PIP Level 1 interview training material
- Coverage: PEACE framework, questioning techniques, suspect management, vulnerable persons
- Adapter: LoRA r=32, 10 epochs, 80 steps
- Stacked on: Stage 1 adapter
3. Supervised Fine-Tuning (SFT) — Interview Roleplay
- Dataset: 523 examples across 6 interaction modes
- Adapter: LoRA r=32, 3 epochs, 198 steps
- Stacked on: Stage 1 + Stage 2 adapters
4. MXFP4 Export
All three adapter layers were reconstructed on the base model and merged using Unsloth's native save_pretrained_merged(save_method="mxfp4"). This quantises the MoE expert weights (gate_up_proj, down_proj) to MXFP4 while preserving attention, router, embeddings, and lm_head in original precision.
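As a rough illustration of what the LoRA ranks above mean in parameter terms, a rank-r adapter on a d_out × d_in weight matrix adds r·(d_in + d_out) trainable parameters. The projection size below is hypothetical for illustration only, not taken from the gpt-oss-20b config:

```python
# Trainable parameters added by a rank-r LoRA adapter on a (d_out x d_in)
# weight: matrix A is (r x d_in) and matrix B is (d_out x r).
def lora_params(d_out: int, d_in: int, r: int) -> int:
    return r * (d_in + d_out)

# Hypothetical 4096x4096 projection (illustrative dimensions only)
full = 4096 * 4096
for r in (64, 32):
    added = lora_params(4096, 4096, r)
    print(f"r={r}: {added:,} LoRA params ({added / full:.2%} of the full matrix)")
```

This is why stacking several adapters (as in stages 1-3 above) stays cheap relative to the base model: each adapter touches only a small fraction of the weights.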
## SFT Modes
| Mode | Examples | Description |
|---|---|---|
| Suspect roleplay | 200 | In-character suspect responses (cooperative, deceptive, no-comment) |
| Assessment | 120 | Post-interview PIP Level 1 assessment feedback |
| PEACE knowledge | 80 | Direct Q&A about PEACE framework and interview law |
| Witness roleplay | 60 | In-character witness responses |
| Scenario presentation | 33 | Generating interview briefing scenarios |
| Special procedures | 30 | Handling vulnerable suspects, appropriate adults, mental health |
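The per-mode counts in the table sum to the 523 SFT examples reported in the training pipeline:

```python
# SFT example counts per interaction mode, as listed in the table above
sft_modes = {
    "Suspect roleplay": 200,
    "Assessment": 120,
    "PEACE knowledge": 80,
    "Witness roleplay": 60,
    "Scenario presentation": 33,
    "Special procedures": 30,
}
total = sum(sft_modes.values())
print(total)  # 523
```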
## Performance vs BF16

Tested on 2x NVIDIA RTX 3090 with `device_map="auto"`:
| Metric | BF16 (EryriLabs/PIT) | MXFP4 (this model) |
|---|---|---|
| Model size | ~39 GB | 13.8 GB |
| Load time | 141s | 7.6s |
| Generation speed | ~13 tok/s | ~16 tok/s |
| Quality | Baseline | Comparable (no noticeable degradation) |
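For a quick sense of scale, the headline numbers in the table work out to roughly these ratios:

```python
# Approximate ratios derived from the benchmark table above
size_reduction = 39 / 13.8   # BF16 size / MXFP4 size
load_speedup = 141 / 7.6     # BF16 load time / MXFP4 load time
gen_speedup = 16 / 13        # MXFP4 tok/s / BF16 tok/s
print(f"~{size_reduction:.1f}x smaller, ~{load_speedup:.0f}x faster to load, "
      f"~{gen_speedup:.2f}x generation throughput")
```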
## Quick Start with Docker

```bash
git clone https://huggingface.co/EryriLabs/PIT-MXFP4
cd PIT-MXFP4
# Or clone the application repo, which includes the Docker setup
```
### Using the full application (recommended)
The PIT application includes a web interface with scenario selection, interview simulation, transcript recording, and automated assessment.
```bash
cd pit-app
docker compose up
```
Then open http://localhost:3000.
Requirements:
- Docker with NVIDIA Container Toolkit
- GPU with 16GB+ VRAM (single GPU) or 2x 12GB+ GPUs
- ~14GB disk space
### Using with Transformers directly
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "EryriLabs/PIT-MXFP4",
    dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("EryriLabs/PIT-MXFP4", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are PIT (Police Interview Trainer), simulating a suspect in a police interview training exercise.\n\nYOUR CHARACTER: Tyler Bennett, 23 years old, male.\nBEHAVIOUR: cooperative\n\nINSTRUCTIONS:\n- Stay in character throughout\n- Use natural everyday speech\n- Keep responses to 1-3 sentences"},
    {"role": "user", "content": "I am cautioning you. You do not have to say anything. But it may harm your defence if you do not mention when questioned something which you later rely on in court. Anything you do say may be given in evidence. Do you understand the caution?"},
]

# Render the chat template, then tokenise. add_special_tokens=False avoids
# duplicating special tokens the template has already inserted.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
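The system prompt in the example follows a simple template: character, behaviour, instructions. A small helper makes it easy to vary the scenario; note this is an illustrative sketch based on the example above, not an official prompt format, and the model may be sensitive to wording that differs from its training data:

```python
def build_suspect_prompt(name: str, age: int, gender: str, behaviour: str) -> str:
    """Build a suspect-roleplay system prompt following the template shown
    in the example above. Illustrative helper only, not an official API."""
    return (
        "You are PIT (Police Interview Trainer), simulating a suspect in a "
        "police interview training exercise.\n\n"
        f"YOUR CHARACTER: {name}, {age} years old, {gender}.\n"
        f"BEHAVIOUR: {behaviour}\n\n"
        "INSTRUCTIONS:\n"
        "- Stay in character throughout\n"
        "- Use natural everyday speech\n"
        "- Keep responses to 1-3 sentences"
    )

# behaviour values seen in the SFT data include cooperative, deceptive, no-comment
messages = [
    {"role": "system", "content": build_suspect_prompt("Tyler Bennett", 23, "male", "no-comment")},
    {"role": "user", "content": "Where were you on the night in question?"},
]
```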
## Intended Use
- Police interview training and education
- Academic research into interview techniques
- Roleplay simulation for PEACE framework practice
- PIP Level 1 assessment preparation
## Out of Scope
- Operational policing decisions
- Legal advice or guidance
- Evidence in any legal proceedings
- Replacement for human interview training supervision
- Any commercial use without explicit permission
## Technical Details
- Architecture: Mixture-of-Experts (MoE), 21B total / 3.6B active parameters
- Precision: MXFP4 (expert weights), bfloat16 (attention, router, embeddings, lm_head)
- Training method: QLoRA (4-bit quantised base, 16-bit adapters)
- Hardware: 2x NVIDIA RTX 3090 (24GB each)
- Framework: Unsloth + HuggingFace Transformers
## Disclaimer
THIS MODEL IS PROVIDED FOR TRAINING AND RESEARCH PURPOSES ONLY.
This model is not intended for, and should not be used in, operational policing, legal proceedings, or any context where its outputs could affect real individuals or cases. The model may generate inaccurate, incomplete, or inappropriate content. The creator accepts no responsibility or liability whatsoever for any use or misuse of this model or its outputs.
Users are solely responsible for ensuring their use complies with all applicable laws and regulations.
Training data may contain public sector information licensed under the Open Government Licence v3.0 and information licensed under the Non-Commercial College Licence.
## License
CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)
## Citation
```bibtex
@misc{eryrilabs2026pit,
  title={PIT: Police Interview Trainer (MXFP4)},
  author={EryriLabs},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/EryriLabs/PIT-MXFP4}
}
```