Phi-4-mini N5 Complaint Categorization Fine-tune

This model is a fine-tuned version of microsoft/Phi-4-mini-instruct optimized for categorizing citizen complaints into predefined topic and experience labels, trained on the signaalberichten dataset.

Model Details

Model Description

Developed by: UWV InnovatieHub
Model type: Causal Language Model with LoRA fine-tuning
Language(s): Dutch (nl)
License: MIT
Finetuned from: microsoft/Phi-4-mini-instruct (3.82B parameters)
Training Framework: Unsloth (optimized training for efficient processing)

Training Details

Dataset: UWV/wim_instruct_signaalberichten_to_jsonld_agent_steps
Dataset Size: 4,525 N5-specific examples (label addition tasks)
Training Duration: 1 hour 44 minutes
Hardware: NVIDIA A100 80GB
Epochs: 3.1
Steps: 1,735
Training Metrics:
- Final Training Loss: 0.7864
- Final Eval Loss: 0.7796
- Training samples/second: 2.209
- Learning rate (final): 6.26e-10

LoRA Configuration

{
    "r": 512,                    # Large rank for quality
    "lora_alpha": 1024,         # Alpha (2:1 ratio)
    "lora_dropout": 0.1,        # Higher dropout for small dataset
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj"  # Attention layers only
    ]
}
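As a rough illustration of what these values imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters that a rank-512 adapter adds to a single linear layer. It assumes the classic `lora_alpha / r` scaling (not the rank-stabilized variant), and the 3072 x 3072 layer shape is an illustrative assumption, not a figure taken from this card.

```python
# LoRA hyperparameters copied from the configuration above.
lora_config = {"r": 512, "lora_alpha": 1024}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Classic LoRA scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


scaling = lora_scaling(lora_config["r"], lora_config["lora_alpha"])

# ASSUMPTION: a square 3072 x 3072 projection, used only for illustration.
example_params = lora_param_count(d_in=3072, d_out=3072, r=lora_config["r"])

print(f"scaling factor: {scaling}")
print(f"extra parameters for one 3072x3072 layer: {example_params:,}")
```

The parameter count grows linearly with the rank, which is why an adapter with r=512 is much larger than the more common r=8 to r=64 adapters.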

Training Configuration

{
    "model": "phi4-mini",
    "max_seq_length": 4096,
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "effective_batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_steps": 50,
    "max_grad_norm": 1.0,
    "lr_scheduler": "cosine",
    "optimizer": "paged_adamw_8bit",
    "bf16": True,
    "seed": 42
}
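The reported step count, epoch count, and duration can be cross-checked against this configuration with simple arithmetic. The sketch below assumes that all 4,525 N5 examples were in the training split; if part of the data was held out for evaluation, the epoch figure would shift slightly.

```python
# Figures taken from Training Details and Training Configuration.
dataset_size = 4525          # N5 examples
batch_size = 8
gradient_accumulation_steps = 1
total_steps = 1735
samples_per_second = 2.209

effective_batch_size = batch_size * gradient_accumulation_steps
samples_seen = total_steps * effective_batch_size

# ASSUMPTION: every example is part of the training split.
epochs = samples_seen / dataset_size
minutes = samples_seen / samples_per_second / 60

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.2f}")
print(f"estimated duration: {minutes:.0f} minutes")
```

Under that assumption the result rounds to the 3.1 epochs and roughly 1 hour 44 minutes reported above.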

Intended Uses & Limitations

Intended Uses

Complaint Categorization: Classify citizen complaints into topic and experience categories
Municipal Service Analysis: Analyze phone transcripts and written complaints
Topic Detection: Identify what the complaint is about (e.g., waste, parking, permits)
Experience Analysis: Determine how citizens experience the service (e.g., communication, speed, clarity)

Limitations

Trained on signaalberichten dataset (Dutch municipal complaints)
Fixed label vocabulary (cannot create new labels)
Best performance on complaint/service interaction texts
Limited to 4K token context (sufficient for most complaints)
Specific to Dutch government/municipal contexts

How to Use

Option 1: Using the Merged Model (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json

# Load the merged model (ready to use)
model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n5-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-merged")

# Prepare input - complaint text for categorization
complaint_text = """
Burger: Nou, waar ik dus over wil klagen is het afval in de buurt. 
Het is echt niet normaal meer, met al die vuilniszakken die op straat worden gegooid. 
De containers zijn vaak vol en er komen ook ratten. 
Ik had al eens gebeld maar er wordt niks aan gedaan!
"""

messages = [
    {
        "role": "system", 
        "content": "Jij bent een expert in het toewijzen van labels aan een tekst."
    },
    {
        "role": "user", 
        "content": f"""Analyseer de onderstaande tekst en bepaal welke labels van toepassing zijn.

**Onderwerp labels** (selecteer wat van toepassing is):
Vuil/ongedierte overlast, Bruikbaarheid/beschikbaarheid afvalcontainers, 
Parkeeroverlast, Vergunningen, etc.

**Beleving labels** (selecteer wat van toepassing is):
Communicatie, Op de hoogte houden, Statusinformatie, Snelheid van afhandeling, etc.

**Tekst om te analyseren**:
{complaint_text}"""
    }
]

# Apply chat template and generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.1,  # Low temperature for consistent labeling
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, so the prompt is not echoed back
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(response)
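When categorizing many complaints, it can help to build the chat messages with a small helper instead of editing the f-string by hand. The following is a minimal sketch that reproduces the prompt layout used above. The function name `build_messages` and its parameters are hypothetical, and the label lists passed in should be the full vocabulary used during training, which this card only shows in abbreviated form.

```python
SYSTEM_PROMPT = "Jij bent een expert in het toewijzen van labels aan een tekst."


def build_messages(complaint_text, onderwerp_labels, beleving_labels):
    """Return chat messages in the same layout as the example prompt above."""
    user_prompt = (
        "Analyseer de onderstaande tekst en bepaal welke labels van toepassing zijn.\n\n"
        "**Onderwerp labels** (selecteer wat van toepassing is):\n"
        f"{', '.join(onderwerp_labels)}\n\n"
        "**Beleving labels** (selecteer wat van toepassing is):\n"
        f"{', '.join(beleving_labels)}\n\n"
        "**Tekst om te analyseren**:\n"
        f"{complaint_text.strip()}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


# Example with the (abbreviated) labels shown in the prompt above.
example_messages = build_messages(
    "De containers zijn vaak vol en er komen ook ratten.",
    ["Vuil/ongedierte overlast", "Parkeeroverlast"],
    ["Communicatie", "Statusinformatie"],
)
print(example_messages[1]["content"])
```

The returned list can be passed directly to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.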

Option 2: Using the LoRA Adapter

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(
    base_model,
    "UWV/wim-n5-phi4-mini-adapter"
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-adapter")

# Use same inference code as above...

Expected Output Format

The model outputs a JSON response with categorization results:

{
    "reasoning": "Omdat de burger klaagt over afval dat op straat wordt gegooid, volle containers en rattenoverlast, zijn de onderwerpen 'Vuil/ongedierte overlast' en 'Bruikbaarheid/beschikbaarheid afvalcontainers' het meest van toepassing. De beleving is negatief: de burger ervaart frustratie over het uitblijven van actie en het gebrek aan terugkoppeling.",
    "onderwerp_labels": [
        "Vuil/ongedierte overlast",
        "Bruikbaarheid/beschikbaarheid afvalcontainers"
    ],
    "beleving_labels": [
        "Op de hoogte houden",
        "Statusinformatie",
        "Communicatie"
    ]
}
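Because the label vocabulary is fixed, it may be worth parsing the response and checking each label against the allowed set before using it downstream. The sketch below assumes the response contains a single JSON object with the keys shown above. The helper name `parse_labels` and the small allowed sets are hypothetical and only cover the labels that appear in this example.

```python
import json


def parse_labels(response, allowed_onderwerp, allowed_beleving):
    """Extract the JSON object from a model response and validate its labels."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(response[start:end + 1])

    onderwerp = data.get("onderwerp_labels", [])
    beleving = data.get("beleving_labels", [])
    unknown = [label for label in onderwerp if label not in allowed_onderwerp]
    unknown += [label for label in beleving if label not in allowed_beleving]

    return {
        "onderwerp_labels": [label for label in onderwerp if label in allowed_onderwerp],
        "beleving_labels": [label for label in beleving if label in allowed_beleving],
        "unknown_labels": unknown,
    }


# ASSUMPTION: illustrative subsets of the real label vocabulary.
ALLOWED_ONDERWERP = {"Vuil/ongedierte overlast", "Bruikbaarheid/beschikbaarheid afvalcontainers"}
ALLOWED_BELEVING = {"Op de hoogte houden", "Statusinformatie", "Communicatie"}

example_response = (
    '{"reasoning": "...", '
    '"onderwerp_labels": ["Vuil/ongedierte overlast", "Onbekend label"], '
    '"beleving_labels": ["Communicatie"]}'
)
result = parse_labels(example_response, ALLOWED_ONDERWERP, ALLOWED_BELEVING)
print(result)
```

Labels outside the vocabulary are collected under `unknown_labels` rather than silently dropped, so unexpected model output can be logged or reviewed.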

Dataset Information

The model was trained on the UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps dataset, which contains:

Source: Signaalberichten (citizen complaints to municipalities)
Domain: Phone transcripts and written complaints about municipal services
N5 Examples: 4,525 complaint categorization tasks
Average Token Length: 1,636 tokens
Max Token Length: 2,332 tokens
Format: ChatML-formatted instruction-following examples
Task: Categorize complaints into predefined topic and experience labels

Important: This is a different task and dataset from the WIM pipeline (N1-N4), which focuses on Wikipedia to JSON-LD conversion.

Training Results

The model completed 3.1 epochs through the dataset:

Final Training Loss: 0.7864
Training Efficiency: 2.209 samples/second

Loss Progression

Started at ~1.13 loss
Rapid improvement in first epoch
Stable convergence throughout training
Final learning rate: 6.26e-10 (cosine decay)
Gradient norms: Stable around 0.6-0.7
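The very small final learning rate is what a cosine schedule produces near the end of training. The sketch below assumes the common formulation of linear warmup followed by cosine decay to zero, with the base learning rate, warmup steps, and total steps from the training configuration. The card does not state at which step the final value was logged, so the step used in the example is an assumption.

```python
import math


def cosine_lr(step, base_lr=2e-5, warmup_steps=50, total_steps=1735):
    """Learning rate with linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: the reported value was logged a few steps before the last step.
for step in (0, 25, 50, 900, 1729, 1735):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions, the value six steps before the end is close to the reported 6.26e-10, and the schedule reaches zero at the final step.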

Model Versions

Merged Model: UWV/wim-n5-phi4-mini-merged
- Note: The merge failed due to a known Phi-4 issue
- Adapter weights were saved instead
- The model works fine for inference
LoRA Adapter: UWV/wim-n5-phi4-mini-adapter (~2.29 GB)
- Requires base Phi-4-mini-instruct model
- Large adapter due to r=512
- Includes all training configurations

Model Context

Note: Despite the "n5" naming, this model is NOT part of the WIM (Wikipedia to Knowledge Graph) pipeline that includes N1-N4. This is a separate task focused on complaint categorization.

WIM Pipeline (Wikipedia to JSON-LD):

N1: Entity Extraction from Wikipedia text
N2: Schema.org Type Selection for entities
N3: Transform to JSON-LD format
N4: Validation of JSON-LD

This Model (N5 - Complaint Categorization):

Task: Categorize citizen complaints into topic and experience labels
Dataset: Signaalberichten (municipal complaints)
Domain: Government services and citizen interactions

Performance Characteristics

Sequence Length: Average 1,636 tokens (moderate length)
Batch Processing: Can handle batch size 8 with 4K context
Inference Speed: Fast label addition to existing JSON-LD
Memory Usage: ~10GB VRAM with 4K context
Domain: Specialized for Dutch government/municipal contexts
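The memory figure can be put in context with a back-of-the-envelope estimate of the weight footprint. The sketch below assumes unquantized bfloat16 weights (2 bytes per parameter), as used in the loading examples above, and takes the 3.82B parameter count from the model description. Whatever remains of the roughly 10 GB is then available for the KV cache, activations, and framework overhead.

```python
# Figures from this card; the 10 GB total is the approximate value listed above.
num_parameters = 3.82e9      # base model size
bytes_per_parameter = 2      # ASSUMPTION: bfloat16, no quantization
reported_total_gb = 10.0

weights_gb = num_parameters * bytes_per_parameter / 1e9
remaining_gb = reported_total_gb - weights_gb

print(f"weights: {weights_gb:.2f} GB")
print(f"left for KV cache, activations, overhead: {remaining_gb:.2f} GB")
```

Loading the base model in 4-bit form would shrink the weight term considerably, but this card does not report memory usage for that setup.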

Citation

If you use this model, please cite:

@misc{wim-n5-phi4-mini,
  author = {UWV InnovatieHub},
  title = {Phi-4-mini N5 Complaint Categorization Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/wim-n5-phi4-mini-merged}
}