Phi-4-mini N5 Complaint Categorization Fine-tune
This model is a LoRA fine-tune of microsoft/Phi-4-mini-instruct for categorizing citizen complaints into predefined topic and experience labels. It was trained on the signaalberichten dataset.
Model Details
Model Description
- Developed by: UWV InnovatieHub
- Model type: Causal Language Model with LoRA fine-tuning
- Language(s): Dutch (nl)
- License: MIT
- Finetuned from: microsoft/Phi-4-mini-instruct (3.82B parameters)
- Training Framework: Unsloth (optimized training for efficient processing)
Training Details
- Dataset: UWV/wim_instruct_signaalberichten_to_jsonld_agent_steps
- Dataset Size: 4,525 N5-specific examples (label addition tasks)
- Training Duration: 1 hour 44 minutes
- Hardware: NVIDIA A100 80GB
- Epochs: 3.1
- Steps: 1,735
- Training Metrics:
- Final Training Loss: 0.7864
- Final Eval Loss: 0.7796
- Training samples/second: 2.209
- Learning rate (final): 6.26e-10
LoRA Configuration
{
    "r": 512,              # Large rank for quality
    "lora_alpha": 1024,    # Alpha (2:1 ratio)
    "lora_dropout": 0.1,   # Higher dropout for small dataset
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj"  # Attention layers only
    ]
}
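As a rough illustration of what the rank and alpha settings above imply, the sketch below computes the standard LoRA scaling factor (`lora_alpha / r`) and the number of trainable parameters that one adapted projection adds. The helper names and the 3072 x 3072 projection shape are assumptions chosen for illustration; they are not values stated in this model card.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return lora_alpha / r


def lora_params_per_matrix(r: int, d_in: int, d_out: int) -> int:
    """Trainable parameters added to one linear layer.

    LoRA adds two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values from the LoRA configuration above
scaling = lora_scaling(r=512, lora_alpha=1024)

# Hypothetical square projection (assumed shape, for illustration only)
params = lora_params_per_matrix(r=512, d_in=3072, d_out=3072)

print(f"scaling factor: {scaling}")
print(f"extra parameters for one 3072x3072 projection: {params:,}")
```

Because the parameter count grows linearly with `r`, a rank of 512 produces a much larger adapter than the more common ranks of 8 to 64.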
Training Configuration
{
    "model": "phi4-mini",
    "max_seq_length": 4096,
    "batch_size": 8,
    "gradient_accumulation_steps": 1,
    "effective_batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_steps": 50,
    "max_grad_norm": 1.0,
    "lr_scheduler": "cosine",
    "optimizer": "paged_adamw_8bit",
    "bf16": True,
    "seed": 42
}
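The reported step count, batch size, dataset size, and throughput can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes a single training device and that all 4,525 examples were used for training; neither assumption is stated explicitly above.

```python
# Figures reported in Training Details and Training Configuration
dataset_size = 4525
steps = 1735
batch_size = 8
gradient_accumulation_steps = 1
samples_per_second = 2.209

# Samples consumed per optimizer step (assumes one device)
effective_batch_size = batch_size * gradient_accumulation_steps
samples_seen = steps * effective_batch_size

# Passes over the dataset, and wall-clock time implied by the throughput
epochs = samples_seen / dataset_size
duration_seconds = samples_seen / samples_per_second
hours, remainder = divmod(duration_seconds, 3600)
minutes = remainder // 60

print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.2f}")
print(f"implied duration: {int(hours)} h {int(minutes)} min")
```

Under these assumptions the numbers are mutually consistent with the reported 3.1 epochs and a training time of roughly 1 hour 44 minutes.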
Intended Uses & Limitations
Intended Uses
- Complaint Categorization: Classify citizen complaints into topic and experience categories
- Municipal Service Analysis: Analyze phone transcripts and written complaints
- Topic Detection: Identify what the complaint is about (e.g., waste, parking, permits)
- Experience Analysis: Determine how citizens experience the service (e.g., communication, speed, clarity)
Limitations
- Trained on signaalberichten dataset (Dutch municipal complaints)
- Fixed label vocabulary (cannot create new labels)
- Best performance on complaint/service interaction texts
- Limited to 4K token context (sufficient for most complaints)
- Specific to Dutch government/municipal contexts
How to Use
Option 1: Using the Merged Model (Recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json
# Load the merged model (ready to use)
model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n5-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-merged")
# Prepare input - complaint text for categorization
complaint_text = """
Burger: Nou, waar ik dus over wil klagen is het afval in de buurt.
Het is echt niet normaal meer, met al die vuilniszakken die op straat worden gegooid.
De containers zijn vaak vol en er komen ook ratten.
Ik had al eens gebeld maar er wordt niks aan gedaan!
"""
messages = [
    {
        "role": "system",
        "content": "Jij bent een expert in het toewijzen van labels aan een tekst."
    },
    {
        "role": "user",
        "content": f"""Analyseer de onderstaande tekst en bepaal welke labels van toepassing zijn.
**Onderwerp labels** (selecteer wat van toepassing is):
Vuil/ongedierte overlast, Bruikbaarheid/beschikbaarheid afvalcontainers,
Parkeeroverlast, Vergunningen, etc.
**Beleving labels** (selecteer wat van toepassing is):
Communicatie, Op de hoogte houden, Statusinformatie, Snelheid van afhandeling, etc.
**Tekst om te analyseren**:
{complaint_text}"""
    }
]
# Apply chat template and generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.1,  # Low temperature for consistent labeling
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
# Decode only the newly generated tokens, so the prompt is not echoed back
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(response)
Option 2: Using the LoRA Adapter
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
# Load adapter
model = PeftModel.from_pretrained(
    base_model,
    "UWV/wim-n5-phi4-mini-adapter"
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n5-phi4-mini-adapter")
# Use same inference code as above...
Expected Output Format
The model outputs a JSON response with categorization results:
{
    "reasoning": "Omdat de burger klaagt over afval dat op straat wordt gegooid, volle containers en rattenoverlast, zijn de onderwerpen 'Vuil/ongedierte overlast' en 'Bruikbaarheid/beschikbaarheid afvalcontainers' het meest van toepassing. De beleving is negatief: de burger ervaart frustratie over het uitblijven van actie en het gebrek aan terugkoppeling.",
    "onderwerp_labels": [
        "Vuil/ongedierte overlast",
        "Bruikbaarheid/beschikbaarheid afvalcontainers"
    ],
    "beleving_labels": [
        "Op de hoogte houden",
        "Statusinformatie",
        "Communicatie"
    ]
}
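Since the model returns its answer as JSON text, downstream code usually needs to parse it and guard against labels outside the fixed vocabulary. The following is a minimal sketch of one way to do that. The function name `parse_labels` and the two label sets are hypothetical; the sets contain only the labels shown in the example prompt, not the full vocabulary.

```python
import json

# Hypothetical subsets of the label vocabulary (taken from the example prompt only)
ONDERWERP_LABELS = {
    "Vuil/ongedierte overlast",
    "Bruikbaarheid/beschikbaarheid afvalcontainers",
    "Parkeeroverlast",
    "Vergunningen",
}
BELEVING_LABELS = {
    "Communicatie",
    "Op de hoogte houden",
    "Statusinformatie",
    "Snelheid van afhandeling",
}


def parse_labels(response: str) -> dict:
    """Extract the JSON object from a model response and keep only known labels."""
    # Tolerate extra text around the JSON by slicing from the first "{" to the last "}"
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(response[start:end + 1])
    return {
        "reasoning": data.get("reasoning", ""),
        "onderwerp_labels": [
            label for label in data.get("onderwerp_labels", []) if label in ONDERWERP_LABELS
        ],
        "beleving_labels": [
            label for label in data.get("beleving_labels", []) if label in BELEVING_LABELS
        ],
    }


# Example response with surrounding text and one label outside the vocabulary
example = (
    'Resultaat: {"reasoning": "Klacht over afval.", '
    '"onderwerp_labels": ["Vuil/ongedierte overlast", "Onbekend label"], '
    '"beleving_labels": ["Communicatie"]}'
)
result = parse_labels(example)
print(result["onderwerp_labels"], result["beleving_labels"])
```

Dropping unknown labels silently is a design choice made for brevity here; in production it may be preferable to log or reject such responses.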
Dataset Information
The model was trained on the UWV/wim-instruct-signaalberichten-to-jsonld-agent-steps dataset, which contains:
- Source: Signaalberichten (citizen complaints to municipalities)
- Domain: Phone transcripts and written complaints about municipal services
- N5 Examples: 4,525 complaint categorization tasks
- Average Token Length: 1,636 tokens
- Max Token Length: 2,332 tokens
- Format: ChatML-formatted instruction-following examples
- Task: Categorize complaints into predefined topic and experience labels
Important: This is a different task and dataset from the WIM pipeline (N1-N4), which focuses on Wikipedia to JSON-LD conversion.
Training Results
The model completed 3.1 epochs through the dataset:
- Final Training Loss: 0.7864
- Training Efficiency: 2.209 samples/second
Loss Progression
- Started at ~1.13 loss
- Rapid improvement in first epoch
- Stable convergence throughout training
- Final learning rate: 6.26e-10 (cosine decay)
- Gradient norms: Stable around 0.6-0.7
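The very small final learning rate follows from the cosine schedule decaying toward zero at the last step. The sketch below uses the common linear-warmup plus cosine-decay form with the values from the training configuration (base rate 2e-5, 50 warmup steps, 1,735 total steps). That the training run used exactly this formula is an assumption, and `cosine_lr` is an illustrative name.

```python
import math


def cosine_lr(step: int, base_lr: float = 2e-5,
              warmup_steps: int = 50, total_steps: int = 1735) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Inspect the schedule at warmup, peak, midpoint, and near the end of training
for step in (0, 25, 50, 892, 1700, 1729, 1735):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the rate drops to the order of 1e-10 in the last handful of steps, which is the same order of magnitude as the reported final value.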
Model Versions
Merged Model: UWV/wim-n5-phi4-mini-merged
- Note: Merge failed due to known Phi-4 issue
- Adapter weights saved instead
- Model works fine for inference
LoRA Adapter: UWV/wim-n5-phi4-mini-adapter (~2.29 GB)
- Requires base Phi-4-mini-instruct model
- Large adapter due to r=512
- Includes all training configurations
Model Context
Note: Despite the "n5" naming, this model is NOT part of the WIM (Wikipedia to Knowledge Graph) pipeline that includes N1-N4. This is a separate task focused on complaint categorization.
WIM Pipeline (Wikipedia to JSON-LD):
- N1: Entity Extraction from Wikipedia text
- N2: Schema.org Type Selection for entities
- N3: Transform to JSON-LD format
- N4: Validation of JSON-LD
This Model (N5 - Complaint Categorization):
- Task: Categorize citizen complaints into topic and experience labels
- Dataset: Signaalberichten (municipal complaints)
- Domain: Government services and citizen interactions
Performance Characteristics
- Sequence Length: Average 1,636 tokens (moderate length)
- Batch Processing: Can handle batch size 8 with 4K context
- Inference Speed: Fast label addition to existing JSON-LD
- Memory Usage: ~10GB VRAM with 4K context
- Domain: Specialized for Dutch government/municipal contexts
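To put the memory figure in context, the weights alone can be estimated from the parameter count and the data type. This is a back-of-the-envelope sketch that assumes bfloat16 weights (2 bytes per parameter), as in the loading code above, and decimal gigabytes. Attributing the remainder to the KV cache, activations, and runtime overhead is an interpretation, not a measurement.

```python
# Base model size reported in Model Details
num_parameters = 3.82e9

# Assumption: weights held in bfloat16, i.e. 2 bytes per parameter
bytes_per_parameter = 2

weights_gb = num_parameters * bytes_per_parameter / 1e9

# Reported total VRAM usage with 4K context
reported_total_gb = 10.0
remaining_gb = reported_total_gb - weights_gb

print(f"weights: ~{weights_gb:.2f} GB")
print(f"left for KV cache, activations, overhead: ~{remaining_gb:.2f} GB")
```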
Citation
If you use this model, please cite:
@misc{wim-n5-phi4-mini,
  author = {UWV InnovatieHub},
  title = {Phi-4-mini N5 Complaint Categorization Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/wim-n5-phi4-mini-merged}
}