
Phi-4-mini N2 Schema.org Retrieval Fine-tune

This model is a fine-tuned version of microsoft/Phi-4-mini-instruct optimized for Schema.org type selection from entity descriptions, trained as part of the WIM (Wikipedia to Knowledge Graph) pipeline.

Model Details

Model Description

Developed by: UWV InnovatieHub
Model type: Causal Language Model with LoRA fine-tuning
Language(s): Dutch (nl)
License: MIT
Finetuned from: microsoft/Phi-4-mini-instruct (3.82B parameters)
Training Framework: Unsloth (optimized training for efficient processing)

Training Details

Dataset: UWV/wim-instruct-wiki-to-jsonld-agent-steps
Dataset Size: 104,684 N2-specific examples (schema retrieval tasks)
Training Duration: 16 hours 33 minutes
Hardware: NVIDIA A100 80GB
Epochs: 1.56
Steps: 5,000
Training Metrics:
- Final Training Loss: 0.9303
- Final Eval Loss: 0.7903
- Training samples/second: 2.684
- Gradient norm (final): ~0.57

LoRA Configuration

{
    "r": 512,                    # Rank (same as N1 for consistency)
    "lora_alpha": 1024,         # Alpha (2:1 ratio)
    "lora_dropout": 0.05,       # Dropout for regularization
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj"  # Attention layers only
    ]
}
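
The "2:1 ratio" comment refers to the usual LoRA scaling factor, `lora_alpha / r`, which multiplies the low-rank update before it is added to the frozen weight. The following is a minimal pure-Python sketch of that arithmetic, assuming the standard (non rank-stabilized) scaling; the tiny matrices and the `lora_forward` helper are hypothetical and only illustrate the formula, not the actual attention weights.

```python
# Values taken from the LoRA configuration above.
r = 512
lora_alpha = 1024

# Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r.
scaling = lora_alpha / r
print(scaling)


def lora_forward(x, W, A, B, scale):
    """Compute y = W x + scale * B (A x) for small list-based matrices."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                # frozen base projection
    update = matvec(B, matvec(A, x))   # low-rank update
    return [b + scale * u for b, u in zip(base, update)]


# Toy example with hypothetical numbers (rank 1, 2-dimensional input).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]      # shape (rank, in_features)
B = [[1.0], [0.0]]    # shape (out_features, rank)
y = lora_forward([2.0, 4.0], W, A, B, scaling)
print(y)
```

With a scaling of 2, the adapter's contribution is weighted twice as strongly as the raw product of the two low-rank matrices.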

Training Configuration

{
    "model": "phi4-mini",
    "max_seq_length": 8192,
    "batch_size": 32,
    "gradient_accumulation_steps": 1,
    "effective_batch_size": 32,
    "learning_rate": 2e-5,
    "warmup_steps": 100,
    "max_grad_norm": 1.0,
    "lr_scheduler": "cosine",
    "optimizer": "paged_adamw_8bit",
    "bf16": True,
    "seed": 42
}
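
The reported throughput can be cross-checked from the step count, batch size, and training duration given above. The sketch below is plain arithmetic on the card's own figures. The training-split size it derives is an assumption (it presumes epochs = samples seen / split size) and is not stated in the card.

```python
# Figures reported in this model card.
steps = 5_000
batch_size = 32
grad_accum = 1
duration_s = 16 * 3600 + 33 * 60   # 16 h 33 min
epochs = 1.56

effective_batch = batch_size * grad_accum
samples_seen = steps * effective_batch
throughput = samples_seen / duration_s

# Assumption: epochs = samples_seen / training-split size, so the split size
# can be estimated from the reported epoch count.
implied_train_size = samples_seen / epochs

print(effective_batch)            # effective batch size
print(samples_seen)               # total training samples processed
print(round(throughput, 3))       # samples per second
print(round(implied_train_size))  # estimated training-split size
```

The result is roughly 2.69 samples per second, in line with the reported 2.684. The implied split size is slightly below the 104,684 N2 examples, which would be consistent with part of the data being held out for evaluation.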

Intended Uses & Limitations

Intended Uses

Schema.org Type Selection: Select appropriate Schema.org types for entities
Knowledge Graph Construction: Second step (N2) in the WIM pipeline
Entity Classification: Map entity descriptions to standardized Schema.org vocabulary
High-throughput Processing: Optimized for batch processing with short sequences

Limitations

Optimized for Schema.org vocabulary only
Best performance on entity descriptions from encyclopedic content
Requires entity descriptions from N1 output
Limited to 8K token context (sufficient for all N2 examples)

How to Use

Option 1: Using the Merged Model (Recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json

# Load the merged model (ready to use)
model = AutoModelForCausalLM.from_pretrained(
    "UWV/wim-n2-phi4-mini-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n2-phi4-mini-merged")

# Prepare input (example from Dutch Wikipedia)
entities = [
    {
        "name": "Pedro Nunesplein",
        "description": "Een plein in Amsterdam genoemd naar Pedro Nunes"
    },
    {
        "name": "Amsterdam",
        "description": "Hoofdstad van Nederland"
    }
]

messages = [
    {
        "role": "system", 
        "content": "Je bent een expert in schema.org vocabulaire en semantische mapping."
    },
    {
        "role": "user", 
        "content": f"""Selecteer voor elke entiteit het meest passende Schema.org type:

{json.dumps(entities, ensure_ascii=False, indent=2)}

Geef een JSON array met elke entiteit en het Schema.org type."""
    }
]

# Apply chat template and generate
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.1,  # Low temperature for consistent classification
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(response)

Option 2: Using the LoRA Adapter

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load adapter
model = PeftModel.from_pretrained(
    base_model,
    "UWV/wim-n2-phi4-mini-adapter"
)
tokenizer = AutoTokenizer.from_pretrained("UWV/wim-n2-phi4-mini-adapter")

# Use same inference code as above...

Expected Output Format

The model outputs JSON with Schema.org type selections:

[
  {
    "name": "Pedro Nunesplein",
    "schema_type": "Place",
    "schema_url": "https://schema.org/Place"
  },
  {
    "name": "Amsterdam", 
    "schema_type": "City",
    "schema_url": "https://schema.org/City"
  }
]
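
Downstream steps will usually want this output as Python objects rather than raw text. The following is a minimal sketch of parsing and validating a response, assuming the three keys shown above are always present. The `parse_schema_selection` helper and the URL consistency check are illustrative assumptions, not part of the WIM pipeline.

```python
import json

# Keys taken from the example output above.
REQUIRED_KEYS = {"name", "schema_type", "schema_url"}


def parse_schema_selection(response):
    """Extract the JSON array from a model response and validate each item."""
    # Simple heuristic: take the text between the first "[" and the last "]".
    start = response.find("[")
    end = response.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in response")

    items = json.loads(response[start:end + 1])
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got: {item!r}")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"missing keys {sorted(missing)} in {item!r}")
        # Assumption: schema_url is always https://schema.org/<schema_type>.
        if item["schema_url"] != "https://schema.org/" + item["schema_type"]:
            raise ValueError(f"schema_url does not match schema_type in {item!r}")
    return items


example = """[
  {"name": "Pedro Nunesplein", "schema_type": "Place", "schema_url": "https://schema.org/Place"},
  {"name": "Amsterdam", "schema_type": "City", "schema_url": "https://schema.org/City"}
]"""
selections = parse_schema_selection(example)
types_by_name = {s["name"]: s["schema_type"] for s in selections}
print(types_by_name)
```

Raising on malformed output, rather than silently skipping items, makes it easier to spot responses that need to be regenerated.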

Dataset Information

The model was trained on the UWV/wim-instruct-wiki-to-jsonld-agent-steps dataset, which contains:

Source: Entity descriptions from N1 processing of Dutch Wikipedia
Processing: Multi-agent pipeline converting text to JSON-LD
N2 Examples: 104,684 schema selection tasks (largest subset)
Average Token Length: 663 tokens (very short sequences)
Max Token Length: 7,488 tokens
Format: ChatML-formatted instruction-following examples
Task: Select appropriate Schema.org types for entities

Training Results

Training ran for 1.56 epochs (5,000 steps) over the N2 subset:

Final Training Loss: 0.9303
Training Efficiency: 2.684 samples/second

Loss Progression

Started at ~0.77 loss
Stable training with gradual improvement
Learning rate: Cosine decay from 2e-5 to ~2e-12 at the final step
Gradient norms: Stable around 0.5-0.7
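
The very small final learning rate follows from the schedule shape. The sketch below assumes the common linear-warmup plus half-cosine decay (the shape used by the cosine scheduler in Transformers) with the card's values: peak 2e-5, 100 warmup steps, 5,000 total steps. The card does not state the exact scheduler implementation, so treat `cosine_lr` as an illustration.

```python
import math


def cosine_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=5_000):
    """Linear warmup followed by half-cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Start of warmup, end of warmup, midpoint of decay, last step before the end.
for step in (0, 100, 2_550, 4_999):
    print(step, cosine_lr(step))
```

Under these assumptions the learning rate at step 4,999 is on the order of 2e-12, which matches the reported final value.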

Model Versions

Merged Model: UWV/wim-n2-phi4-mini-merged (7.17 GB)
- Ready to use without adapter loading
- Recommended for production inference
- Successfully merged (no Phi-4 issues)
LoRA Adapter: UWV/wim-n2-phi4-mini-adapter (~1.14 GB)
- Requires base Phi-4-mini-instruct model
- Useful for further fine-tuning or experiments
- Large adapter due to r=512 (same as N1)

Pipeline Context

This model is part of the WIM (Wikipedia to Knowledge Graph) pipeline:

N1: Entity Extraction
N2 (This Model): Schema.org Type Selection
N3: Transform to JSON-LD
N4: Validation
N5: Add Human-Readable Labels

N2 processes the largest number of examples (104K) but with the shortest sequences, making it highly efficient for batch processing. Despite using a larger LoRA configuration (r=512) than typically needed for this simpler task, the model trained efficiently and merged successfully.

Performance Characteristics

Sequence Length: Average 663 tokens (10x shorter than N1, 60x shorter than N3)
Batch Processing: Can handle batch size 32+ due to short sequences
Inference Speed: Very fast due to short context requirements
Memory Usage: ~11GB VRAM with 8K context

Citation

If you use this model, please cite:

@misc{wim-n2-phi4-mini,
  author = {UWV InnovatieHub},
  title = {Phi-4-mini N2 Schema.org Retrieval Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/wim-n2-phi4-mini-merged}
}