Instructions for using unsloth/Phi-4-mini-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use unsloth/Phi-4-mini-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="unsloth/Phi-4-mini-instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/Phi-4-mini-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("unsloth/Phi-4-mini-instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use unsloth/Phi-4-mini-instruct with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "unsloth/Phi-4-mini-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/Phi-4-mini-instruct",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/unsloth/Phi-4-mini-instruct
```
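The curl request above can also be issued from Python. Below is a minimal sketch using only the standard library; the URL assumes the default `vllm serve` port 8000 (the same payload also works against the SGLang server on port 30000). `SEND` is left off so the snippet can be read and run without a live server; flip it once the server is up.

```python
import json
import urllib.request

# Endpoint of the OpenAI-compatible server started with `vllm serve`
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "unsloth/Phi-4-mini-instruct",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}

SEND = False  # set True once the server is running
if SEND:
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```

The payload mirrors the curl example exactly; only the transport differs.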
- SGLang
How to use unsloth/Phi-4-mini-instruct with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "unsloth/Phi-4-mini-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/Phi-4-mini-instruct",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "unsloth/Phi-4-mini-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/Phi-4-mini-instruct",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
- Unsloth Studio
How to use unsloth/Phi-4-mini-instruct with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/Phi-4-mini-instruct to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/Phi-4-mini-instruct to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/Phi-4-mini-instruct to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/Phi-4-mini-instruct",
    max_seq_length=2048,
)
```
- Docker Model Runner
How to use unsloth/Phi-4-mini-instruct with Docker Model Runner:
```shell
docker model run hf.co/unsloth/Phi-4-mini-instruct
```
Phi-4 mini does not work inside of unsloth.
The Phi-4 mini release is very promising, but sadly it cannot load in the Unsloth framework: "RuntimeError: rope_scaling's short_factor field must have length 64, got 48".
Will unsloth possibly release a fixed version?
It seems the modeling_phi3.py file is not included.
Unfortunately it doesn't currently work in any framework: not in Unsloth, Ollama, llama.cpp, etc., because of the new architecture.
We'll update you all when it does.
The architecture isn't even particularly new; it's just that none of these frameworks respect the "partial_rotary_factor" config entry. (Only 3/4 of the embeddings are subject to RoPE, presumably to weight recent context more heavily than long context.) I took a crack at adding it to ExLlamaV2, and while quantization now appears to work, inference is wildly broken. I guess it'll take a while before we see this in a usable state, since all the upstream packages need to be updated to support it.
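To make the point above concrete: with a `partial_rotary_factor` of 0.75, RoPE rotates only the first three quarters of each attention head's dimensions, and the remaining quarter passes through unchanged. Below is a small NumPy sketch of that behavior; the function name and toy shapes are mine for illustration, not the model's actual implementation.

```python
import numpy as np

def apply_partial_rope(x, positions, partial_rotary_factor=0.75, base=10000.0):
    """Rotate only the first partial_rotary_factor * head_dim dimensions
    of x with RoPE; pass the remaining dimensions through unchanged."""
    head_dim = x.shape[-1]
    rotary_dim = int(head_dim * partial_rotary_factor)
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]

    # Standard RoPE on the rotary slice: pair up even/odd dims and rotate
    # each pair by a position- and frequency-dependent angle.
    inv_freq = 1.0 / base ** (np.arange(0, rotary_dim, 2) / rotary_dim)
    angles = np.outer(positions, inv_freq)        # (seq_len, rotary_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = np.empty_like(x_rot)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return np.concatenate([rotated, x_pass], axis=-1)

np.random.seed(0)
seq_len, head_dim = 4, 8   # toy sizes; the real head dims are larger
x = np.random.randn(seq_len, head_dim)
out = apply_partial_rope(x, np.arange(seq_len))

# With factor 0.75, the last quarter of each head dimension is untouched:
print(np.allclose(out[:, 6:], x[:, 6:]))  # True
```

A framework that ignores the config entry effectively assumes `rotary_dim == head_dim`, which is where length mismatches like the `short_factor` error can come from.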
Hello @shimmyshimmer
We already added 'partial_rotary_factor' support to the latest HF and vLLM before the release; the new model feature is already in HF (v4.49.0) and vLLM (v0.7.3).
Can you take a look at the PRs? They are relatively simple if the new config is utilized.
vLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947
Can you guys prepare a complete fine-tuning solution for general users? I have tried a lot of methods and nothing works.
RuntimeError: rope_scaling's short_factor field must have length 64, got 48
I was getting this issue today on vLLM 0.8.3 serving Phi-4.
Because `vllm --version` reported 0.8.3 (newer than 0.7.3, meaning my vLLM should be fine), I ran `pip install transformers==4.49.0` as ykin362 suggested (although it was presented there as a PR), and now it works fine.
I had the old transformers version 4.48.2, but that was not new enough; 4.49.0 (or higher) is required.
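Version strings compare field by field as numbers, not as text, which is why 4.48.2 falls short of the required 4.49.0 while vLLM 0.8.3 satisfies its separate 0.7.3 minimum. A small pure-Python sketch of that comparison (the helper name is mine):

```python
def parse(v: str) -> tuple:
    """Turn a dotted version string into a tuple of ints,
    so comparisons are numeric: (4, 48, 2) < (4, 49, 0)."""
    return tuple(int(part) for part in v.split("."))

# transformers needs >= 4.49.0 for partial_rotary_factor support:
print(parse("4.48.2") >= parse("4.49.0"))  # False: too old
print(parse("4.49.0") >= parse("4.49.0"))  # True

# vLLM has its own, separate minimum of 0.7.3:
print(parse("0.8.3") >= parse("0.7.3"))    # True
```

The vLLM and transformers version requirements are independent, so an up-to-date vLLM does not imply an up-to-date transformers.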
Coming back to this: we were able to fine-tune phi-4-mini by doing the following (full fine-tuning and LoRA both work).
After installing unsloth, if you are on a Google Colab notebook or any other Jupyter notebook, make this your second cell, right after your pip installs:

```python
import os

os.environ["TORCH_DYNAMO_DISABLE"] = "1"
os.environ["UNSLOTH_COMPILE_DISABLE"] = "1"
```

And to start fine-tuning:

```python
import torch
import torch.nn as nn

torch._dynamo.config.disable = True
trainer_stats = trainer.train()
```

These steps allowed us to fine-tune phi-4-mini models. Keep in mind it uses quite a bit of memory, but the loss is good.
Thanks so much for the input, I'm sure people will find this useful! If only there were some way to pin this so others could find it.