Instructions to use microsoft/Phi-3-mini-4k-instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use microsoft/Phi-3-mini-4k-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use microsoft/Phi-3-mini-4k-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/Phi-3-mini-4k-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-3-mini-4k-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/microsoft/Phi-3-mini-4k-instruct

SGLang

How to use microsoft/Phi-3-mini-4k-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/Phi-3-mini-4k-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-3-mini-4k-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/Phi-3-mini-4k-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-3-mini-4k-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use microsoft/Phi-3-mini-4k-instruct with Docker Model Runner:
```
docker model run hf.co/microsoft/Phi-3-mini-4k-instruct
```

Fine-tuning is not improving the domain knowledge? it is very complicated, could you help?

#50

by aaditya - opened May 1, 2024

Discussion

aaditya

May 1, 2024

Hi, thank you for the awesome model; I really like its output. I am trying to fine-tune the model for a domain-specific use case, using this QLoRA configuration:

sequence_len: 4000
sample_packing: true
pad_to_sequence_len: true
trust_remote_code: True
adapter: qlora
lora_r: 256
lora_alpha: 512
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00002
warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0

The loss is going down, and the plot looks like this:

But at evaluation, the performance is worse than the original model:

Phi-3-mini-4k-instruct - Average domain accuracy : 40%
Qlora - Phi-3-mini-4k-instruct(with above config) : 35%

Are there any issues with the hyperparameters (e.g., the learning rate)? Do you have any recommendations on how we can fine-tune this model?
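For context on the hyperparameter question, a few quantities implied by the posted config can be worked out by hand. The sketch below is only arithmetic over the posted values. It assumes a single GPU (the post does not say how many were used) and the standard LoRA convention that the adapter update is scaled by `lora_alpha / lora_r`; the function and variable names are illustrative.

```python
# Values copied from the config in the post above.
config = {
    "sequence_len": 4000,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "lora_r": 256,
    "lora_alpha": 512,
}

NUM_GPUS = 1  # assumption: the number of GPUs is not stated in the post


def derived_quantities(cfg, num_gpus=NUM_GPUS):
    """Return quantities implied by the config (plain arithmetic)."""
    sequences_per_step = (
        cfg["micro_batch_size"] * cfg["gradient_accumulation_steps"] * num_gpus
    )
    return {
        # Standard LoRA scales the low-rank update by alpha / r.
        "lora_scaling": cfg["lora_alpha"] / cfg["lora_r"],
        # Sequences consumed per optimizer step.
        "sequences_per_step": sequences_per_step,
        # Upper bound on tokens per optimizer step, reached when every
        # packed sequence is filled to sequence_len.
        "max_tokens_per_step": sequences_per_step * cfg["sequence_len"],
    }


print(derived_quantities(config))
```

Under these assumptions the adapter scaling is 2.0 and each optimizer step sees two packed sequences, which may be worth keeping in mind when judging the learning rate and warmup length.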

gugarosa

Microsoft org May 1, 2024

Try lowering the sequence length you are using to tune the model, something like 2k.

We have seen several reports of the model going off the rails with extremely long prompts.

A combination of an “off the rails” instruct model + additional long-sequence fine-tuning could be diminishing the performance.
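One way to act on this suggestion is to check how many training examples exceed the proposed 2k budget before lowering `sequence_len`. The sketch below is a minimal illustration: `count_tokens` is a hypothetical stand-in (whitespace splitting) for a real tokenizer, and 2048 is just one reading of "2k".

```python
def count_tokens(text):
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_by_length(texts, max_len=2048, counter=count_tokens):
    """Partition texts into those within the token budget and those over it."""
    kept, too_long = [], []
    for text in texts:
        if counter(text) <= max_len:
            kept.append(text)
        else:
            too_long.append(text)
    return kept, too_long


# Toy data, for illustration only.
examples = ["short example", "word " * 3000]
kept, too_long = split_by_length(examples)
print(f"{len(kept)} within budget, {len(too_long)} over budget")
```

With a Hugging Face tokenizer loaded as in the Transformers snippet above, the counter could be swapped for something like `lambda t: len(tokenizer(t)["input_ids"])`, so the counts reflect the model's actual tokenization.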

aaditya

May 2, 2024

@gugarosa Update: I tried two epochs with a 2k sequence length and the same config as above. As before, the loss went down, but during evaluation the accuracy is worse than the base model.

gugarosa

Microsoft org May 2, 2024

If it's possible, maybe try just a couple of steps with and without LoRA and compare the performance? Or even try disabling the dropout?
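The comparison suggested here could be organized as a small grid of short runs. The sketch below only builds the config variants as Python dicts and does not launch any training. The keys `adapter` and `lora_dropout` come from the config posted earlier in the thread; `max_steps`, the value 20, and treating an empty `adapter` as "no LoRA" are assumptions to check against the training tool in use.

```python
# Subset of the posted config that matters for this comparison.
base_config = {
    "adapter": "qlora",
    "lora_r": 256,
    "lora_alpha": 512,
    "lora_dropout": 0.05,
    "learning_rate": 0.00002,
}


def build_ablation_grid(base, dropouts=(0.05, 0.0), max_steps=20):
    """Return named config variants: adapter runs per dropout value, plus one run without an adapter."""
    variants = {}
    for dropout in dropouts:
        cfg = dict(base)
        cfg["lora_dropout"] = dropout
        cfg["max_steps"] = max_steps  # assumed option name for a short run
        variants[f"{base['adapter']}_dropout_{dropout}"] = cfg

    # Variant without an adapter: drop all lora_* keys.
    no_adapter = {k: v for k, v in base.items() if not k.startswith("lora_")}
    no_adapter["adapter"] = None  # assumption: empty adapter means no LoRA
    no_adapter["max_steps"] = max_steps
    variants["no_adapter"] = no_adapter
    return variants


for name, cfg in build_ablation_grid(base_config).items():
    print(name, cfg)
```

Evaluating each variant on the same domain benchmark after the same number of steps would show whether the drop comes from the adapter, the dropout, or the training data itself.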

aaditya

May 3, 2024

@gugarosa I tried all three (FFT, QLoRA, and LoRA), but the issue is the same: the performance goes down while the loss is decreasing well.

gugarosa

Microsoft org May 3, 2024

Are you using a validation set during the training? Maybe it's something that we can track the performance on.

Since the loss is going down and the final performance is going down as well, there might be some inflection point where the model starts to overfit.
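Tracking a validation metric to locate such an inflection point can be sketched as follows. The numbers in `history` are made up for illustration and are not from this thread. The function reports the evaluation step with the best validation accuracy and whether any later evaluation got worse while the training loss kept falling.

```python
def find_inflection(history):
    """Return (best_step, overfitting_after_best).

    history: list of dicts with keys 'step', 'train_loss', 'val_accuracy',
    one entry per evaluation.
    """
    # Evaluation with the highest validation accuracy (earliest one on ties).
    best = max(history, key=lambda r: r["val_accuracy"])
    later = [r for r in history if r["step"] > best["step"]]
    # Overfitting signal: training loss still improves, validation accuracy drops.
    overfitting = any(
        r["train_loss"] < best["train_loss"]
        and r["val_accuracy"] < best["val_accuracy"]
        for r in later
    )
    return best["step"], overfitting


# Hypothetical evaluation log, for illustration only.
history = [
    {"step": 100, "train_loss": 1.20, "val_accuracy": 0.40},
    {"step": 200, "train_loss": 0.95, "val_accuracy": 0.42},
    {"step": 300, "train_loss": 0.80, "val_accuracy": 0.39},
    {"step": 400, "train_loss": 0.70, "val_accuracy": 0.35},
]

best_step, overfitting = find_inflection(history)
print(f"best step: {best_step}, overfitting afterwards: {overfitting}")
```

If the best step comes early, keeping the checkpoint saved nearest to it (or stopping training there) would be the natural next experiment.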

nguyenbh changed discussion status to closed Jul 1, 2024

eshanc

Aug 19, 2025

Hey @aaditya, you might find Impulse AI (https://www.impulselabs.ai/) useful. We make it super easy to fine-tune and deploy open-source models. I know this is not relevant to your problem above, but it might be easier to use us to fine-tune and deploy. Hopefully you find it helpful!

docs: https://docs.impulselabs.ai/introduction
python sdk: https://pypi.org/project/impulse-api-sdk-python/
