How to enable streaming for the Phi-3 vision model?

#15

by bhimrazy - opened May 24, 2024

Discussion

bhimrazy

May 24, 2024

I have developed an interface to chat with this model and was exploring how to stream the output.
https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct

But I couldn't get it right.

sebbyjp

May 25, 2024

What have you tried?

dranger003

May 25, 2024

You can try this script: https://gist.github.com/dranger003/845739ac3a64f49d608e9bb39317dbf5

bhimrazy

May 27, 2024

Thanks @dranger003 for the script.

I used the existing TextIteratorStreamer and got it working.


# Streaming. `model`, `processor`, and `inputs` are assumed to be defined earlier (not shown here).
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

# Run the generation in a separate thread, so that we can fetch the generated text in a non-blocking way.
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=512, eos_token_id=processor.tokenizer.eos_token_id)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Print each chunk of text as soon as the streamer yields it.
for text in streamer:
    print(text, end="", flush=True)

# Wait for the generation thread to finish once the stream is exhausted.
thread.join()

@sebbyjp, I was getting errors due to a parameter misconfiguration. It works now.

sebbyjp

May 27, 2024

Awesome! Are you able to run batched inference with image inputs?

dranger003

May 27, 2024

@bhimrazy Thanks, I didn't know about TextIteratorStreamer!

bhimrazy

Jun 2, 2024

> Awesome! Are you able to run batched inference with image inputs?

Thank you for the feedback! I haven't had the chance to check out batched inference with image inputs yet, but I'll definitely look into it. I appreciate you bringing it to my attention.

By the way, I have a studio deployed that you can try out. Feel free to explore it here: Deploy and Chat with PHI 3 Vision 128K Instruct (https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct).

nguyenbh changed discussion status to closed Jun 17, 2024
