configuration_mixformer_sequential.py deleted
I was trying to use Open-Orca/oo-phi-1_5 and I got an error: Entry Not Found for url: https://huggingface.co/microsoft/phi-1_5/resolve/main/configuration_mixformer_sequential.py.
Looking at the file history, this file was deleted last week. I don't understand the Transformers wrappers very well, but it looks like an attempt to improve the wrapper ended up breaking it. I've also tried using microsoft/phi-1_5 directly, and during inference it gave a strange message:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
I was doing this on Colab, in case anyone wants to see the full notebook: https://colab.research.google.com/drive/1o_fKb-P_2u-QwggQzzQPNaOj6-PkjVTb#scrollTo=3SGgTfikxC-z
@gugarosa Would you mind giving some pointers as to what is wrong?
Hello @tantanchen !
Regarding the issue with the Open-Orca/oo-phi-1_5 model, this looks like a problem with the cache system. Could you please delete .cache and re-download that model? We updated our model interface; however, this only applies to microsoft/phi-1_5, i.e., other repositories should have their own model files.
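A minimal sketch of the "delete the cache and re-download" step, for anyone who prefers to remove only one model rather than the whole .cache folder. It assumes the default Hugging Face hub cache layout (a `models--<org>--<name>` folder under `~/.cache/huggingface/hub`, or under `HF_HUB_CACHE` / `HF_HOME` if those are set). The helper names `default_hub_cache`, `cached_repo_dir`, and `clear_cached_repo` are hypothetical and not part of any library.

```python
import os
import shutil
from pathlib import Path


def default_hub_cache() -> Path:
    """Return the hub cache directory, honoring HF_HUB_CACHE and HF_HOME if set."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME", os.path.join("~", ".cache", "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def cached_repo_dir(repo_id: str, cache_root: Path) -> Path:
    """Map 'org/name' to the assumed cache folder name 'models--org--name'."""
    return Path(cache_root) / ("models--" + repo_id.replace("/", "--"))


def clear_cached_repo(repo_id: str, cache_root=None) -> bool:
    """Delete the cached copy of one model repo.

    Returns True if a cached folder was found and removed, False otherwise.
    """
    root = Path(cache_root) if cache_root is not None else default_hub_cache()
    target = cached_repo_dir(repo_id, root)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Calling `clear_cached_repo("Open-Orca/oo-phi-1_5")` and then loading the model again should trigger a fresh download. Passing `force_download=True` to `from_pretrained` is another way to bypass a stale cached copy.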
Regarding the attention_mask warning, this is expected during generation when no attention_mask is passed. Since the tokenizer we used for this model does not have a pad_token_id, we have to mimic a special token and use it as the padding token when doing batched inference/generation. In this case, it mimics the eos_token_id.
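To make the padding behavior concrete, here is a rough sketch rather than the model's actual implementation. `left_pad_with_eos` is a hypothetical pure-Python helper showing what it means to reuse the EOS id (50256, as reported in the warning) as padding together with an attention mask. `generate_batch` is a hypothetical wrapper showing one way to pass the mask explicitly with Transformers; left padding is an assumption that is commonly used for decoder-only generation, and older Transformers versions may additionally need `trust_remote_code=True`.

```python
def left_pad_with_eos(sequences, eos_token_id=50256):
    """Pad token-id lists on the left with the EOS id and build the mask.

    Mask entries are 1 for real tokens and 0 for padding, so the model can
    ignore the padded positions even though they hold a valid token id.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([eos_token_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def generate_batch(prompts, model_id="microsoft/phi-1_5", max_new_tokens=32):
    """Batched generation that passes attention_mask and pad_token_id explicitly."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
    tokenizer.padding_side = "left"            # assumption: left-pad for generation
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the mask and pad token id passed explicitly, the warning quoted above should no longer be needed, since nothing has to be inferred at generation time.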
Hope this helps to clear some things up.
Best regards,
Gustavo.
hrrm that doesn't make much sense because I'm running this on Colab, and nothing is cached between sessions. But it looks like the problem is on the Open-Orca side. Thanks