Instructions to use microsoft/phi-1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/phi-1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/phi-1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1")
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/phi-1 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/phi-1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/microsoft/phi-1
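The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is running on `localhost:8000` and that the response follows the OpenAI completions shape (`choices[0].text`). The helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration.

```python
import json
import urllib.request


def build_completion_request(prompt, model="microsoft/phi-1",
                             max_tokens=512, temperature=0.5):
    """Return the same JSON body that the curl example sends."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", timeout=60, **kwargs):
    """POST a completion request and return the first generated text.

    Assumes an OpenAI-compatible server is already listening at base_url.
    """
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: generated text sits in choices[i].text, as in OpenAI-style responses.
    return payload["choices"][0]["text"]


# Example call (requires the server to be running):
# print(complete("Once upon a time,", max_tokens=64))
```

Passing `base_url="http://localhost:30000"` should target the SGLang server instead, since both expose the same `/v1/completions` route in the commands shown here.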
- SGLang
How to use microsoft/phi-1 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "microsoft/phi-1" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "microsoft/phi-1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use microsoft/phi-1 with Docker Model Runner:
docker model run hf.co/microsoft/phi-1
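For the Transformers route shown earlier, loading the tokenizer and model is only the first half; a generation call might look like the sketch below. It assumes `transformers` and PyTorch are installed and that the checkpoint downloads successfully. `build_prompt` and `generate_code` are hypothetical helpers, and the signature-plus-docstring prompt style is an assumption about how a code-completion prompt could be phrased, not something this page specifies.

```python
def build_prompt(signature, docstring):
    """Format a function signature and docstring as a code-completion prompt."""
    return f'{signature}\n    """{docstring}"""\n'


def generate_code(prompt, model_id="microsoft/phi-1", max_new_tokens=128):
    """Load the model and return the prompt followed by its greedy completion."""
    # Imported lazily so build_prompt works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize to PyTorch tensors and decode greedily (no sampling).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(generate_code(build_prompt("def print_prime(n):",
#                                  "Print all primes between 1 and n")))
```

Greedy decoding is used here only to keep the output deterministic; sampling parameters such as the `temperature` value from the server examples could be passed to `generate` instead.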
Commit History
Update README.md 530294c
Upload 4 files b3ebf08
Update README.md 304b058
Update README.md 654b690
chore(readme): Updates with clear information. eac5218
Disables inference API to prevent mismatch with HF implementation. e8a38cd
fix(modeling_phi): Fixes initial generation with length larger than context length. f4e55a8
fix(modeling_phi): Fixes cached generation when above maximum context length. ecfe56e
Fixes exceeding maximum sequence length when using generate(). 759d148
Uses native torch decorator for disabling autocast. 5819d04
Adds disable_autocast support for different device types. 67ecc75
Fixes any potential overflow when calculating attention weights. b5c5161
Delete modeling_mixformer_sequential.py 470e18a
Delete configuration_mixformer_sequential.py bd98e4e
Upload pytorch_model.bin 34b22f4
Update to new model interface. bbace88
Improves type hinting on configuration arguments. 8d2c4ce
Fixes flash-attn import with a try/except statement 9ed5987
Adds support for flash-attn rotary embedding and fused dense layers. 90c38d9
Adds support for MQA/GQA and attention mask during training / fine-tuning. 371fd51
Upload modeling_mixformer_sequential.py 633bca1
Upload README.md 769684a
fix(phi-1): Checks length of `attention_mask` if it is passed as direct tensor. 1f890f7
Support for `attention_mask` in forward pass. d22f35e
Update README.md 621f844
Upload tokenizer 7a24267
Upload MixFormerSequentialForCausalLM 44cca9f
Update README.md 3034d33
Update README.md 1121e12
Update README.md 3e86fe1
Update generation_config.json a85c61b
Update generation_config.json cb13e96
Update generation_config.json a15ded7
Update README.md ba068cc
Update README.md 4cb33c4
Update README.md 3cf35a2
Update README.md 9e27d7d
Upload Research License.docx 046a667
Update README.md 3670ef4
Update README.md 9c0466b
Update README.md 9691b01
Upload tokenizer ebfa940
Upload MixFormerSequentialForCausalLM e96b200
Upload tokenizer 91817f9
Upload MixFormerSequentialForCausalLM 0f4ae0e
initial commit 47c069f
Gunasekar committed on