Instructions to use microsoft/Phi-3.5-MoE-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Phi-3.5-MoE-instruct with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="microsoft/Phi-3.5-MoE-instruct", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-MoE-instruct", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use microsoft/Phi-3.5-MoE-instruct with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "microsoft/Phi-3.5-MoE-instruct" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Phi-3.5-MoE-instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/microsoft/Phi-3.5-MoE-instruct
- SGLang
How to use microsoft/Phi-3.5-MoE-instruct with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "microsoft/Phi-3.5-MoE-instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Phi-3.5-MoE-instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "microsoft/Phi-3.5-MoE-instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Phi-3.5-MoE-instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use microsoft/Phi-3.5-MoE-instruct with Docker Model Runner:
docker model run hf.co/microsoft/Phi-3.5-MoE-instruct
Update README.md
Browse files
README.md
CHANGED
|
@@ -247,6 +247,20 @@ Developers should apply responsible AI best practices, including mapping, measur
|
|
| 247 |
* Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
|
| 248 |
* Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
|
| 249 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 250 |
## Software
|
| 251 |
* [PyTorch](https://github.com/pytorch/pytorch)
|
| 252 |
* [Transformers](https://github.com/huggingface/transformers)
|
|
|
|
| 247 |
* Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
|
| 248 |
* Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
|
| 249 |
|
| 250 |
+
## Safety Evaluation and Red-Teaming
|
| 251 |
+
|
| 252 |
+
We leveraged various evaluation techniques including red teaming, adversarial conversation simulations, and multilingual safety evaluation benchmark datasets to
|
| 253 |
+
evaluate Phi-3.5 models' propensity to produce undesirable outputs across multiple languages and risk categories.
|
| 254 |
+
Several approaches were used to compensate for the limitations of one approach alone. Findings across the various evaluation methods indicate that safety
|
| 255 |
+
post-training that was done as detailed in the [Phi-3 Safety Post-Training paper](https://arxiv.org/pdf/2407.13833) had a positive impact across multiple languages and risk categories as observed by
|
| 256 |
+
refusal rates (refusal to output undesirable outputs) and robustness to jailbreak techniques. Note, however, while comprehensive red team evaluations were conducted
|
| 257 |
+
across all models in the prior release of Phi models, red teaming was largely focused on Phi-3.5 MOE across multiple languages and risk categories for this release as
|
| 258 |
+
it is the largest and more capable model of the three models. Details on prior red team evaluations across Phi models can be found in the [Phi-3 Safety Post-Training paper](https://arxiv.org/pdf/2407.13833).
|
| 259 |
+
For this release, insights from red teaming indicate that the models may refuse to generate undesirable outputs in English, even when the request for undesirable output
|
| 260 |
+
is in another language. Models may also be more susceptible to longer multi-turn jailbreak techniques across both English and non-English languages. These findings
|
| 261 |
+
highlight the need for industry-wide investment in the development of high-quality safety evaluation datasets across multiple languages, including low resource languages,
|
| 262 |
+
and risk areas that account for cultural nuances where those languages are spoken.
|
| 263 |
+
|
| 264 |
## Software
|
| 265 |
* [PyTorch](https://github.com/pytorch/pytorch)
|
| 266 |
* [Transformers](https://github.com/huggingface/transformers)
|