---
license: mit
pipeline_tag: text-generation
tags:
- ONNX
- DML
- ONNXRuntime
- phi3
- nlp
- conversational
- custom_code
- DirectML
inference: false
language:
- en
---

# Phi-3-small-128k-instruct ONNX

This repository hosts the optimized versions of [microsoft/Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct) to accelerate inference with DirectML and ONNX Runtime.
Phi-3-Small-128K-Instruct is a lightweight, state-of-the-art open model developed by Microsoft, with 7B parameters.

Key Features:
- Parameter Count: 7B
- Tokenizer: Uses the tiktoken tokenizer for improved multilingual tokenization, with a vocabulary size of 100,352 tokens.
- Context Length: Default context length of 128k tokens.
- Multilingual Capability: Includes an additional 10% of multilingual training data to improve performance across languages.

Attention Mechanism:
- Implements grouped-query attention to minimize the KV cache footprint, with 4 query heads sharing 1 key/value head.
- Alternates layers of dense attention with a novel blocksparse attention to further reduce the KV cache while maintaining long-context retrieval performance.
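Grouped-query attention shrinks the KV cache because only the key/value heads are cached, and each of them serves several query heads. The arithmetic below is a minimal sketch of that saving at the full 128k context. The layer count, query head count, head dimension, and fp16 cache (`bytes_per_elem=2`) are assumed values chosen for illustration, not read from this model's configuration; check the upstream model's configuration for the real numbers. The sketch also ignores the additional savings from the blocksparse layers.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer.

    The leading factor of 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Assumed, illustrative configuration (not taken from this model's files).
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
HEAD_DIM = 128
QUERIES_PER_KV_HEAD = 4  # "4 query heads sharing 1 key/value head"
NUM_KV_HEADS = NUM_QUERY_HEADS // QUERIES_PER_KV_HEAD
SEQ_LEN = 128 * 1024  # full 128k context

# Multi-head attention caches one K/V head per query head; GQA caches far fewer.
mha = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

if __name__ == "__main__":
    print(f"MHA KV cache: {mha / 2**30:.1f} GiB")
    print(f"GQA KV cache: {gqa / 2**30:.1f} GiB")
    print(f"Reduction: {mha // gqa}x")
```

Under these assumptions the cache drops from 64 GiB to 16 GiB. The 4x ratio follows directly from 4 query heads sharing each key/value head, whatever the other dimensions are.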

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile devices, quantized to int4 using RTN. Two versions are provided to balance latency against accuracy: Acc=1 targets improved accuracy, while Acc=4 targets improved performance. For mobile devices, we recommend the Acc=4 model.
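RTN (round-to-nearest) quantization maps each small block of weights to 4-bit integers with its own scale and zero point, without any calibration data. The sketch below is a generic, pure-Python illustration of block-wise asymmetric int4 RTN; the block size of 32 and the asymmetric scheme are assumptions for illustration, not a description of the exact converter used for these models. The Acc=1/Acc=4 choice is separate from this step: as we understand it, it corresponds to the `accuracy_level` attribute of ONNX Runtime's `MatMulNBits` operator, which sets the minimum precision of the activations (1 = fp32, 4 = int8) when multiplying by the int4 weights.

```python
def rtn_quantize_block(block, bits=4):
    """Round-to-nearest quantization of one block to unsigned `bits`-bit integers."""
    qmax = (1 << bits) - 1  # 15 for int4
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Round each weight to the nearest step, shift by the zero point, and clamp.
    q = [min(qmax, max(0, round(w / scale) + zero_point)) for w in block]
    return q, scale, zero_point


def rtn_dequantize_block(q, scale, zero_point):
    """Map the integer codes back to approximate float weights."""
    return [(x - zero_point) * scale for x in q]


def rtn_quantize(weights, block_size=32, bits=4):
    """Split a flat weight list into blocks and quantize each block independently."""
    return [
        rtn_quantize_block(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


if __name__ == "__main__":
    weights = [0.1 * i - 1.3 for i in range(64)]
    for q, scale, zero_point in rtn_quantize(weights):
        print(q[:4], round(scale, 4), zero_point)
```

Because every block has its own scale, an outlier only degrades the precision of the 32 weights around it rather than the whole tensor.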

## Usage

### Installation and Setup

To use the Phi-3-small-128k-instruct ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
```sh
conda create -n onnx python=3.10
conda activate onnx
```

2. **Install Git LFS:**
```sh
winget install -e --id GitHub.GitLFS
```

3. **Install Hugging Face CLI:**
```sh
pip install huggingface-hub[cli]
```

4. **Download the model:**
```sh
huggingface-cli download EmbeddedLLM/Phi-3-small-128k-instruct-onnx --include="onnx/directml/*" --local-dir .\Phi-3-small-128k-instruct
```

5. **Install necessary Python packages:**
```sh
pip install numpy==1.26.4
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
```

6. **Install Visual Studio 2015 runtime:**
```sh
conda install conda-forge::vs2015_runtime
```

7. **Download the example script:**
```powershell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
```

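8. **Run the example script:** pass `-m` the folder that contains `genai_config.json`. Because the download keeps the repository's `onnx/directml` layout, this is a subfolder of `.\Phi-3-small-128k-instruct`.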
```sh
python phi3-qa.py -m .\Phi-3-small-128k-instruct
```
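ONNX Runtime GenAI loads a model from a folder containing `genai_config.json`, and `huggingface-cli download` preserves the repository layout, so that file sits a few directories below `.\Phi-3-small-128k-instruct`. The helper below is a hypothetical convenience sketch (`find_model_dirs` is not part of any library) that lists the candidate folders to pass to `-m`.

```python
import os
import sys


def find_model_dirs(root):
    """Return every directory under `root` that contains a genai_config.json file."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "genai_config.json" in filenames:
            matches.append(dirpath)
    return sorted(matches)


if __name__ == "__main__":
    # Default to the --local-dir used in the download step.
    root = sys.argv[1] if len(sys.argv) > 1 else "Phi-3-small-128k-instruct"
    for path in find_model_dirs(root):
        print(path)
```

For example, save it as `find_model_dirs.py`, run `python find_model_dirs.py .\Phi-3-small-128k-instruct`, and pass one of the printed paths to `phi3-qa.py -m`.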

### Hardware Requirements

**Minimum Configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/NVIDIA/Intel) and a minimum of 10GB of combined RAM
- **CPU:** x86_64 / ARM64

**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **GPU:** NVIDIA RTX 4090 (DirectML)
- **CPU:** AMD Ryzen CPU
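As a rough check on the 10GB minimum, the size of the int4 weights alone can be estimated from the parameter count. This sketch assumes a nominal 7B parameters, 4 bits per weight, and one fp16 scale per block of 32 weights; it ignores zero points, any layers kept at higher precision, the KV cache, and runtime overhead, so real memory use will be higher.

```python
def int4_weight_bytes(num_params, block_size=32, scale_bytes=2, bits=4):
    """Estimate storage for block-quantized weights: packed codes plus per-block scales."""
    packed = num_params * bits // 8             # two 4-bit codes per byte
    num_blocks = -(-num_params // block_size)   # ceiling division
    scales = num_blocks * scale_bytes           # one fp16 scale per block (assumed)
    return packed + scales


if __name__ == "__main__":
    est = int4_weight_bytes(7_000_000_000)
    print(f"Estimated int4 weight size: {est / 1e9:.2f} GB")
```

Under these assumptions the weights take roughly 3.9 GB, which suggests the 10GB figure leaves headroom for the KV cache, activations, and the runtime itself.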

## Model Description

- **Developed by:** Microsoft
- **Model type:** ONNX
- **Language(s) (NLP):** English
- **Inference API languages:** Python, C, C++
- **License:** MIT
- **Model Description:** This is a conversion of the Phi-3 Small 128K Instruct model for ONNX Runtime inference.

## Additional Details
- [**Phi-3 Small, Medium, and Vision Blog**](https://aka.ms/phi3_ONNXBuild24)
- [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
- [**Phi-3 Model Card**](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)
- [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
- [**Phi-3 on Azure AI Studio**](https://aka.ms/phi3-azure-ai)
  
## License

The model is licensed under the [MIT license](https://huggingface.co/microsoft/Phi-3-small-128k-instruct/resolve/main/LICENSE).

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.