---
license: mit
pipeline_tag: text-generation
tags:
- ONNX
- DML
- ONNXRuntime
- phi3
- nlp
- conversational
- custom_code
inference: false
language:
- en
---

# EmbeddedLLM/Phi-3-mini-128k-instruct-062024 ONNX

## Model Summary

This model is an ONNX-optimized version of [microsoft/Phi-3-mini-128k-instruct (June 2024)](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML).
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.

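In ONNX Runtime, DirectML surfaces as the `DmlExecutionProvider` execution provider. A minimal sketch of a sensible fallback policy (the provider names are standard ONNX Runtime identifiers; with `onnxruntime-directml` installed, `onnxruntime.get_available_providers()` returns the list to pass in — the helper name here is ours):

```python
def pick_provider(available):
    """Prefer DirectML when present, otherwise fall back to CPU.

    `available` is a list of execution-provider names, e.g. the result of
    onnxruntime.get_available_providers() on a machine with the
    onnxruntime-directml package installed.
    """
    for provider in ("DmlExecutionProvider", "CPUExecutionProvider"):
        if provider in available:
            return provider
    raise RuntimeError("No supported execution provider found")

# Example: a machine with the DirectML build installed.
print(pick_provider(["DmlExecutionProvider", "CPUExecutionProvider"]))
```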
## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.

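To illustrate what int4 quantization means, here is a generic symmetric round-to-nearest sketch. This is *not* the actual AWQ algorithm, which additionally uses activation statistics to choose scales that protect salient weights; it only shows the 4-bit range and per-block scale that the quantized model stores:

```python
def quantize_int4(weights):
    """Symmetric int4 quantization of one block of weights.

    Maps floats to integers in [-8, 7] with a single per-block scale.
    Plain round-to-nearest illustration; AWQ picks scales from
    activation statistics before this step.
    """
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    return [v * scale for v in q]

block = [0.12, -0.54, 0.33, 0.91, -0.07, 0.48]
q, scale = quantize_int4(block)
restored = dequantize_int4(q, scale)
# Each restored value is within half a quantization step of the original.
```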
## Usage

### Installation and Setup

To use the EmbeddedLLM/Phi-3-mini-128k-instruct-062024 ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
   ```sh
   conda create -n onnx python=3.10
   conda activate onnx
   ```

2. **Install Git LFS:**
   ```sh
   winget install -e --id GitHub.GitLFS
   ```

3. **Install the Hugging Face CLI:**
   ```sh
   pip install "huggingface-hub[cli]"
   ```

4. **Download the model:**
   ```sh
   huggingface-cli download EmbeddedLLM/Phi-3-mini-128k-instruct-062024-onnx --include="onnx/directml/Phi-3-mini-128k-instruct-062024-int4/*" --local-dir .\Phi-3-mini-128k-instruct-062024-int4
   ```

5. **Install the necessary Python packages:**
   ```sh
   pip install numpy==1.26.4
   pip install onnxruntime-directml
   pip install --pre onnxruntime-genai-directml==0.3.0
   ```

6. **Install the Visual Studio 2015 runtime:**
   ```sh
   conda install conda-forge::vs2015_runtime
   ```

7. **Download the example script:**
   ```sh
   Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
   ```

8. **Run the example script:**
   ```sh
   python phi3-qa.py -m .\Phi-3-mini-128k-instruct-062024-int4
   ```

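Under the hood, `phi3-qa.py` wraps each question in Phi-3's chat template before tokenizing and generating. A sketch of that formatting step (the `<|user|>`, `<|end|>`, and `<|assistant|>` markers follow the published Phi-3 chat format; the helper name is ours):

```python
def build_phi3_prompt(user_input: str) -> str:
    """Wrap a single user turn in Phi-3's chat template.

    The <|user|>, <|end|>, and <|assistant|> markers are Phi-3 special
    tokens; the model continues generating from the trailing
    <|assistant|> tag.
    """
    return f"<|user|>\n{user_input}<|end|>\n<|assistant|>"

prompt = build_phi3_prompt("What is DirectML?")
```

The example script tokenizes a prompt like this with the `onnxruntime-genai` tokenizer and streams tokens from a generator until the end-of-sequence token is produced.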
### Hardware Requirements

**Minimum Configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/NVIDIA)
- **CPU:** x86_64 / ARM64

**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

## Model Description

- **Developed by:** Microsoft
- **Model type:** ONNX
- **Language(s) (NLP):** Python, C, C++
- **License:** MIT
- **Model Description:** This model is a conversion of Phi-3-mini-128k-instruct-062024 for ONNX Runtime inference, optimized for DirectML.