run phi3-mini on AMD NPU
1. Use AWQ quantization on the original model (this produces `phi3_mini_awq_4bit_no_flash_attention.pt`).
2. Run `python run_awq.py --task decode --target aie --w_bit 4`
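To illustrate what the `--w_bit 4` setting in step 2 means, here is a minimal sketch of plain round-to-nearest 4-bit grouped weight quantization in NumPy. This is illustrative only: real AWQ additionally rescales salient weight channels using activation statistics before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Asymmetric round-to-nearest 4-bit quantization per group.
    Illustrative sketch only -- AWQ also applies activation-aware
    per-channel scaling before this step."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels (0..15)
    scale = np.maximum(scale, 1e-8)           # guard against constant groups
    zero = np.round(-w_min / scale)           # integer zero-point
    q = np.clip(np.round(w / scale) + zero, 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize_4bit(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

w = np.random.randn(4, 128).astype(np.float32)
q, scale, zero = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, zero).reshape(w.shape)
print(np.abs(w - w_hat).max())  # max abs error, bounded by half the largest group scale
```

The group-wise min/max keeps the quantization step small within each group, which is why the reconstruction error stays within half a quantization step; AWQ improves on this by protecting the channels that matter most to activations.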
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers
For the quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183