run phi3-mini on AMD NPU
1. Use AWQ quantization on the original model (this produces `phi3_mini_awq_4bit_no_flash_attention.pt`).
2. Run `python run_awq.py --task decode --target aie --w_bit 4`
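To illustrate what the `--w_bit 4` setting in step 2 means, here is a minimal sketch of plain round-to-nearest 4-bit grouped weight quantization in NumPy. This is illustrative only: real AWQ additionally rescales salient weight channels using activation statistics before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Asymmetric round-to-nearest 4-bit quantization per group.
    Illustrative sketch only -- AWQ also applies activation-aware
    per-channel scaling before this step."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels (0..15)
    scale = np.maximum(scale, 1e-8)           # guard against constant groups
    zero = np.round(-w_min / scale)           # integer zero-point
    q = np.clip(np.round(w / scale) + zero, 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize_4bit(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

w = np.random.randn(4, 128).astype(np.float32)
q, scale, zero = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, zero).reshape(w.shape)
print(np.abs(w - w_hat).max())  # max abs error, bounded by half the largest group scale
```

The group-wise min/max keeps the quantization step small within each group, which is why the reconstruction error stays within half a quantization step; AWQ improves on this by protecting the channels that matter most to activations.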
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers
For the quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183