Run phi3-mini on AMD NPU
1. If you do not already have ```phi3_mini_awq_4bit_no_flash_attention.pt```, use AWQ quantization to produce the quantized model.
2. Copy modeling_phi3.py from this repo into the phi-3-mini folder.
3. Modify the file paths in run_awq.py to match your setup.
4. Run ```python run_awq.py --task decode --target aie --w_bit 4```
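
The steps above can be sketched as a small pre-flight check before launching the run. This is only an illustrative helper, not part of this repo or of RyzenAI-SW. The function names are hypothetical, and it assumes the checkpoint, the phi-3-mini folder, and run_awq.py all sit in the current working directory, which may not match your layout.

```python
import subprocess
import sys
from pathlib import Path

# Assumed locations (hypothetical): adjust to wherever your files actually live.
CHECKPOINT = Path("phi3_mini_awq_4bit_no_flash_attention.pt")
MODEL_DIR = Path("phi-3-mini")
MODELING_FILE = "modeling_phi3.py"


def missing_prerequisites(checkpoint=CHECKPOINT, model_dir=MODEL_DIR,
                          modeling_file=MODELING_FILE):
    """Return a list of problems; an empty list means steps 1 and 2 look done."""
    problems = []
    if not Path(checkpoint).is_file():
        problems.append(f"step 1: quantized checkpoint not found: {checkpoint}")
    if not Path(model_dir).is_dir():
        problems.append(f"step 2: model folder not found: {model_dir}")
    elif not (Path(model_dir) / modeling_file).is_file():
        problems.append(f"step 2: {modeling_file} not found in {model_dir}")
    return problems


def build_command(python=sys.executable):
    """Build the step 4 command as an argument list for subprocess.run."""
    return [python, "run_awq.py",
            "--task", "decode", "--target", "aie", "--w_bit", "4"]


def main():
    """Report missing files, or launch run_awq.py when everything is in place."""
    problems = missing_prerequisites()
    if problems:
        for problem in problems:
            print("missing:", problem)
        return 1
    return subprocess.run(build_command()).returncode
```

Calling `main()` from the directory that contains run_awq.py either lists what is missing or starts the decode run. Step 3 (editing the paths inside run_awq.py) still has to be done by hand.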
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers
For the quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183
PS: The performance on the NPU is similar to that on the CPU (7640HS).