# Run phi3-mini on AMD NPU
- If `phi3_mini_awq_4bit_no_flash_attention.pt` is not present, use AWQ quantization to produce the quantized model (see the sketch after this list).
- Copy `modeling_phi3.py` from this repo into the phi-3-mini folder.
- Modify the quantized-checkpoint file path in `run_awq.py` to match your local file (a copy step is sketched after this list).
- Run the decode example:

```bash
python run_awq.py --task decode --target aie --w_bit 4
```
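A minimal sketch of the quantization step, based on the llm-awq CLI that the Phi-3 PR linked below targets. The model path, group size, and cache file names here are assumptions; adjust them to your setup and check the PR for Phi-3 specifics:

```bash
# Search for AWQ scales on the fp16 model (paths and group size are illustrative)
python -m awq.entry --model_path microsoft/Phi-3-mini-4k-instruct \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/phi3-mini-w4-g128.pt

# Apply the scales and dump the real 4-bit quantized weights
python -m awq.entry --model_path microsoft/Phi-3-mini-4k-instruct \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/phi3-mini-w4-g128.pt \
    --q_backend real --dump_quant quant_cache/phi3-mini-w4-g128-awq.pt
```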
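Then copy the dumped checkpoint to the name `run_awq.py` expects (the destination below is a guess; use whatever path you set in `run_awq.py`):

```bash
cp quant_cache/phi3-mini-w4-g128-awq.pt phi3_mini_awq_4bit_no_flash_attention.pt
```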
Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers
For the quantization of Phi-3, see https://github.com/mit-han-lab/llm-awq/pull/183.
PS: The observed performance is similar to that on the CPU (Ryzen 5 7640HS).