BoruiXu / phi3_mini_amd_NPU


	Run Phi-3-mini on an AMD NPU

	1. If you do not have ```phi3_mini_awq_4bit_no_flash_attention.pt```, use AWQ quantization to produce the quantized model.
	2. Copy modeling_phi3.py from this repo into the phi-3-mini folder.
	3. Modify the file path in run_awq.py.
	4. Run ```python run_awq.py --task decode --target aie --w_bit 4```
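
The steps above can be sketched as a small pre-flight helper. This is only an illustration, not part of the repo: the function names and all directory arguments are hypothetical, and the location of the checkpoint is an assumption, since the README does not say where run_awq.py expects it. Step 3 (editing the path in run_awq.py) remains a manual step.

```python
import shutil
from pathlib import Path
from typing import List

# File name taken from step 1 of the README.
CHECKPOINT_NAME = "phi3_mini_awq_4bit_no_flash_attention.pt"


def checkpoint_exists(checkpoint_dir: Path) -> bool:
    """Step 1: report whether the AWQ-quantized checkpoint is already present.

    `checkpoint_dir` is a hypothetical location; use whatever directory
    your copy of run_awq.py is configured to read from.
    """
    return (checkpoint_dir / CHECKPOINT_NAME).is_file()


def install_modeling_file(repo_dir: Path, model_dir: Path) -> Path:
    """Step 2: copy modeling_phi3.py from this repo into the phi-3-mini folder."""
    src = repo_dir / "modeling_phi3.py"
    dst = model_dir / "modeling_phi3.py"
    shutil.copy2(src, dst)
    return dst


def build_run_command(python_exe: str = "python") -> List[str]:
    """Step 4: the command from the README, as an argument list for subprocess."""
    return [python_exe, "run_awq.py",
            "--task", "decode", "--target", "aie", "--w_bit", "4"]
```

A possible use is to call `checkpoint_exists(...)` first, quantize with AWQ if it returns `False`, then call `install_modeling_file(...)` and pass the result of `build_run_command()` to `subprocess.run` from the directory that contains run_awq.py.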


	Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers

	For the quantization of Phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183

	PS: Performance on the NPU is similar to that on the CPU (7640HS).