|
|
--- |
|
|
license: mit |
|
|
--- |
|
|
Model files for prefill benchmarks on Apple Neural Engine: |
|
|
|
|
|
Benchmark results spreadsheet: https://docs.google.com/spreadsheets/d/1OCxn730D5h8rvS2IHsSi0UBYbsP_lV-W-0uVdVDCvIk
|
|
|
|
|
Converted with ANEMLL 0.3.0-Alpha: https://github.com/Anemll/Anemll/releases
|
|
To speed up conversion, change `mode="kmeans"` to `mode="uniform"` in `llama_converter.py` (line 402).
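A minimal sketch of that edit. The surrounding line is illustrative, not copied from the repo; the assumption is that the converter forwards this `mode` value to Core ML's LUT/palettization config:

```diff
-    config = OpPalettizerConfig(mode="kmeans", nbits=nbits)
+    config = OpPalettizerConfig(mode="uniform", nbits=nbits)
```

`uniform` uses linear (evenly spaced) lookup-table levels instead of iteratively clustering the weights, so conversion runs faster at some cost in quantization accuracy.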
|
|
Example export: |
|
|
```bash
./anemll/utils/convert_model.sh \
  --model ~/Models/HF/Llama-3.1-Nemotron-Nano-8B-v1 \
  --output ~/Models/ANE/anemll-Nemotron-8B-ch4-b512-w512 \
  --context 512 \
  --batch 512 \
  --lut1 "" \
  --lut2 4 \
  --lut3 "" \
  --chunk 4 --restart 4
```
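As a rough sanity check of the parameters above (assuming the Nemotron-Nano-8B base has 32 transformer layers, like Llama-3.1-8B, and that `--chunk` splits the layer stack evenly), the arithmetic works out to:

```python
# Hypothetical sanity check of the convert_model.sh parameters above.
num_layers = 32       # assumption: Llama-3.1-8B-class model, 32 transformer layers
chunks = 4            # --chunk 4: layer stack split across 4 Core ML model files
layers_per_chunk = num_layers // chunks

context = 512         # --context 512: maximum sequence / KV-cache length
batch = 512           # --batch 512: tokens consumed per prefill step
prefill_passes = -(-context // batch)  # ceiling division

print(f"{layers_per_chunk} layers per chunk, "
      f"{prefill_passes} prefill pass(es) to fill the context")
```

With batch equal to context, the whole prompt window is prefilled in a single pass, which is the case these benchmark files target.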
|
|
|
|
|
Source model: https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1 |
|
|
Chunks: https://huggingface.co/anemll/ANEMLL-Prefill-bench
|
|
|
|
|
|
|
|
|