---
license: mit
---
Model files for prefill benchmarks on the Apple Neural Engine. Benchmark results:
https://docs.google.com/spreadsheets/d/1OCxn730D5h8rvS2IHsSi0UBYbsP_lV-W-0uVdVDCvIk

Converted with ANEMLL 0.3.0-Alpha: https://github.com/Anemll/Anemll/releases

Tip: change `mode="kmeans"` to `mode="uniform"` in `llama_converter.py` (line 402) for faster conversion.
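The two LUT modes trade conversion speed for quantization error: uniform spaces the lookup-table centroids evenly between the weight minimum and maximum, while k-means fits the centroids to the actual weight distribution. A minimal NumPy sketch of the idea (not the actual ANEMLL/coremltools implementation; function names are ours):

```python
import numpy as np

def uniform_lut(w, bits=4):
    """Uniform LUT: evenly spaced centroids between min and max (fast)."""
    centers = np.linspace(w.min(), w.max(), 2 ** bits)
    idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx

def kmeans_lut(w, bits=4, iters=10):
    """K-means LUT: centroids adapt to the weight distribution (slower, lower error)."""
    centers = np.linspace(w.min(), w.max(), 2 ** bits)  # same init as uniform
    for _ in range(iters):
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(len(centers)):
            if np.any(idx == k):           # keep centroid if its cluster is empty
                centers[k] = w[idx == k].mean()
    idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
    return centers, idx
```

Because k-means starts from the uniform grid and each Lloyd iteration can only reduce the mean-squared reconstruction error, it never does worse than uniform on the same weights, just at higher conversion cost.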
Example export:

```sh
./anemll/utils/convert_model.sh \
  --model ~/Models/HF/Llama-3.1-Nemotron-Nano-8B-v1 \
  --output ~/Models/ANE/anemll-Nemotron-8B-ch4-b512-w512 \
  --context 512 \
  --batch 512 \
  --lut1 "" \
  --lut2 4 \
  --lut3 "" \
  --chunk 4 --restart 4
```
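As a rough guide to the flags above: `--context` is the token window the model is compiled for, and `--batch` is how many prompt tokens each prefill forward pass processes, so prefill cost scales with the number of passes. A hypothetical helper illustrating that arithmetic (the function is ours, not part of ANEMLL):

```python
import math

def prefill_passes(prompt_len: int, batch: int = 512, context: int = 512) -> int:
    """Number of forward passes needed to prefill a prompt.

    Hypothetical illustration: batch/context mirror the --batch/--context
    flags of convert_model.sh above.
    """
    if prompt_len > context:
        raise ValueError("prompt exceeds the compiled context window")
    return math.ceil(prompt_len / batch)
```

With `--batch 512 --context 512` as in the export above, any prompt that fits the window is prefilled in a single pass.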
Source model: https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1
Chunks: https://huggingface.co/anemll/ANEMLL-Prefill-bench