|
|
--- |
|
|
license: mit |
|
|
--- |
|
|
Model files for prefill benchmarks on Apple Neural Engine: |
|
|
|
|
|
Benchmark results spreadsheet: https://docs.google.com/spreadsheets/d/1OCxn730D5h8rvS2IHsSi0UBYbsP_lV-W-0uVdVDCvIk
|
|
|
|
|
Converted with ANEMLL 0.3.0-Alpha: https://github.com/Anemll/Anemll/releases
|
|
To speed up conversion, change `mode="kmeans"` to `mode="uniform"` in `llama_converter.py` (line 402).
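A minimal sketch of that edit. The surrounding line is illustrative, not copied from the repo; the assumption is that the converter forwards this `mode` value to Core ML's LUT/palettization config:

```diff
-    config = OpPalettizerConfig(mode="kmeans", nbits=nbits)
+    config = OpPalettizerConfig(mode="uniform", nbits=nbits)
```

`uniform` uses linear (evenly spaced) lookup-table levels instead of iteratively clustering the weights, so conversion runs faster at some cost in quantization accuracy.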
|
|
Example export: |
|
|
```bash
./anemll/utils/convert_model.sh \
  --model ~/Models/HF/Llama-3.1-Nemotron-Nano-8B-v1 \
  --output ~/Models/ANE/anemll-Nemotron-8B-ch4-b512-w512 \
  --context 512 \
  --batch 512 \
  --lut1 "" \
  --lut2 4 \
  --lut3 "" \
  --chunk 4 --restart 4
```
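As a rough sanity check of the parameters above (assuming the Nemotron-Nano-8B base has 32 transformer layers, like Llama-3.1-8B, and that `--chunk` splits the layer stack evenly), the arithmetic works out to:

```python
# Hypothetical sanity check of the convert_model.sh parameters above.
num_layers = 32       # assumption: Llama-3.1-8B-class model, 32 transformer layers
chunks = 4            # --chunk 4: layer stack split across 4 Core ML model files
layers_per_chunk = num_layers // chunks

context = 512         # --context 512: maximum sequence / KV-cache length
batch = 512           # --batch 512: tokens consumed per prefill step
prefill_passes = -(-context // batch)  # ceiling division

print(f"{layers_per_chunk} layers per chunk, "
      f"{prefill_passes} prefill pass(es) to fill the context")
```

With batch equal to context, the whole prompt window is prefilled in a single pass, which is the case these benchmark files target.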
|
|
|
|
|
Source model: https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1 |
|
|
Chunks: https://huggingface.co/anemll/ANEMLL-Prefill-bench
|
|
|
|
|
|
|
|
|