woodchen7 committed on
Commit
ffd12ec
·
verified ·
1 Parent(s): e77bdd0

Update README.md

Files changed (1)
  1. README.md +6 -13
README.md CHANGED
@@ -17,14 +17,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 </h3>
 
 <p align="center">
- 📣 <a href="https://huggingface.co/AngelSlim/HY-1.8B-2Bit-GGUF">HY-1.8B-2Bit-GGUF</a>&nbsp;&nbsp; | &nbsp;&nbsp;📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>
 
 ![image/jpeg](2bit-benchmark.png)
 
-
-
 ## 📣Latest News
 - [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
 - [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, a hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
@@ -56,6 +54,7 @@ xxx
 
 
 ## 💻Deployment
 
 ### Running Hunyuan model on MacBook M4
 
@@ -71,16 +70,10 @@ Enter the llama.cpp folder
 cd llama.cpp
 ```
 
- Checkout 3cea17ca51da3bb5ca6748c3f781fac8d0ff20fb
-
- ```bash
- git checkout 3cea17ca51da3bb5ca6748c3f781fac8d0ff20fb
- ```
-
- Apply the patch to enable the Int2 KleidiAI optimizations for SME2
-
 ```bash
- git apply 0001-Add-support-for-int2-per-channel-quantization.patch
 ```
 
 Build llama.cpp with KleidiAI enabled
@@ -121,7 +114,7 @@ The general command is:
 ./bin/llama-bench -m hunyuan-q2_0.gguf -p <prompt-length> -t <number-of-threads> -n <gen-length>
 ```
 
- **********【Performance chart】**********
 
 
 ## 📝 License
 
 </h3>
 
 <p align="center">
+ 📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>
 
 ![image/jpeg](2bit-benchmark.png)
 
 ## 📣Latest News
 - [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
 - [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, a hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
 
 
 
 ## 💻Deployment
+ This setup ONLY works on SME2-capable devices (for example, Apple M4 and vivo X300, i.e. Arm CPUs with SME2 support); a Neon kernel will follow.
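Before diving in, it can help to confirm that the machine actually reports SME2. A minimal sketch; the macOS `sysctl` key name and the Linux `cpuinfo` flag are assumptions based on the usual `FEAT_*`/hwcap naming, not something this README specifies:

```shell
# Quick SME2 capability check (sketch; key/flag names are assumptions).
if [ "$(uname -s)" = "Darwin" ]; then
  # Expected to print 1 when the CPU advertises SME2 (e.g. Apple M4).
  sysctl -n hw.optional.arm.FEAT_SME2 2>/dev/null || echo "SME2 key not found"
else
  # On Linux/Android, SME2 would appear as an "sme2" feature flag.
  grep -qw sme2 /proc/cpuinfo 2>/dev/null && echo "SME2 supported" || echo "SME2 not reported"
fi
```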
 
 ### Running Hunyuan model on MacBook M4
 
 
 cd llama.cpp
 ```
 
+ Fetch and check out the PR branch
 ```bash
+ git fetch origin pull/19357/head:pr-19357-sme2-int2
+ git checkout pr-19357-sme2-int2
 ```
 
 Build llama.cpp with KleidiAI enabled
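The KleidiAI-enabled build referred to here is typically a CMake configure-and-build pair; a sketch assuming llama.cpp's documented `GGML_CPU_KLEIDIAI` option (run from the llama.cpp checkout):

```shell
# Configure with KleidiAI micro-kernels enabled; GGML_CPU_KLEIDIAI is
# llama.cpp's CMake switch for the Arm KleidiAI CPU backend.
cmake -B build -DGGML_CPU_KLEIDIAI=ON -DCMAKE_BUILD_TYPE=Release
# Compile; adjust -j to your core count.
cmake --build build --config Release -j
```

KleidiAI kernels should be selected at runtime based on detected CPU features, so the same binary falls back to generic kernels on cores without SME2.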
 
 ./bin/llama-bench -m hunyuan-q2_0.gguf -p <prompt-length> -t <number-of-threads> -n <gen-length>
 ```
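For instance, with illustrative values substituted for the placeholders (a 128-token prompt, 8 threads, 64 generated tokens):

```shell
# Example llama-bench run; the numbers are arbitrary illustrative choices.
./bin/llama-bench -m hunyuan-q2_0.gguf -p 128 -t 8 -n 64
```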
 
+ ![image/jpeg](m4_performance.png)
 
 
 ## 📝 License