woodchen7 committed on
Commit
ffd12ec
·
verified ·
1 Parent(s): e77bdd0

Update README.md

Files changed (1)
  1. README.md +6 -13
README.md CHANGED
@@ -17,14 +17,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 </h3>
 
 <p align="center">
- 📣 <a href="https://huggingface.co/AngelSlim/HY-1.8B-2Bit-GGUF">HY-1.8B-2Bit-GGUF</a>&nbsp;&nbsp; | &nbsp;&nbsp;📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>
 
 ![image/jpeg](2bit-benchmark.png)
 
-
-
 ## 📣Latest News
 - [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
 - [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, a hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
@@ -56,6 +54,7 @@ xxx
 
 
 ## 💻Deployment
 
 ### Running Hunyuan model on MacBook M4
 
@@ -71,16 +70,10 @@ Enter the llama.cpp folder
 cd llama.cpp
 ```
 
- Checkout 3cea17ca51da3bb5ca6748c3f781fac8d0ff20fb
-
- ```bash
- git checkout 3cea17ca51da3bb5ca6748c3f781fac8d0ff20fb
- ```
-
- Apply the patch to enable the Int2 KleidiAI optimizations for SME2
-
 ```bash
- git apply 0001-Add-support-for-int2-per-channel-quantization.patch
 ```
 
 Build llama.cpp with KleidiAI enabled
@@ -121,7 +114,7 @@ The general command is:
 ./bin/llama-bench -m hunyuan-q2_0.gguf -p <prompt-length> -t <number-of-threads> -n <gen-length>
 ```
 
- **********【Performance chart】**********
 
 
 ## 📝 License
 
 </h3>
 
 <p align="center">
+ 📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>
 
 ![image/jpeg](2bit-benchmark.png)
 
 ## 📣Latest News
 - [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
 - [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, a hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry)🔥🔥🔥
 
 
 
 ## 💻Deployment
+ This setup ONLY works on SME2-capable devices (for example, Apple M4 and vivo X300, i.e. Arm CPUs with SME2 support); a Neon kernel will follow.
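Before diving in, it can help to confirm that the machine actually reports SME2. A minimal sketch; the macOS `sysctl` key name and the Linux `cpuinfo` flag are assumptions based on the usual `FEAT_*`/hwcap naming, not something this README specifies:

```shell
# Quick SME2 capability check (sketch; key/flag names are assumptions).
if [ "$(uname -s)" = "Darwin" ]; then
  # Expected to print 1 when the CPU advertises SME2 (e.g. Apple M4).
  sysctl -n hw.optional.arm.FEAT_SME2 2>/dev/null || echo "SME2 key not found"
else
  # On Linux/Android, SME2 would appear as an "sme2" feature flag.
  grep -qw sme2 /proc/cpuinfo 2>/dev/null && echo "SME2 supported" || echo "SME2 not reported"
fi
```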
 
 ### Running Hunyuan model on MacBook M4
 
 
 cd llama.cpp
 ```
 
+ Fetch and check out the PR branch
 ```bash
+ git fetch origin pull/19357/head:pr-19357-sme2-int2
+ git checkout pr-19357-sme2-int2
 ```
 
 Build llama.cpp with KleidiAI enabled
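The KleidiAI-enabled build referred to here is typically a CMake configure-and-build pair; a sketch assuming llama.cpp's documented `GGML_CPU_KLEIDIAI` option (run from the llama.cpp checkout):

```shell
# Configure with KleidiAI micro-kernels enabled; GGML_CPU_KLEIDIAI is
# llama.cpp's CMake switch for the Arm KleidiAI CPU backend.
cmake -B build -DGGML_CPU_KLEIDIAI=ON -DCMAKE_BUILD_TYPE=Release
# Compile; adjust -j to your core count.
cmake --build build --config Release -j
```

KleidiAI kernels should be selected at runtime based on detected CPU features, so the same binary falls back to generic kernels on cores without SME2.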
 
 ./bin/llama-bench -m hunyuan-q2_0.gguf -p <prompt-length> -t <number-of-threads> -n <gen-length>
 ```
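For instance, with illustrative values substituted for the placeholders (a 128-token prompt, 8 threads, 64 generated tokens):

```shell
# Example llama-bench run; the numbers are arbitrary illustrative choices.
./bin/llama-bench -m hunyuan-q2_0.gguf -p 128 -t 8 -n 64
```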
 
+ ![image/jpeg](m4_performance.png)
 
 
 ## 📝 License