Update README.md
README.md CHANGED
@@ -17,14 +17,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 </h3>
 
 <p align="center">
-
+📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>   |   🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>   |   💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>
 
 
 
-
-
 ## 📣Latest News
 - [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
 - [26/01/13] We have released v0.3, which supports training and deployment of Eagle3 for all-scale LLM/VLM/Audio models, as detailed in the [guidance documentation](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle/index.html). We also released **Sherry**, a hardware-efficient 1.25-bit quantization algorithm [Paper coming soon] | [[Code]](https://github.com/Tencent/AngelSlim/tree/sherry/Sherry) 🔥🔥🔥
@@ -56,6 +54,7 @@
 
 
 ## 💻Deployment
+This setup only works on SME2-capable devices (for example, Apple M4, vivo X300, and other Arm CPUs with SME2 support); a Neon kernel will follow in a future update.
 
 ### Running Hunyuan model on MacBook M4
 
@@ -71,16 +70,10 @@ Enter the llama.cpp folder
 cd llama.cpp
 ```
 
-
-
+Fetch and check out the PR branch
 ```bash
-git checkout 3cea17ca51da3bb5ca6748c3f781fac8d0ff20fb
+git fetch origin pull/19357/head:pr-19357-sme2-int2
+git checkout pr-19357-sme2-int2
 ```
-
-Apply the patch to enable the Int2 KleidiAI optimizations for SME2
-
-```bash
-git
-```
 
 Build llama.cpp with KleidiAI enabled
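The `pull/19357/head` refspec added above works because GitHub maintains a read-only ref, `refs/pull/<ID>/head`, at every pull request's latest commit, so any PR can be fetched without adding the contributor's fork as a remote. A small helper to build that refspec (the `pr-<ID>` local branch name is just a convention):

```shell
# Build a fetch refspec for a GitHub pull request. GitHub maintains
# refs/pull/<ID>/head at the PR's current head commit (read-only).
pr_refspec() {
  printf 'pull/%s/head:pr-%s' "$1" "$1"
}

# Usage (requires network access and an "origin" remote pointing at GitHub):
#   git fetch origin "$(pr_refspec 19357)" && git checkout pr-19357
```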
@@ -121,7 +114,7 @@ The general command is:
 ./bin/llama-bench -m hunyuan-q2_0.gguf -p <prompt-length> -t <number-of-threads> -n <gen-length>
 ```
 
-
+![image](https://github.com/user-attachments/assets/74e5d1a7-86b3-4755-b1a7-d2f8b0a7424b)
 
 
 ## 📝 License
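Taken together, the steps touched by this diff can be sketched as one session. The clone URL and the `GGML_CPU_KLEIDIAI` CMake option are assumptions (the diff elides the actual build commands); the checkout and bench invocations come from the diff itself:

```shell
# Hypothetical end-to-end session; lines marked "assumed" are not from the diff.
git clone https://github.com/ggml-org/llama.cpp   # assumed upstream URL
cd llama.cpp
git fetch origin pull/19357/head:pr-19357-sme2-int2
git checkout pr-19357-sme2-int2

# Build with the KleidiAI CPU backend (option name is an assumption).
cmake -B build -DGGML_CPU_KLEIDIAI=ON
cmake --build build --config Release -j

# Benchmark prompt processing (-p) and generation (-n) for the 2-bit model.
cd build
./bin/llama-bench -m hunyuan-q2_0.gguf -p 128 -t 8 -n 64
```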