Update README.md
README.md CHANGED
@@ -6,11 +6,11 @@ base_model_relation: quantized
 pipeline_tag: text-generation
 tags:
 - chatllm.cpp
-- ggml
 - quantization
 - int4
 - int8
 - cpu-inference
+- ggmm
 quantized_by: riverkan
 language:
 - en
@@ -27,7 +27,7 @@ language:
 
 Author and distribution: [Riverkan](https://riverkan.com)
 
-This repository provides CPU/GPU-friendly quantized builds of Ling‑Mini‑2.0 for [ChatLLM.cpp](https://github.com/foldl/chatllm.cpp). It is not a LLaMA model, is not affiliated with Meta, and does not use the LLaMA license. Files are distributed in ChatLLM.cpp’s GGML-based format (.bin), ready for local inference.
+This repository provides CPU/GPU-friendly quantized builds of Ling‑Mini‑2.0 for [ChatLLM.cpp](https://github.com/foldl/chatllm.cpp). It is not a LLaMA model, is not affiliated with Meta, and does not use the LLaMA license. Files are distributed in ChatLLM.cpp’s GGMM-based format (.bin), ready for local inference.
 
 - Available quantizations: Q4_0 (int4), Q8_0 (int8)
 - Tested runtime: ChatLLM.cpp
@@ -38,7 +38,7 @@ Notes:
 
 ## ChatLLM.cpp Quantizations of Ling‑Mini‑2.0
 
-Quantized with the ChatLLM.cpp toolchain for GGML-format inference (.bin). These builds are intended for the ChatLLM.cpp runtime (CPU and optional GPU acceleration as provided by ChatLLM’s GGML backends). Use ChatLLM.cpp’s convert and run flow described below.
+Quantized with the ChatLLM.cpp toolchain for GGMM-format inference (.bin). These builds are intended for the ChatLLM.cpp runtime (CPU and optional GPU acceleration as provided by ChatLLM’s GGMM backends). Use ChatLLM.cpp’s convert and run flow described below.
 
 Original (float) model: to be announced by Riverkan.
 
@@ -80,7 +80,7 @@ No special tokens are required by the model itself; most UIs can just send user
 
 Notes:
 - File sizes depend on the base model size; check the release or hosting page for exact sizes.
-- These are GGML (.bin) files for ChatLLM.cpp, not GGUF.
+- These are GGMM (.bin) files for ChatLLM.cpp, not GGUF.
 
 ## How to use with ChatLLM.cpp
 
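Aside on the run flow these hunks reference: below is a minimal interactive-run sketch, assuming ChatLLM.cpp has been built per its README and that its main binary takes -m (model path) and -i (interactive mode). The binary location and flags are assumptions to verify against the ChatLLM.cpp docs for your version.

```bash
# Minimal sketch (assumed flags): chat with the Q4_0 build interactively.
# -m selects the quantized .bin; -i starts interactive mode.
./build/bin/main -m Ling‑Mini‑2.0‑Q4_0.bin -i
```
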
@@ -163,19 +163,19 @@ pip install -U "huggingface_hub[cli]"
 
 Download a specific file:
 ```bash
-huggingface-cli download
+huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "Ling‑Mini‑2.0‑Q4_0.bin" --local-dir ./
 ```
 
 Or the Q8_0 build:
 ```bash
-huggingface-cli download
+huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "Ling‑Mini‑2.0‑Q8_0.bin" --local-dir ./
 ```
 
 Replace the model repo path with the actual hosting path if different.
 
 ## Building your own quant (optional)
 
-If you have the float/base weights and want to generate your own GGML quantized file for ChatLLM.cpp:
+If you have the float/base weights and want to generate your own GGMM quantized file for ChatLLM.cpp:
 
 1) Install Python deps for ChatLLM.cpp’s conversion pipeline:
 ```bash
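One note on the download commands filled in above: huggingface-cli download accepts glob patterns in --include, so both builds can be fetched in a single call. The repo path here mirrors the commands in the hunk; adjust it if the hosting path differs.

```bash
# Fetch the Q4_0 and Q8_0 builds together by globbing all .bin files.
huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "*.bin" --local-dir ./
```
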
@@ -193,13 +193,13 @@ python convert.py -i /path/to/base/model -t q4_0 -o Ling‑Mini‑2.0‑Q4_0.bin
 ```
 
 Notes:
-- ChatLLM.cpp uses GGML-based .bin files (not GGUF).
+- ChatLLM.cpp uses GGMM-based .bin files (not GGUF).
 - See ChatLLM.cpp docs for model-specific flags and supported architectures.
 
 ## Credits
 
 - Model and quantized distributions by Riverkan
-- Runtime and tooling: ChatLLM.cpp (thanks to the maintainers and the GGML community)
+- Runtime and tooling: ChatLLM.cpp (thanks to the maintainers and the GGMM community)
 - Thanks to the InclusionAI team for their foundational work and support!
 - Everyone in the open-source LLM community who provided benchmarks, ideas, and tools
 
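The last hunk's header shows the q4_0 conversion command for context. Assuming convert.py takes the same -i/-t/-o flags for other quantization targets (an assumption based on that context line), the int8 build would be produced analogously:

```bash
# Sketch under the same assumed flags, targeting int8 instead of int4.
python convert.py -i /path/to/base/model -t q8_0 -o Ling‑Mini‑2.0‑Q8_0.bin
```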