Update README.md
README.md CHANGED
@@ -6,11 +6,11 @@ base_model_relation: quantized
 pipeline_tag: text-generation
 tags:
 - chatllm.cpp
-- ggml
 - quantization
 - int4
 - int8
 - cpu-inference
+- ggmm
 quantized_by: riverkan
 language:
 - en
@@ -27,7 +27,7 @@ language:
 
 Author and distribution: [Riverkan](https://riverkan.com)
 
-This repository provides CPU/GPU-friendly quantized builds of Ling‑Mini‑2.0 for [ChatLLM.cpp](https://github.com/foldl/chatllm.cpp). It is not a LLaMA model, is not affiliated with Meta, and does not use the LLaMA license. Files are distributed in ChatLLM.cpp’s GGML-based format (.bin), ready for local inference.
+This repository provides CPU/GPU-friendly quantized builds of Ling‑Mini‑2.0 for [ChatLLM.cpp](https://github.com/foldl/chatllm.cpp). It is not a LLaMA model, is not affiliated with Meta, and does not use the LLaMA license. Files are distributed in ChatLLM.cpp’s GGMM-based format (.bin), ready for local inference.
 
 - Available quantizations: Q4_0 (int4), Q8_0 (int8)
 - Tested runtime: ChatLLM.cpp
@@ -38,7 +38,7 @@ Notes:
 
 ## ChatLLM.cpp Quantizations of Ling‑Mini‑2.0
 
-Quantized with the ChatLLM.cpp toolchain for GGML-format inference (.bin). These builds are intended for the ChatLLM.cpp runtime (CPU and optional GPU acceleration as provided by ChatLLM’s GGML backends). Use ChatLLM.cpp’s convert and run flow described below.
+Quantized with the ChatLLM.cpp toolchain for GGMM-format inference (.bin). These builds are intended for the ChatLLM.cpp runtime (CPU and optional GPU acceleration as provided by ChatLLM’s GGMM backends). Use ChatLLM.cpp’s convert and run flow described below.
 
 Original (float) model: to be announced by Riverkan.
 
@@ -80,7 +80,7 @@ No special tokens are required by the model itself; most UIs can just send user
 
 Notes:
 - File sizes depend on the base model size; check the release or hosting page for exact sizes.
-- These are GGML (.bin) files for ChatLLM.cpp, not GGUF.
+- These are GGMM (.bin) files for ChatLLM.cpp, not GGUF.
 
 ## How to use with ChatLLM.cpp
 
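Aside on the run flow these hunks reference: below is a minimal interactive-run sketch, assuming ChatLLM.cpp has been built per its README and that its main binary takes -m (model path) and -i (interactive mode). The binary location and flags are assumptions to verify against the ChatLLM.cpp docs for your version.

```bash
# Minimal sketch (assumed flags): chat with the Q4_0 build interactively.
# -m selects the quantized .bin; -i starts interactive mode.
./build/bin/main -m Ling‑Mini‑2.0‑Q4_0.bin -i
```
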
@@ -163,19 +163,19 @@ pip install -U "huggingface_hub[cli]"
 
 Download a specific file:
 ```bash
-huggingface-cli download
+huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "Ling‑Mini‑2.0‑Q4_0.bin" --local-dir ./
 ```
 
 Or the Q8_0 build:
 ```bash
-huggingface-cli download
+huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "Ling‑Mini‑2.0‑Q8_0.bin" --local-dir ./
 ```
 
 Replace the model repo path with the actual hosting path if different.
 
 ## Building your own quant (optional)
 
-If you have the float/base weights and want to generate your own GGML quantized file for ChatLLM.cpp:
+If you have the float/base weights and want to generate your own GGMM quantized file for ChatLLM.cpp:
 
 1) Install Python deps for ChatLLM.cpp’s conversion pipeline:
 ```bash
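One note on the download commands filled in above: huggingface-cli download accepts glob patterns in --include, so both builds can be fetched in a single call. The repo path here mirrors the commands in the hunk; adjust it if the hosting path differs.

```bash
# Fetch the Q4_0 and Q8_0 builds together by globbing all .bin files.
huggingface-cli download RiverkanIT/Ling-mini-2.0-Quantized --include "*.bin" --local-dir ./
```
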
@@ -193,13 +193,13 @@ python convert.py -i /path/to/base/model -t q4_0 -o Ling‑Mini‑2.0‑Q4_0.bin
 ```
 
 Notes:
-- ChatLLM.cpp uses GGML-based .bin files (not GGUF).
+- ChatLLM.cpp uses GGMM-based .bin files (not GGUF).
 - See ChatLLM.cpp docs for model-specific flags and supported architectures.
 
 ## Credits
 
 - Model and quantized distributions by Riverkan
-- Runtime and tooling: ChatLLM.cpp (thanks to the maintainers and the GGML community)
+- Runtime and tooling: ChatLLM.cpp (thanks to the maintainers and the GGMM community)
 - Thanks to the InclusionAI team for their foundational work and support!
 - Everyone in the open-source LLM community who provided benchmarks, ideas, and tools
 
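The last hunk's header shows the q4_0 conversion command for context. Assuming convert.py takes the same -i/-t/-o flags for other quantization targets (an assumption based on that context line), the int8 build would be produced analogously:

```bash
# Sketch under the same assumed flags, targeting int8 instead of int4.
python convert.py -i /path/to/base/model -t q8_0 -o Ling‑Mini‑2.0‑Q8_0.bin
```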