ruv committed on
Commit
c861fb5
·
verified ·
1 Parent(s): 18249d2

Add TurboQuant compatibility, v2.1.0 ecosystem tags

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -9,6 +9,17 @@ tags:
9
  - adaptive-learning
10
  - gguf
11
  - quantized
12
  pipeline_tag: text-generation
13
  ---
14
 
@@ -99,3 +110,48 @@ python -m llama_cpp.server \
99
  ---
100
 
101
  **License**: Apache 2.0 | **GitHub**: [ruvnet/ruvector](https://github.com/ruvnet/ruvector)
9
  - adaptive-learning
10
  - gguf
11
  - quantized
12
+ - turboquant
13
+ - kv-cache-compression
14
+ - flash-attention
15
+ - speculative-decoding
16
+ - graph-rag
17
+ - hybrid-search
18
+ - vector-database
19
+ - ruvector
20
+ - diskann
21
+ - mamba-ssm
22
+ - colbert
23
  pipeline_tag: text-generation
24
  ---
25
 
 
110
  ---
111
 
112
  **License**: Apache 2.0 | **GitHub**: [ruvnet/ruvector](https://github.com/ruvnet/ruvector)
113
+
114
+
115
+ ---
116
+
117
+ ## ⚡ TurboQuant KV-Cache Compression
118
+
119
+ RuvLTRA models are fully compatible with **TurboQuant**, a 2-4 bit KV-cache quantization scheme that reduces inference memory by 6-8x with <0.5% quality loss at 4-bit. The table below lists the trade-off at each bit width.
120
+
121
+ | Quantization | Compression | Quality Loss | Best For |
122
+ |-------------|-------------|--------------|----------|
123
+ | 3-bit | 10.7x | <1% | **Recommended** — best balance |
124
+ | 4-bit | 8x | <0.5% | High quality, long context |
125
+ | 2-bit | 16x | ~2% | Edge devices, max savings |
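The 8x and 10.7x figures for the 4-bit and 3-bit rows are consistent with a 32-bit float baseline (32/4 and 32/3). The sketch below estimates KV-cache size from that assumption. The function name, the model shape, and the FP32 baseline are illustrative assumptions and do not come from the RuvLTRA or ruvllm sources. It also ignores any per-block scale or metadata overhead a real quantizer would store.

```rust
/// Approximate KV-cache size in bytes for one sequence.
/// Keys and values are both cached, hence the factor of 2.
/// Hypothetical helper; not part of ruvllm.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, seq_len: u64, bits_per_value: u64) -> u64 {
    let values = 2 * layers * kv_heads * head_dim * seq_len;
    // Round the total bit count up to whole bytes.
    (values * bits_per_value + 7) / 8
}

fn main() {
    // Hypothetical model shape, chosen only for illustration.
    let (layers, kv_heads, head_dim, seq_len) = (24, 8, 64, 8192);
    let fp32 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, 32);
    for bits in [4u64, 3] {
        let quantized = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits);
        println!(
            "{bits}-bit: {:.1} MiB, {:.1}x smaller than FP32",
            quantized as f64 / (1024.0 * 1024.0),
            fp32 as f64 / quantized as f64
        );
    }
}
```

Against an FP16 baseline the same ratios would be halved, so the baseline matters when comparing published compression numbers.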
126
+
127
+ ### Usage with RuvLLM
128
+
129
+ ```bash
130
+ cargo add ruvllm # Rust
131
+ npm install @ruvector/ruvllm # Node.js
132
+ ```
133
+
134
+ ```rust
135
+ use ruvllm::quantize::turbo_quant::{TurboQuantCompressor, TurboQuantConfig, TurboQuantBits};
136
+
137
+ let config = TurboQuantConfig {
138
+ bits: TurboQuantBits::Bit3_5, // 10.7x compression
139
+ use_qjl: true,
140
+ ..Default::default()
141
+ };
142
+ let compressor = TurboQuantCompressor::new(config)?;
143
+ let compressed = compressor.compress_batch(&kv_vectors)?;
144
+ let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
145
+ ```
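To give a feel for what low-bit KV-cache quantization trades away, here is a minimal sketch of plain uniform min/max scalar quantization. This is a stand-in technique for illustration only, not TurboQuant's algorithm and not the ruvllm API. The `quantize` and `dequantize` names are hypothetical, and codes are stored one per byte rather than bit-packed.

```rust
/// Uniform scalar quantization to `bits` bits per value over a shared [min, max] range.
/// Returns the integer codes plus the offset and step needed to reconstruct values.
fn quantize(values: &[f32], bits: u32) -> (Vec<u8>, f32, f32) {
    assert!((1..=8).contains(&bits), "this sketch supports 1 to 8 bits");
    let levels = ((1u32 << bits) - 1) as f32;
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid a zero step when all values are equal (or the input is empty).
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let codes = values.iter().map(|v| ((v - min) / scale).round() as u8).collect();
    (codes, min, scale)
}

/// Map codes back to approximate floating-point values.
fn dequantize(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

fn main() {
    let kv = [-1.0f32, -0.2, 0.3, 0.9];
    let (codes, min, scale) = quantize(&kv, 3);
    let restored = dequantize(&codes, min, scale);
    let max_err = kv
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("codes = {codes:?}, max error = {max_err:.4} (step = {scale:.4})");
}
```

With round-to-nearest, the reconstruction error of each value is bounded by half the step size, so every bit removed roughly doubles the worst-case error.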
146
+
147
+ ### v2.1.0 Ecosystem
148
+
149
+ - **Hybrid Search** — Sparse + dense vectors with RRF fusion (20-49% better retrieval)
150
+ - **Graph RAG** — Knowledge graph + community detection for multi-hop queries
151
+ - **DiskANN** — Billion-scale SSD-backed ANN with <10ms latency
152
+ - **FlashAttention-3** — IO-aware tiled attention, O(N) memory
153
+ - **MLA** — Multi-Head Latent Attention (~93% KV-cache compression)
154
+ - **Mamba SSM** — Linear-time selective state space models
155
+ - **Speculative Decoding** — 2-3x generation speedup
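The RRF fusion mentioned under Hybrid Search can be sketched as follows. This is the generic Reciprocal Rank Fusion formula, where each ranked list contributes 1 / (k + rank) to a document's score. The function name `rrf_fuse`, the document ids, and k = 60 (a commonly used default) are assumptions for illustration, not RuVector's implementation.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a document appears in,
/// with ranks starting at 1. Hypothetical helper; not the RuVector API.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let sparse = vec!["a", "b", "c"]; // e.g. keyword (BM25-style) ranking
    let dense = vec!["b", "c", "a"]; // e.g. embedding-similarity ranking
    for (doc, score) in rrf_fuse(&[sparse, dense], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```

Because only ranks are used, the sparse and dense scores never need to be normalized against each other, which is the main reason RRF is a common choice for hybrid retrieval.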
156
+
157
+ [RuVector GitHub](https://github.com/ruvnet/ruvector) | [ruvllm crate](https://crates.io/crates/ruvllm) | [@ruvector/ruvllm npm](https://www.npmjs.com/package/@ruvector/ruvllm)