---
language:
- en
license: apache-2.0
library_name: gguf
tags:
- ruvltra
- sona
- adaptive-learning
- gguf
- quantized
- edge-device
- embedded
- iot
- turboquant
- kv-cache-compression
- flash-attention
- speculative-decoding
- graph-rag
- hybrid-search
- vector-database
- ruvector
- diskann
- mamba-ssm
- colbert
pipeline_tag: text-generation
---

<div align="center">

# RuvLTRA Small

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![HuggingFace](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow)](https://huggingface.co/ruv/ruvltra-small)
[![GGUF](https://img.shields.io/badge/Format-GGUF-green)](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)

**📱 Compact Model Optimized for Edge Devices**

[Quick Start](#-quick-start) • [Use Cases](#-use-cases) • [Integration](#-integration)

</div>

---

## Overview

RuvLTRA Small is a compact 0.5B parameter model designed for edge deployment. Perfect for mobile apps, IoT devices, and resource-constrained environments.

## Model Card

| Property | Value |
|----------|-------|
| **Parameters** | 0.5 Billion |
| **Quantization** | Q4_K_M |
| **Context** | 4,096 tokens |
| **Size** | ~398 MB |
| **Min RAM** | 1 GB |

## 🚀 Quick Start

```bash
# Download
wget https://huggingface.co/ruv/ruvltra-small/resolve/main/ruvltra-0.5b-q4_k_m.gguf

# Run with llama.cpp
./llama-cli -m ruvltra-0.5b-q4_k_m.gguf -p "Hello, I am" -n 64
```

## 💡 Use Cases

- **Mobile Apps**: On-device AI assistant
- **IoT**: Smart home device intelligence
- **Edge Computing**: Local inference without cloud
- **Prototyping**: Quick model experimentation

## 🔧 Integration

### Rust (RuvLLM)
```rust
use ruvllm::hub::ModelDownloader;

let path = ModelDownloader::new()
    .download("ruv/ruvltra-small", None)
    .await?;
```

### Python
```python
from huggingface_hub import hf_hub_download

model = hf_hub_download("ruv/ruvltra-small", "ruvltra-0.5b-q4_k_m.gguf")
```

## Hardware Support

- ✅ Apple Silicon (M1/M2/M3)
- ✅ NVIDIA CUDA
- ✅ CPU (x86/ARM)
- ✅ Raspberry Pi 4/5

---

**License**: Apache 2.0 | **GitHub**: [ruvnet/ruvector](https://github.com/ruvnet/ruvector)


---

## ⚡ TurboQuant KV-Cache Compression

RuvLTRA models are fully compatible with **TurboQuant**, a 2-4 bit KV-cache quantization scheme that cuts KV-cache memory by 8-32x depending on bit width, with quality loss as low as <0.5%.

| Quantization | Compression | Quality Loss | Best For |
|-------------|-------------|--------------|----------|
| 3-bit | 10.7x | <1% | **Recommended**: best balance |
| 4-bit | 8x | <0.5% | High quality, long context |
| 2-bit | 32x | ~2% | Edge devices, max savings |
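
To see what these factors mean in practice, here is a back-of-envelope sketch (plain Python, not RuvLLM code). The layer/head dimensions are hypothetical, chosen only to show the arithmetic; the compression factors come from the table above, applied to an fp32 baseline.

```python
# Back-of-envelope KV-cache sizing. Model dimensions below are hypothetical;
# only the compression factors come from the TurboQuant table.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=4):
    # Keys + values: one entry per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

baseline = kv_cache_bytes(layers=24, kv_heads=2, head_dim=64, seq_len=4096)
factors = {"4-bit": 8, "3-bit": 10.7, "2-bit": 32}  # from the table above

for name, factor in factors.items():
    print(f"{name}: {baseline / factor / 1e6:.1f} MB "
          f"(down from {baseline / 1e6:.1f} MB fp32)")
```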

### Usage with RuvLLM

```bash
cargo add ruvllm    # Rust
npm install @ruvector/ruvllm   # Node.js
```

```rust
use ruvllm::quantize::turbo_quant::{TurboQuantCompressor, TurboQuantConfig, TurboQuantBits};

// `kv_vectors` (the KV-cache entries to compress) and `query` (the attention
// query vector) are produced elsewhere in your inference loop.
let config = TurboQuantConfig {
    bits: TurboQuantBits::Bit3_5, // 10.7x compression
    use_qjl: true,
    ..Default::default()
};
let compressor = TurboQuantCompressor::new(config)?;
let compressed = compressor.compress_batch(&kv_vectors)?;
let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
```

### v2.1.0 Ecosystem

- **Hybrid Search**: Sparse + dense vectors with RRF fusion (20-49% better retrieval)
- **Graph RAG**: Knowledge graph + community detection for multi-hop queries
- **DiskANN**: Billion-scale SSD-backed ANN with <10ms latency
- **FlashAttention-3**: IO-aware tiled attention, O(N) memory
- **MLA**: Multi-Head Latent Attention (~93% KV-cache compression)
- **Mamba SSM**: Linear-time selective state space models
- **Speculative Decoding**: 2-3x generation speedup
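
The hybrid-search bullet above refers to Reciprocal Rank Fusion, a standard way to merge sparse (keyword) and dense (vector) result lists by summed reciprocal ranks. This is a generic sketch of the formula, not RuVector's API; the document IDs are made up.

```python
# Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank_d).
# k=60 is the conventional constant from the original RRF formulation.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d2"]   # e.g. BM25 / keyword ranking
dense  = ["d1", "d4", "d3"]   # e.g. embedding / ANN ranking
print(rrf_fuse([sparse, dense]))  # documents ranked well by both lists win
```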

[RuVector GitHub](https://github.com/ruvnet/ruvector) | [ruvllm crate](https://crates.io/crates/ruvllm) | [@ruvector/ruvllm npm](https://www.npmjs.com/package/@ruvector/ruvllm)


---

## Benchmarks (L4 GPU, 24GB VRAM)

| Metric | Result |
|--------|--------|
| **Inference Speed** | 75.4 tok/s |
| **Model Load Time** | 1.44s |
| **Parameters** | 0.5B |
| **TurboQuant KV (3-bit)** | 10.7x compression, <1% PPL loss |
| **TurboQuant KV (4-bit)** | 8x compression, <0.5% PPL loss |

*Benchmarked on Google Cloud L4 GPU via `ruvltra-calibration` Cloud Run Job (2026-03-28)*