# ColBERT-Zero (GGUF f16 + Projection)
Quantized GGUF conversion of lightonai/ColBERT-Zero for use with litembeddings.
ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) for models under 150M parameters, outperforming all other ColBERT and dense retrieval models trained on public data.
## Model Details
| Property | Value |
|---|---|
| Base Model | lightonai/ColBERT-Zero |
| Architecture | ModernBERT-base (~100M params) |
| Output Dimensions | 128 (after projection) |
| Context Length | 8,192 tokens |
| Quantization | f16 |
| GGUF Size | 286 MB |
| Projection | 768 → 128 (PyLate Dense layer) |
| License | Apache 2.0 |
| Use Case | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |
## Available Variants
| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
|---|---|---|---|
| f32 | 571 MB | 463ms / 770ms / 3062ms | Original precision |
| f16 | 286 MB | 1385ms / 3642ms / 11439ms | Slow without FP16 hardware |
| Q8_0 (recommended) | 153 MB | 97ms / 625ms / 2633ms | Fastest on CPU, 3.7x smaller than f32 |
Benchmarked on a QEMU vCPU with SSE4.2. Q8_0 is fastest because it can use integer SIMD; f16 is slowest on hardware without native FP16 support.
## BEIR Benchmark (from original model)
| Model | BEIR nDCG@10 | Params | Data |
|---|---|---|---|
| ColBERT-Zero | 55.43 | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |
## MaxSim Score Consistency Across Quants
| Query | f32 | f16 | Q8_0 |
|---|---|---|---|
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |
Negligible quality loss from quantization: Q8_0 scores stay within ~0.2% of f32.
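For reference, the MaxSim late-interaction score compared above can be sketched in a few lines of NumPy. This is a simplified illustration (toy vectors, dot-product similarity), not the actual `litembeddings` implementation:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT late interaction: for each query token, take the max
    similarity against all document tokens, then sum over query tokens."""
    sims = query_tokens @ doc_tokens.T       # (n_query, n_doc) similarity matrix
    return float(sims.max(axis=1).sum())     # best document match per query token

# Toy 2-d example: the second query token matches the first doc token exactly.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.0, 1.0], [0.6, 0.8]])
print(maxsim(q, d))  # 0.6 + 1.0 = 1.6
```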
## Files
| File | Size | Description |
|---|---|---|
| lightonai-colbert-zero-f16.gguf | 286 MB | ModernBERT-base encoder in GGUF f16 format |
| lightonai-colbert-zero-f16.projection | 385 KB | Projection matrix (128×768, float32) |
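Conceptually, the projection file holds a 128×768 matrix that maps each 768-d ModernBERT token embedding down to the 128-d ColBERT space. The NumPy sketch below illustrates the math only; the on-disk format and any normalization step are assumptions (a random matrix stands in for the real `.projection` file), so consult the conversion script for the actual layout:

```python
import numpy as np

# A random matrix stands in for the real .projection file contents (assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 768)).astype(np.float32)

def project_tokens(token_embeddings: np.ndarray) -> np.ndarray:
    """Map token embeddings (n_tokens, 768) to ColBERT space (n_tokens, 128),
    L2-normalized so dot products become cosine similarities for MaxSim."""
    projected = token_embeddings @ W.T                       # (n_tokens, 128)
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)

tokens = rng.standard_normal((11, 768)).astype(np.float32)   # e.g. an 11-token query
out = project_tokens(tokens)
print(out.shape)  # (11, 128)
```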
## Usage with litembeddings
```sql
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-f16.gguf',
    '{"colbert_projection": "lightonai-colbert-zero-f16.projection"}');

-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');

-- Semantic search with MaxSim scoring
SELECT
    id, content,
    lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
```
## Important: Query/Document Prefixes
ColBERT-Zero uses asymmetric prompts for best results:
- Queries: prefix with `search_query: `
- Documents: prefix with `search_document: `
Omitting these prefixes degrades performance by ~0.8-1.3 nDCG@10 points.
## Conversion
```sh
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
    --name colbert-zero --quantize f16
```
License: Apache 2.0