# ColBERT-Zero (GGUF f32 + Projection)

Quantized GGUF conversion of `lightonai/ColBERT-Zero` for use with litembeddings.

ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) among models under 150M parameters, outperforming all other ColBERT and dense retrieval models trained on public data.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | lightonai/ColBERT-Zero |
| Architecture | ModernBERT-base (~100M params) |
| Output Dimensions | 128 (after projection) |
| Context Length | 8,192 tokens |
| Quantization | f32 |
| GGUF Size | 571 MB |
| Projection | 768 → 128 (PyLate Dense layer) |
| License | Apache 2.0 |
| Use Case | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |

## Available Variants

| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
|---------|------|-----------------------------------------------|-------|
| f32 | 571 MB | 463 ms / 770 ms / 3062 ms | Original precision |
| f16 | 286 MB | 1385 ms / 3642 ms / 11439 ms | Slow without FP16 hardware |
| Q8_0 (recommended) | 153 MB | 97 ms / 625 ms / 2633 ms | Fastest on CPU, 3.7× smaller than f32 |

Benchmarked on a QEMU vCPU with SSE4.2. Q8_0 is fastest thanks to integer SIMD; f16 is slowest without hardware FP16 support.

## BEIR Benchmark (from original model)

| Model | BEIR nDCG@10 | Params | Data |
|-------|--------------|--------|------|
| ColBERT-Zero | 55.43 | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |

## MaxSim Score Consistency Across Quants

| Pair | f32 | f16 | Q8_0 |
|------|-----|-----|------|
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |

Negligible quality loss from quantization: Q8_0 scores stay within 0.25% of f32.
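The MaxSim scores above come from ColBERT's late-interaction rule: every query token is compared against every document token, the best match per query token is kept, and those maxima are summed. A minimal NumPy sketch (toy 4-dim vectors stand in for the model's 128-dim, L2-normalized embeddings):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT late-interaction score: for each query token, take the
    maximum similarity over all document tokens, then sum."""
    # (n_q, d) @ (d, n_d) -> (n_q, n_d) similarity matrix
    sims = query_tokens @ doc_tokens.T
    return float(sims.max(axis=1).sum())

# Toy example: two query tokens, two document tokens
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(maxsim(q, d))  # 0.9 + 0.1 = 1.0
```

This is why quantization error stays small in the table: each score is a sum of many per-token maxima, so small per-dimension rounding largely averages out.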

## Files

| File | Size | Description |
|------|------|-------------|
| lightonai-colbert-zero-f32.gguf | 571 MB | ModernBERT-base encoder in GGUF f32 format |
| lightonai-colbert-zero-f32.projection | 385 KB | Projection matrix (128×768, float32) |
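For intuition on what the `.projection` file does, here is a sketch of applying a 128×768 projection to encoder outputs. The raw row-major float32 layout assumed by `load_projection` is a guess (though 128 × 768 × 4 bytes = 393,216 bytes is consistent with the 385 KB size above); litembeddings handles this internally when `colbert_projection` is set.

```python
import numpy as np

def load_projection(path: str) -> np.ndarray:
    # Assumption: raw row-major dump of 128*768 float32 values
    return np.fromfile(path, dtype=np.float32).reshape(128, 768)

def project(token_embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Map 768-dim encoder outputs to 128-dim ColBERT embeddings."""
    out = token_embeddings @ w.T          # (n_tokens, 768) @ (768, 128)
    # ColBERT-style models L2-normalize each token embedding after projection
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Demo with random data standing in for real encoder output
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 768)).astype(np.float32)
tokens = rng.normal(size=(5, 768))
projected = project(tokens, w)
print(projected.shape)  # (5, 128)
```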

## Usage with litembeddings

```sql
-- Load the litembeddings extension (sqlite3 shell)
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-f32.gguf',
    '{"colbert_projection": "lightonai-colbert-zero-f32.projection"}');

-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');

-- Semantic search with MaxSim scoring
SELECT
    id, content,
    lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
```

### Important: Query/Document Prefixes

ColBERT-Zero uses asymmetric prompts for best results:

- Queries: prefix with `search_query:`
- Documents: prefix with `search_document:`

Omitting these prefixes degrades performance by ~0.8-1.3 nDCG@10 points.
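In application code it is easy to forget one of the two prefixes, so it can help to centralize them. A minimal sketch (the prefix strings are from this card; the helper names are made up for illustration):

```python
# Asymmetric prompts for ColBERT-Zero: queries and documents get
# different prefixes before being passed to the embedding function.
def as_query(text: str) -> str:
    return f"search_query: {text}"

def as_document(text: str) -> str:
    return f"search_document: {text}"

print(as_query("error handling best practices"))
# search_query: error handling best practices
```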

## Conversion

```shell
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
    --name colbert-zero --quantize f32
```

## License

Apache 2.0
