ColBERT-Zero (GGUF f16 + Projection)

Quantized GGUF conversion of lightonai/ColBERT-Zero for use with litembeddings.

ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) for models under 150M parameters, outperforming all other ColBERT and dense retrieval models trained on public data.

Model Details

| Property | Value |
| --- | --- |
| Base Model | lightonai/ColBERT-Zero |
| Architecture | ModernBERT-base (~100M params) |
| Output Dimensions | 128 (after projection) |
| Context Length | 8,192 tokens |
| Quantization | f16 |
| GGUF Size | 286 MB |
| Projection | 768 → 128 (PyLate Dense layer) |
| License | Apache 2.0 |
| Use Case | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |
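The MaxSim late-interaction scoring mentioned above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not litembeddings' internal code:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take its best (maximum) dot-product match among the document's
    token embeddings, then sum those maxima over the query tokens."""
    sims = query_tokens @ doc_tokens.T      # (num_q, num_d) similarity matrix
    return float(sims.max(axis=1).sum())    # best match per query token, summed

# Toy 2-d example: each query token finds its closest document token.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.0, 0.5]])
print(maxsim(q, d))  # 1.5
```

Because each query token is matched independently, MaxSim rewards documents that cover all parts of the query rather than just its average meaning.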

Available Variants

| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
| --- | --- | --- | --- |
| f32 | 571 MB | 463 ms / 770 ms / 3062 ms | Original precision |
| f16 | 286 MB | 1385 ms / 3642 ms / 11439 ms | Slow without FP16 hardware |
| Q8_0 (recommended) | 153 MB | 97 ms / 625 ms / 2633 ms | Fastest on CPU, 3.7x smaller than f32 |

Benchmarked on QEMU vCPU with SSE4.2. Q8_0 is fastest due to integer SIMD; f16 is slowest without hardware FP16.

BEIR Benchmark (from original model)

| Model | BEIR nDCG@10 | Params | Data |
| --- | --- | --- | --- |
| ColBERT-Zero | 55.43 | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |

MaxSim Score Consistency Across Quants

| Pair | f32 | f16 | Q8_0 |
| --- | --- | --- | --- |
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |

Negligible quality loss from quantization: Q8_0 scores stay within about 0.25% of f32.
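As a quick sanity check, the table's numbers give the following relative deviations of Q8_0 from f32:

```python
# Relative deviation of Q8_0 from f32, using the scores from the table above.
f32_scores = [9.203, 7.643]
q8_scores = [9.191, 7.626]
deviations = [abs(a - b) / a for a, b in zip(f32_scores, q8_scores)]
print([f"{d:.2%}" for d in deviations])  # ['0.13%', '0.22%']
```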

Files

| File | Size | Description |
| --- | --- | --- |
| lightonai-colbert-zero-f16.gguf | 286 MB | ModernBERT-base encoder in GGUF f16 format |
| lightonai-colbert-zero-f16.projection | 385 KB | Projection matrix (128×768, float32) |
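The projection can also be applied outside litembeddings. A minimal sketch, assuming the `.projection` file is a raw row-major float32 dump of the 128×768 matrix and that the PyLate Dense layer has no bias (both are assumptions; check the conversion script if in doubt):

```python
import numpy as np

def load_projection(path: str) -> np.ndarray:
    """Read the raw float32 projection matrix (assumed 128x768, row-major)."""
    return np.fromfile(path, dtype=np.float32).reshape(128, 768)

def project_tokens(hidden: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Map (num_tokens, 768) encoder outputs to (num_tokens, 128)
    ColBERT embeddings and L2-normalize each token vector."""
    out = hidden @ proj.T
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Shape check with random stand-ins for encoder outputs and the matrix.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 768))
proj = rng.normal(size=(128, 768))
print(project_tokens(hidden, proj).shape)  # (5, 128)
```

Normalizing each token vector keeps MaxSim scores on a consistent scale across documents of different lengths.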

Usage with litembeddings

```sql
-- In the sqlite3 shell, load the extension first:
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-f16.gguf',
    '{"colbert_projection": "lightonai-colbert-zero-f16.projection"}');

-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');

-- Semantic search with MaxSim scoring
SELECT
    id, content,
    lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
```

Important: Query/Document Prefixes

ColBERT-Zero expects asymmetric query/document prefixes for best results:

  • Queries: Prefix with search_query:
  • Documents: Prefix with search_document:

Omitting these prefixes degrades performance by ~0.8-1.3 nDCG@10 points.
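A tiny helper (hypothetical, not part of litembeddings) makes the prefixes harder to forget when preparing text for embedding:

```python
def with_prefix(text: str, *, is_query: bool) -> str:
    """Prepend the asymmetric prefix ColBERT-Zero expects,
    unless the text already carries it."""
    prefix = "search_query: " if is_query else "search_document: "
    return text if text.startswith(prefix) else prefix + text

print(with_prefix("error handling best practices", is_query=True))
# search_query: error handling best practices
```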

Conversion

```bash
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
    --name colbert-zero --quantize f16
```

License: Apache 2.0
