Commit c2e2fcc (verified) · Parent: c19b5ca
ctrltokyo committed: Add model card for ColBERT-Zero Q8_0

Files changed (1): README.md (+110 −0)
---
license: apache-2.0
base_model: lightonai/ColBERT-Zero
tags:
- gguf
- litembeddings
- colbert
- late-interaction
- modernbert
- retrieval
- pylate
language:
- en
pipeline_tag: feature-extraction
---

# ColBERT-Zero (GGUF Q8_0 + Projection)

Quantized GGUF conversion of [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) for use with [litembeddings](https://github.com/alexandernicholson/litembeddings).

**ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) among models under 150M parameters**, outperforming all other ColBERT and dense retrieval models trained on public data.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) |
| **Architecture** | ModernBERT-base (~100M params) |
| **Output Dimensions** | 128 (after projection) |
| **Context Length** | 8,192 tokens |
| **Quantization** | Q8_0 |
| **GGUF Size** | 153 MB |
| **Projection** | 768 → 128 (PyLate Dense layer) |
| **License** | Apache 2.0 |
| **Use Case** | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |
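Late-interaction retrieval keeps one 128-dim embedding per token and scores a query against a document with MaxSim: for each query token, take the highest similarity against any document token, then sum over query tokens. A minimal NumPy sketch of that scoring rule (illustrative only; litembeddings computes this natively via `lembed_maxsim`):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """MaxSim late-interaction score.

    query_tokens: (Q, 128) L2-normalized query token embeddings.
    doc_tokens:   (D, 128) L2-normalized document token embeddings.
    Returns the sum over query tokens of each token's best dot-product
    similarity against any document token.
    """
    sims = query_tokens @ doc_tokens.T       # (Q, D) pairwise similarities
    return float(sims.max(axis=1).sum())     # best doc token per query token

# Toy example with random unit-norm token embeddings
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(20, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim(q, d)
```

With unit-norm embeddings each per-token maximum is at most 1, so the score is bounded by the number of query tokens.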

## Available Variants

| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
|---------|------|-----------------------------------------------|-------|
| [**f32**](https://huggingface.co/embedme/lightonai-colbert-zero-f32) | 571 MB | 463ms / 770ms / 3062ms | Original precision |
| [**f16**](https://huggingface.co/embedme/lightonai-colbert-zero-f16) | 286 MB | 1385ms / 3642ms / 11439ms | Slow without FP16 hardware |
| [**Q8_0**](https://huggingface.co/embedme/lightonai-colbert-zero-Q8_0) (recommended) | 153 MB | 97ms / 625ms / 2633ms | **Fastest on CPU**, 3.7× smaller than f32 |

> Benchmarked on a QEMU vCPU with SSE4.2. Q8_0 is fastest thanks to integer SIMD; f16 is slowest without hardware FP16 support.

## BEIR Benchmark (from original model)

| Model | BEIR nDCG@10 | Params | Data |
|-------|--------------|--------|------|
| **ColBERT-Zero** | **55.43** | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |

## MaxSim Score Consistency Across Quants

| Query pair | f32 | f16 | Q8_0 |
|------------|-----|-----|------|
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |

Quality loss from quantization is negligible: Q8_0 scores stay within ~0.25% of f32 (the largest relative deviation above is 0.017 / 7.643 ≈ 0.22%).
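As a quick sanity check, the relative deviations behind that claim can be computed directly from the table:

```python
# Relative deviation of Q8_0 MaxSim scores from f32 (values from the table above)
scores = {"related": (9.203, 9.191), "unrelated": (7.643, 7.626)}

for name, (f32_score, q8_score) in scores.items():
    rel = abs(f32_score - q8_score) / f32_score
    print(f"{name}: {rel:.2%}")  # 0.13% and 0.22%, both under 0.25%
```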

## Files

| File | Size | Description |
|------|------|-------------|
| `lightonai-colbert-zero-Q8_0.gguf` | 153 MB | ModernBERT-base encoder in GGUF Q8_0 format |
| `lightonai-colbert-zero-Q8_0.projection` | 385 KB | Projection matrix (128×768, float32) |

## Usage with litembeddings

```sql
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-Q8_0.gguf',
  '{"colbert_projection": "lightonai-colbert-zero-Q8_0.projection"}');

-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');

-- Semantic search with MaxSim scoring
SELECT
  id, content,
  lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
```

### Important: Query/Document Prefixes

ColBERT-Zero uses asymmetric prompts for best results:

- **Queries**: prefix with `search_query: `
- **Documents**: prefix with `search_document: `

Omitting these prefixes degrades performance by ~0.8–1.3 nDCG@10 points.
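The prefixes are plain string prefixes prepended to the text before it is embedded. A trivial helper (hypothetical; `with_colbert_prefix` is not part of litembeddings) makes the convention explicit:

```python
def with_colbert_prefix(text: str, *, is_query: bool) -> str:
    """Apply ColBERT-Zero's asymmetric prefix (hypothetical helper)."""
    return ("search_query: " if is_query else "search_document: ") + text

print(with_colbert_prefix("What is machine learning?", is_query=True))
# search_query: What is machine learning?
```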

## Conversion

```bash
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
  --name colbert-zero --quantize q8_0
```

---

**License:** Apache 2.0