Commit c2e2fcc (verified) · Parent: c19b5ca
ctrltokyo committed: Add model card for ColBERT-Zero Q8_0

Files changed (1): README.md (+110 −0)
---
license: apache-2.0
base_model: lightonai/ColBERT-Zero
tags:
- gguf
- litembeddings
- colbert
- late-interaction
- modernbert
- retrieval
- pylate
language:
- en
pipeline_tag: feature-extraction
---

# ColBERT-Zero (GGUF Q8_0 + Projection)

Quantized GGUF conversion of [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) for use with [litembeddings](https://github.com/alexandernicholson/litembeddings).

**ColBERT-Zero is SOTA on BEIR (55.43 nDCG@10) among models under 150M parameters**, outperforming all other ColBERT and dense retrieval models trained on public data.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) |
| **Architecture** | ModernBERT-base (~100M params) |
| **Output Dimensions** | 128 (after projection) |
| **Context Length** | 8,192 tokens |
| **Quantization** | Q8_0 |
| **GGUF Size** | 153 MB |
| **Projection** | 768 → 128 (PyLate Dense layer) |
| **License** | Apache 2.0 |
| **Use Case** | General-purpose semantic search with late interaction (ColBERT-style MaxSim) |
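Late-interaction retrieval keeps one 128-dim embedding per token and scores a query against a document with MaxSim: for each query token, take the highest similarity against any document token, then sum over query tokens. A minimal NumPy sketch of that scoring rule (illustrative only; litembeddings computes this natively via `lembed_maxsim`):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """MaxSim late-interaction score.

    query_tokens: (Q, 128) L2-normalized query token embeddings.
    doc_tokens:   (D, 128) L2-normalized document token embeddings.
    Returns the sum over query tokens of each token's best dot-product
    similarity against any document token.
    """
    sims = query_tokens @ doc_tokens.T       # (Q, D) pairwise similarities
    return float(sims.max(axis=1).sum())     # best doc token per query token

# Toy example with random unit-norm token embeddings
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(20, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim(q, d)
```

With unit-norm embeddings each per-token maximum is at most 1, so the score is bounded by the number of query tokens.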

## Available Variants

| Variant | Size | Embedding Latency (11 tok / 50 tok / 150 tok) | Notes |
|---------|------|-----------------------------------------------|-------|
| [**f32**](https://huggingface.co/embedme/lightonai-colbert-zero-f32) | 571 MB | 463ms / 770ms / 3062ms | Original precision |
| [**f16**](https://huggingface.co/embedme/lightonai-colbert-zero-f16) | 286 MB | 1385ms / 3642ms / 11439ms | Slow without FP16 hardware |
| [**Q8_0**](https://huggingface.co/embedme/lightonai-colbert-zero-Q8_0) (recommended) | 153 MB | 97ms / 625ms / 2633ms | **Fastest on CPU**, 3.7× smaller than f32 |

> Benchmarked on a QEMU vCPU with SSE4.2. Q8_0 is fastest thanks to integer SIMD; f16 is slowest without hardware FP16 support.

## BEIR Benchmark (from original model)

| Model | BEIR nDCG@10 | Params | Data |
|-------|--------------|--------|------|
| **ColBERT-Zero** | **55.43** | ~100M | Public only |
| ModernColBERT-embed-base | 55.12 | ~100M | Public only |
| GTE-ModernColBERT | 54.67 | ~100M | Proprietary |
| ModernBERT-embed-supervised (dense) | 52.89 | ~100M | Public only |

## MaxSim Score Consistency Across Quants

| Query pair | f32 | f16 | Q8_0 |
|------------|-----|-----|------|
| Related pair | 9.203 | 9.202 | 9.191 |
| Unrelated pair | 7.643 | 7.642 | 7.626 |

Quality loss from quantization is negligible: Q8_0 scores stay within ~0.25% of f32 (the largest relative deviation above is 0.017 / 7.643 ≈ 0.22%).
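As a quick sanity check, the relative deviations behind that claim can be computed directly from the table:

```python
# Relative deviation of Q8_0 MaxSim scores from f32 (values from the table above)
scores = {"related": (9.203, 9.191), "unrelated": (7.643, 7.626)}

for name, (f32_score, q8_score) in scores.items():
    rel = abs(f32_score - q8_score) / f32_score
    print(f"{name}: {rel:.2%}")  # 0.13% and 0.22%, both under 0.25%
```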

## Files

| File | Size | Description |
|------|------|-------------|
| `lightonai-colbert-zero-Q8_0.gguf` | 153 MB | ModernBERT-base encoder in GGUF Q8_0 format |
| `lightonai-colbert-zero-Q8_0.projection` | 385 KB | Projection matrix (128×768, float32) |

## Usage with litembeddings

```sql
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-colbert-zero-Q8_0.gguf',
  '{"colbert_projection": "lightonai-colbert-zero-Q8_0.projection"}');

-- Generate token embeddings
SELECT lembed_tokens('search_query: What is machine learning?');

-- Semantic search with MaxSim scoring
SELECT
  id, content,
  lembed_maxsim(lembed_tokens('search_query: error handling best practices'), tokens) AS score
FROM documents
ORDER BY score DESC
LIMIT 10;
```

### Important: Query/Document Prefixes

ColBERT-Zero uses asymmetric prompts for best results:

- **Queries**: prefix with `search_query: `
- **Documents**: prefix with `search_document: `

Omitting these prefixes degrades performance by ~0.8–1.3 nDCG@10 points.
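The prefixes are plain string prefixes prepended to the text before it is embedded. A trivial helper (hypothetical; `with_colbert_prefix` is not part of litembeddings) makes the convention explicit:

```python
def with_colbert_prefix(text: str, *, is_query: bool) -> str:
    """Apply ColBERT-Zero's asymmetric prefix (hypothetical helper)."""
    return ("search_query: " if is_query else "search_document: ") + text

print(with_colbert_prefix("What is machine learning?", is_query=True))
# search_query: What is machine learning?
```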

## Conversion

```bash
python scripts/convert_colbert_to_gguf.py lightonai/ColBERT-Zero ./models \
  --name colbert-zero --quantize q8_0
```

---

**License:** Apache 2.0