```sql
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-lateon-code-edge-Q8_0.gguf',
  '{"colbert_projection": "lightonai-lateon-code-edge-Q8_0.projection"}');

-- Generate token embeddings for code
SELECT lembed_tokens('async fn get_connection(pool: &Pool) -> Result<Connection>');

-- Code search with MaxSim
SELECT
  id, code,
  lembed_maxsim(lembed_tokens('database connection pool'), token_emb) AS score
FROM code_embeddings
ORDER BY score DESC
LIMIT 10;
```
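MaxSim is the standard ColBERT-style late-interaction score: each query token embedding is matched against its best-scoring document token embedding, and those maxima are summed. A minimal Python sketch of the idea (illustrative only — the extension computes this natively inside `lembed_maxsim`; the toy 2-D vectors are made up):

```python
def maxsim(query_tokens, doc_tokens):
    """Late-interaction MaxSim: for each query token embedding,
    take its best dot-product match among the document's token
    embeddings, then sum those per-token maxima."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy 2-D embeddings: first query token matches doc token [1, 0] exactly,
# second query token's best match is [0.7, 0.7]
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.7, 0.7]]
print(maxsim(query, doc))  # 1.0 + 0.7 = 1.7
```

Because every query token is matched independently, MaxSim rewards documents that cover all query terms rather than just averaging them into one vector.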
## Quantization Quality Benchmark

Tested across 3 codebases (jq/C, Rails/Ruby, FastAPI/Python) with 150 questions total (15 easy + 20 medium + 15 hard per codebase). Weighted scoring: easy×1, medium×2, hard×3 = 100 points per codebase, 300 total.
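The per-codebase maximum can be verified directly from the question counts and weights above:

```python
# Question counts and difficulty weights from the benchmark design
counts = {"easy": 15, "medium": 20, "hard": 15}
weights = {"easy": 1, "medium": 2, "hard": 3}

# 15*1 + 20*2 + 15*3 = 100 points per codebase
per_codebase_max = sum(counts[k] * weights[k] for k in counts)
print(per_codebase_max)      # 100
print(per_codebase_max * 3)  # 300 across the three codebases
```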
### Aggregate Weighted Scores

| Variant | Weighted Score | Percentage |
|---------|----------------|------------|
| f32     | 240 / 300      | 80.0%      |
| f16     | 240 / 300      | 80.0%      |
| Q8_0    | 237 / 300      | 79.0%      |
### Per-Corpus Scores

| Corpus           | f32    | f16    | Q8_0   |
|------------------|--------|--------|--------|
| jq (C)           | 66/100 | 66/100 | 63/100 |
| Rails (Ruby)     | 79/100 | 79/100 | 79/100 |
| FastAPI (Python) | 95/100 | 95/100 | 95/100 |
### Quantization Quality (Top-1 Agreement vs f32)

| Corpus  | f16    | Q8_0   |
|---------|--------|--------|
| jq      | 100.0% | 96.0%  |
| Rails   | 100.0% | 100.0% |
| FastAPI | 100.0% | 98.0%  |
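Top-1 agreement here means the fraction of queries for which a quantized variant returns the same best-scoring result as the f32 baseline. A small sketch of how that metric is computed (the result lists below are made-up toy data, not benchmark output):

```python
def top1_agreement(ranked_a, ranked_b):
    """Fraction of queries where two variants agree on the top result.
    ranked_a / ranked_b: per-query lists of result ids, best first."""
    matches = sum(a[0] == b[0] for a, b in zip(ranked_a, ranked_b))
    return matches / len(ranked_a)

# Toy rankings for four queries: variants disagree only on query 3
f32_results = [[3, 1], [7, 2], [5, 9], [4, 8]]
q8_results  = [[3, 2], [7, 1], [9, 5], [4, 8]]
print(top1_agreement(f32_results, q8_results))  # 0.75
```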
### Key Findings

- **f16 is lossless** — identical weighted score (240/300) and 100% top-1 agreement across all codebases
- **Q8_0 loses only 1%** — 237/300 vs 240/300, dropping only on hard queries in the jq corpus
- **Q8_0 is fastest** — 2.5 s average query vs 3.4 s for f32 and 13.4 s for f16 (on CPU without FP16 hardware support)
- Easy and medium questions show zero quality difference across all variants
## Conversion

Converted using litembeddings' ColBERT converter with PyLate projection support: