LFM2.5-ColBERT-350M โ€” CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-ColBERT-350M.

Multi-vector (ColBERT-style) retrieval: per-token embeddings projected to 128 dimensions, L2-normalized. Uses late interaction (MaxSim) scoring for fine-grained token-level matching.

Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm.* prefix, arch=lfm2). They include the colbert.projection.weight tensor from the 1_Dense module. Not compatible with llama.cpp.

Model variants

File Quant Size ColBERT cos vs F32
lfm2-colbert-f32.gguf F32 677 MB 0.999995
lfm2-colbert-q8_0.gguf Q8_0 361 MB 0.998
lfm2-colbert-q5_k.gguf Q5_K 258 MB 0.977
lfm2-colbert-q4_k.gguf Q4_K 224 MB 0.959

Architecture

  • Backbone: LFM2.5-350M bidirectional hybrid (16 layers: 10 ShortConv + 6 GQA attention, 1024-dim hidden, SwiGLU FFN)
  • ColBERT head: Linear(1024, 128) + L2 normalize per token
  • Scoring: MaxSim โ€” max over doc tokens of cosine similarity per query token, summed
  • Parameters: 350M + 128K projection head
  • Languages: EN, ES, DE, FR, IT, PT, AR, SV, NO, JA, KO (11 languages)
  • Task prefixes: "query: " for queries, "document: " for passages

Usage

# ColBERT multi-vector encode
./crispembed -m lfm2-colbert-q8_0.gguf --colbert "query: what is deep learning?"

# JSON output (per-token vectors)
./crispembed -m lfm2-colbert-q8_0.gguf --colbert --json "query: machine learning"

# Server
./crispembed-server --embed lfm2-colbert-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/colbert/score \
  -d '{"query": "what is deep learning?", "documents": ["Deep learning is a subset of ML", "The weather is nice"]}'
from crispembed import CrispVit

model = CrispVit("lfm2-colbert-q8_0.gguf")
assert model.has_colbert

# Encode multi-vector representations
query_vecs = model.encode_multivec("query: what is deep learning?")   # (n_tokens, 128)
doc_vecs = model.encode_multivec("document: Deep learning uses neural networks")

# MaxSim scoring
score = model.maxsim(query_vecs, doc_vecs)
print(f"Score: {score:.4f}")
use crispembed::CrispEmbed;

let mut model = CrispEmbed::new("lfm2-colbert-q8_0.gguf", 4)?;
assert!(model.has_colbert());

let query = model.encode_multivec("query: what is deep learning?");
let doc = model.encode_multivec("document: Neural networks learn representations");

Conversion

Convert from the source model yourself:

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

# Convert (loads 1_Dense/model.safetensors for ColBERT projection)
python models/convert-lfm2-embed-to-gguf.py \
    --model LiquidAI/LFM2.5-ColBERT-350M \
    --output lfm2-colbert-f32.gguf --dtype f32

# Quantize
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q5_k.gguf q5_k
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q4_k.gguf q4_k

License

LFM Open License v1.0 โ€” same as the base model.

Credits

Original model by LiquidAI. GGUF conversion and inference engine by CrispEmbed.

Downloads last month
200
GGUF
Model size
0.4B params
Architecture
lfm2
Hardware compatibility
Log In to add your hardware

8-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for cstr/lfm2-colbert-GGUF

Quantized
(2)
this model