Introduction
This repository hosts the LFM2.5-ColBERT-350M late-interaction retrieval model for the React Native ExecuTorch library, exported for the XNNPACK (Android / generic CPU) and MLX (Apple GPU) delegates.
Unlike a standard sentence embedder (one vector per text), ColBERT is a
multi-vector / late-interaction model: it produces one 128-dimensional vector
per token (output shape [numTokens, 128]). Relevance is computed with MaxSim:
for each query token, take the maximum dot product over all document tokens,
then sum those maxima over the query tokens. Use it when you want stronger
retrieval quality than single-vector embeddings, for example in RAG or search.
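The MaxSim rule described above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not part of the React Native ExecuTorch API: the names TokenMatrix, dot, and maxSim are hypothetical, and the toy vectors are 2-dimensional instead of the model's 128.

```typescript
// One row per token, one column per embedding dimension (hypothetical type).
type TokenMatrix = number[][];

// Plain dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token keep its best-matching document token,
// then add those best matches up.
function maxSim(query: TokenMatrix, doc: TokenMatrix): number {
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    // An empty document contributes nothing instead of -Infinity.
    if (doc.length > 0) score += best;
  }
  return score;
}

// Toy data: 2 query tokens, 3 document tokens.
const query: TokenMatrix = [[1, 0], [0, 1]];
const doc: TokenMatrix = [[0.5, 0.5], [1, 0], [0, 0.25]];

// Best match for query token 1 is 1, for query token 2 is 0.5.
const score = maxSim(query, doc); // 1.5
```

To rank several documents against one query, compute this score once per document and sort in descending order.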
Compatibility
The MLX variant requires a physical Apple Silicon device (it does not run
on the iOS simulator). The XNNPACK variant runs on the CPU and has no such
restriction. Make sure your runtime matches the ExecuTorch version used to
export these .pte files; if you load the model through the React Native
ExecuTorch library constants, the versions are guaranteed to match.
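The backend choice above can be written as a small pure function. This is a sketch under assumptions: RuntimeInfo and pickBackend are hypothetical names, and how the app detects a simulator or an Apple Silicon device is left to the caller.

```typescript
type Backend = "mlx" | "xnnpack";

// Hypothetical description of the runtime; the app must fill this in itself.
interface RuntimeInfo {
  isAppleSilicon: boolean;
  isSimulator: boolean;
}

// MLX needs a physical Apple Silicon device; every other case uses XNNPACK.
function pickBackend(info: RuntimeInfo): Backend {
  return info.isAppleSilicon && !info.isSimulator ? "mlx" : "xnnpack";
}

// The returned name matches the repository folder holding that variant.
const onPhone = pickBackend({ isAppleSilicon: true, isSimulator: false });
const onSimulator = pickBackend({ isAppleSilicon: true, isSimulator: true });
```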
Using it (late interaction)
The model is a per-token embedder; scoring is the consumer's concern:
- Prepend the role marker the model was trained with: "[Q] " for queries, "[D] " for documents.
- Run forward to get the per-token [S, 128] matrix for each text.
- Score query↔document with MaxSim, optionally excluding the document skiplist token ids (punctuation) so they don't contribute. The skiplist for this model (from its config_sentence_transformers.json) tokenizes to: [510..524, 535..541, 568..573, 600..603] (32 ids).
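The preparation steps around scoring can be sketched as follows. This is a hedged example rather than the library's API: withRoleMarker, expandRanges, and dropSkiplistRows are hypothetical helpers, the ranges are read as inclusive (which gives 15 + 7 + 6 + 4 = 32 ids, matching the count above), and the code assumes row i of the document matrix corresponds to the i-th token id of the tokenized document.

```typescript
// Role markers quoted from the list above.
const QUERY_MARKER = "[Q] ";
const DOCUMENT_MARKER = "[D] ";

function withRoleMarker(text: string, role: "query" | "document"): string {
  return (role === "query" ? QUERY_MARKER : DOCUMENT_MARKER) + text;
}

// Skiplist ranges from the list above, read as inclusive [start, end] pairs.
const SKIPLIST_RANGES: Array<[number, number]> = [
  [510, 524], [535, 541], [568, 573], [600, 603],
];

function expandRanges(ranges: Array<[number, number]>): Set<number> {
  const ids = new Set<number>();
  for (const [start, end] of ranges) {
    for (let id = start; id <= end; id++) ids.add(id);
  }
  return ids;
}

const SKIPLIST = expandRanges(SKIPLIST_RANGES);

// Assumption: embeddings[i] is the vector for tokenIds[i].
// Returns only the rows whose token id is not on the skiplist.
function dropSkiplistRows(
  tokenIds: number[],
  embeddings: number[][],
  skiplist: Set<number>,
): number[][] {
  if (tokenIds.length !== embeddings.length) {
    throw new Error("token ids and embedding rows must align");
  }
  return embeddings.filter((_, i) => !skiplist.has(tokenIds[i]));
}

// Made-up token ids and 2-dimensional rows, for illustration only.
const queryText = withRoleMarker("what is late interaction?", "query");
const docTokenIds = [100, 510, 200, 603];
const docRows = [[1, 0], [0, 1], [0.5, 0.5], [0.25, 0.75]];
const keptRows = dropSkiplistRows(docTokenIds, docRows, SKIPLIST);
```

Only document rows are filtered; the query matrix is passed to MaxSim unchanged, and the filtered document matrix takes the place of the full one.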
Repository Structure
- xnnpack/, mlx/: the partitioned .pte files + per-backend config.json.
- tokenizer.json: wire to tokenizerSource.
- config.json, tokenizer_config.json: reference metadata.