---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M/blob/main/LICENSE
---

# Introduction

This repository hosts the [LFM2.5-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M) late-interaction retrieval model for the [React Native ExecuTorch](https://www.npmjs.com/package/react-native-executorch) library, exported for the **XNNPACK** (Android / generic CPU) and **MLX** (Apple GPU) delegates.

Unlike a standard sentence embedder (one vector per text), ColBERT is a **multi-vector / late-interaction** model: it produces **one vector per token** (`[numTokens, 128]`). Relevance is computed with **MaxSim** (for each query token, the max dot product over document tokens, summed). Use it when you want stronger retrieval quality than single-vector embeddings, e.g. for RAG / search.

## Compatibility

The **MLX** variant requires a physical Apple Silicon device (it does not run on the iOS simulator). The **XNNPACK** variant runs everywhere.

Make sure your runtime matches the ExecuTorch version used to export these `.pte` files; with React Native ExecuTorch the library constants guarantee this.

### Using it (late interaction)

The model is a per-token embedder; scoring is the consumer's concern:

1. Prepend the role marker the model was trained with: `"[Q] "` for queries, `"[D] "` for documents.
2. Run `forward` to get the per-token `[S, 128]` matrix for each text.
3. Score query↔document with **MaxSim**, optionally excluding the document **skiplist** token ids (punctuation) so they don't contribute.

The skiplist for this model (from its `config_sentence_transformers.json`) tokenizes to: `[510..524, 535..541, 568..573, 600..603]` (32 ids).

## Repository Structure

- `xnnpack/`, `mlx/`: the partitioned `.pte` files + per-backend `config.json`.
- `tokenizer.json`: wire to `tokenizerSource`.
- `config.json`, `tokenizer_config.json`: reference metadata.
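
## Example: MaxSim scoring (sketch)

The scoring step described under "Using it (late interaction)" can be sketched in plain TypeScript as below. This is a minimal illustration, not part of React Native ExecuTorch: `expandRanges`, `dot`, `maxSim`, and the `TokenEmbeddings` type are hypothetical names introduced here. It assumes you already hold the per-token `[S, 128]` output of `forward` as nested arrays, and, if you want skiplist filtering, the document token ids in the same order as the embedding rows.

```typescript
// Hypothetical helpers for illustration; not a React Native ExecuTorch API.

// One row per token: [numTokens][dim].
type TokenEmbeddings = number[][];

// Inclusive id ranges copied from the skiplist quoted in this README.
const SKIPLIST_RANGES: Array<[number, number]> = [
  [510, 524],
  [535, 541],
  [568, 573],
  [600, 603],
];

// Expand inclusive [start, end] ranges into a set of token ids.
function expandRanges(ranges: Array<[number, number]>): Set<number> {
  const ids = new Set<number>();
  for (const [start, end] of ranges) {
    for (let id = start; id <= end; id++) ids.add(id);
  }
  return ids;
}

const SKIPLIST = expandRanges(SKIPLIST_RANGES);

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token, take the max dot product over the kept
// document tokens, then sum those maxima over all query tokens.
// If docTokenIds is given, document rows whose id is in the skiplist are dropped.
function maxSim(
  query: TokenEmbeddings,
  doc: TokenEmbeddings,
  docTokenIds?: number[],
  skiplist: Set<number> = SKIPLIST,
): number {
  const keptDoc = docTokenIds
    ? doc.filter((_, j) => !skiplist.has(docTokenIds[j]))
    : doc;
  if (keptDoc.length === 0) return 0; // assumption: an empty document scores 0
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of keptDoc) best = Math.max(best, dot(q, d));
    score += best;
  }
  return score;
}
```

To rank documents for a query, compute `maxSim(queryEmbeddings, docEmbeddings, docTokenIds)` for each candidate and sort by descending score. Passing `docTokenIds` is optional; without it every document token contributes.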