---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M/blob/main/LICENSE
---

# Introduction

This repository hosts the [LFM2.5-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M) late-interaction retrieval model for the [React Native ExecuTorch](https://www.npmjs.com/package/react-native-executorch) library, exported for the **XNNPACK** (Android / generic CPU) and **MLX** (Apple GPU) delegates.

Unlike a standard sentence embedder (one vector per text), ColBERT is a
**multi-vector / late-interaction** model: it produces **one vector per token**
(`[numTokens, 128]`). Relevance is computed with **MaxSim**: for each query
token, take the maximum dot product over all document tokens, then sum those
maxima across the query tokens. Use it when you want stronger retrieval quality
than single-vector embeddings, for example in RAG or search.
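
As a rough illustration of the MaxSim definition above, here is a minimal sketch in TypeScript. The `dot` and `maxSim` helpers are hypothetical names, not part of React Native ExecuTorch, and the sketch assumes embeddings are available as plain `number[][]` arrays (one row per token). Returning 0 for an empty document is also an assumption of this sketch.

```typescript
// Hypothetical helpers for illustration; not part of any library API.

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// MaxSim: for each query token, take the best-matching document token
// (maximum dot product), then sum those maxima over the query tokens.
// query: [numQueryTokens][dim], doc: [numDocTokens][dim]
function maxSim(query: number[][], doc: number[][]): number {
  if (doc.length === 0) return 0; // assumption: an empty document scores 0
  let score = 0;
  for (const q of query) {
    let best = -Infinity;
    for (const d of doc) best = Math.max(best, dot(q, d));
    score += best;
  }
  return score;
}

// Toy 2-dimensional example (the real model emits 128-dimensional vectors).
const toyQuery = [[1, 0], [0, 1]];
const toyDoc = [[0.5, 0.5], [1, 0], [0, 0.25]];
const toyScore = maxSim(toyQuery, toyDoc); // 1 + 0.5 = 1.5
```

Because the score is a sum over query tokens, it grows with query length; it is meant for ranking documents against the same query, not for comparing scores across different queries.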

## Compatibility

The **MLX** variant requires a physical Apple Silicon device; it does not run
on the iOS simulator. The **XNNPACK** variant runs on CPU and has no such
restriction. Make sure your runtime matches the ExecuTorch version used to
export these `.pte` files; if you use React Native ExecuTorch, the library
constants guarantee this.

## Usage (late interaction)

The model only produces per-token embeddings; scoring is left to the caller:

1. Prepend the role marker the model was trained with: `"[Q] "` for queries,
   `"[D] "` for documents.
2. Run `forward` to get the per-token `[S, 128]` matrix for each text, where
   `S` is the number of tokens.
3. Score query↔document with **MaxSim**, optionally excluding the document
   tokens whose ids are on the **skiplist** (punctuation) so they don't
   contribute. The skiplist for this model (from its
   `config_sentence_transformers.json`) tokenizes to the inclusive id ranges
   `[510..524, 535..541, 568..573, 600..603]` (32 ids).
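
The preprocessing around steps 1 and 3 can be sketched as follows. All names here (`withRole`, `expandRanges`, `dropSkiplisted`) are hypothetical helpers, not part of React Native ExecuTorch. The sketch also assumes that you have the document's token ids available and that they align one-to-one with the rows of the `[S, 128]` output; verify this against your tokenizer setup before relying on it.

```typescript
// Inclusive token-id ranges copied from the skiplist in step 3.
const SKIPLIST_RANGES: Array<[number, number]> = [
  [510, 524],
  [535, 541],
  [568, 573],
  [600, 603],
];

// Expands inclusive [start, end] ranges into a set of individual ids.
function expandRanges(ranges: Array<[number, number]>): Set<number> {
  const ids = new Set<number>();
  for (const [start, end] of ranges) {
    for (let id = start; id <= end; id++) ids.add(id);
  }
  return ids;
}

const SKIPLIST = expandRanges(SKIPLIST_RANGES); // 32 ids in total

// Step 1: prepend the role marker the model was trained with.
function withRole(text: string, role: "query" | "document"): string {
  return (role === "query" ? "[Q] " : "[D] ") + text;
}

// Step 3 (optional part): drop document rows whose token id is on the
// skiplist, so punctuation tokens cannot win a MaxSim maximum.
// Assumption: tokenIds[i] is the id of the token that produced embeddings[i].
function dropSkiplisted(tokenIds: number[], embeddings: number[][]): number[][] {
  if (tokenIds.length !== embeddings.length) {
    throw new Error("tokenIds and embeddings must have the same length");
  }
  return embeddings.filter((_, i) => !SKIPLIST.has(tokenIds[i]));
}
```

Apply `dropSkiplisted` to document embeddings only, then pass the remaining rows to your MaxSim scorer; query tokens are kept as they are.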

## Repository Structure

- `xnnpack/`, `mlx/`: the partitioned `.pte` files plus a per-backend `config.json`.
- `tokenizer.json`: the tokenizer; pass it as `tokenizerSource`.
- `config.json`, `tokenizer_config.json`: reference metadata.