software-mansion/react-native-executorch-lfm2.5-embedding-350m

	---
	license: other
	license_name: lfm1.0
	license_link: https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M/blob/main/LICENSE
	---

	# Introduction

	This repository hosts the [LFM2.5-Embedding-350M](https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M) model for the [React Native ExecuTorch](https://www.npmjs.com/package/react-native-executorch) library. It includes the model exported for both the XNNPACK (Android / generic CPU) and MLX (Apple GPU) delegates, ready for use in the ExecuTorch runtime.

	If you'd like to run these models in your own ExecuTorch runtime, refer to the [official documentation](https://pytorch.org/executorch/stable/index.html) for setup instructions.

	## Compatibility

	If you intend to use this model outside of React Native ExecuTorch, make sure your runtime is compatible with the ExecuTorch version used to export the `.pte` files. If you work with React Native ExecuTorch, the constants from the library will guarantee compatibility with the runtime used behind the scenes.

	The MLX variant requires a physical Apple Silicon device; it does not run on the iOS simulator. The XNNPACK variant runs on the CPU and has no such restriction.
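The rule above can be sketched as a small selection helper. This is a minimal illustration, not part of React Native ExecuTorch: `DeviceInfo`, `pickVariant`, and the `isSimulator` flag are hypothetical names, and the caller is assumed to detect the platform and simulator status itself (for example from `Platform.OS` and a device-info library).

```typescript
// Hypothetical helper, not an API of react-native-executorch.
type Variant = 'xnnpack' | 'mlx';

interface DeviceInfo {
  os: string;           // assumption: e.g. the value of Platform.OS
  isSimulator: boolean; // assumption: detected by the caller
}

// Choose MLX only on a physical iOS device; fall back to XNNPACK otherwise.
// Assumes any physical iOS device passed in meets the MLX delegate's requirements.
function pickVariant(device: DeviceInfo): Variant {
  if (device.os === 'ios' && !device.isSimulator) {
    return 'mlx';
  }
  return 'xnnpack';
}
```

Falling back to XNNPACK by default keeps the app working in the iOS simulator and on Android, at the cost of not using the Apple GPU where it would be available.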

	## Repository Structure

	- `xnnpack/` — `.pte` file partitioned for the XNNPACK delegate.
	- `mlx/` — `.pte` file partitioned for the MLX delegate (Apple Silicon only).
	- `tokenizer.json` — HuggingFace fast-tokenizer dump. Wire this to `tokenizerSource`.
	- `config.json`, `tokenizer_config.json` — upstream model/tokenizer configs, kept for reference and for non-RNE consumers.

	Pass the path of the chosen `.pte` file as `modelSource`; `tokenizer.json` is shared across all variants.
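As a rough sketch of how the two sources fit together, the helper below builds a `{ modelSource, tokenizerSource }` pair from a variant folder. Several things here are assumptions: `buildSources` and `EmbeddingSources` are hypothetical names, the URLs assume the standard Hugging Face `resolve/main` download pattern, and the `.pte` file name is left as a parameter because this README does not list it. Inside React Native ExecuTorch, the library's own constants are the safer choice, as noted under Compatibility.

```typescript
// Assumption: files are fetched from the `main` branch via the Hugging Face resolve URL.
const REPO_BASE =
  'https://huggingface.co/software-mansion/react-native-executorch-lfm2.5-embedding-350m/resolve/main';

// Hypothetical shape: the two fields named in the repository structure above.
interface EmbeddingSources {
  modelSource: string;
  tokenizerSource: string;
}

// pteFileName is supplied by the caller; check the variant folder for the actual name.
function buildSources(
  variant: 'xnnpack' | 'mlx',
  pteFileName: string,
): EmbeddingSources {
  return {
    modelSource: `${REPO_BASE}/${variant}/${pteFileName}`,
    // The tokenizer lives at the repository root and is shared by both variants.
    tokenizerSource: `${REPO_BASE}/tokenizer.json`,
  };
}
```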

	## Model details

	- Architecture: LFM2.5-350M bidirectional backbone (hybrid conv + attention, hidden size 1024) + CLS pooling + L2 normalize. The exported graph bakes in CLS pooling and L2 normalization, so the runner consumes `(input_ids, attention_mask)` and receives the final unit-norm embedding directly.
	- Output dimension: 1024.
	- Similarity metric: cosine (embeddings are L2-normalized, so the dot product of two embeddings equals their cosine similarity).
	- Prompts: the model is trained with asymmetric `query: ` / `document: ` text prefixes. Prepend `query: ` to search queries and `document: ` to indexed passages for best retrieval quality.
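The prefix and similarity conventions above can be sketched in plain TypeScript. This is an illustration under stated assumptions, not library code: `asQuery`, `asDocument`, `dot`, `l2Normalize`, and `rankBySimilarity` are hypothetical helpers, and embeddings are assumed to arrive as plain `number[]` arrays. `l2Normalize` is included only to build toy unit vectors; the exported model already returns normalized embeddings.

```typescript
type Embedding = number[];

// Asymmetric prompts: prefix the text before sending it to the model.
const asQuery = (text: string): string => `query: ${text}`;
const asDocument = (text: string): string => `document: ${text}`;

// Dot product; for unit-norm embeddings this equals cosine similarity.
function dot(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions do not match');
  }
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Scale a vector to unit length (a zero vector is returned unchanged).
function l2Normalize(v: Embedding): Embedding {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Return document indices ordered from most to least similar to the query.
function rankBySimilarity(query: Embedding, docs: Embedding[]): number[] {
  return docs
    .map((doc, index) => ({ index, score: dot(query, doc) }))
    .sort((x, y) => y.score - x.score)
    .map((r) => r.index);
}
```

In a retrieval flow, each passage would be embedded once as `asDocument(passage)` at indexing time, and each search would embed `asQuery(userInput)` before calling `rankBySimilarity` on the stored vectors.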