---
tags:
- sentence-transformers
- embeddings
- litert
- tflite
- edge
- on-device
license: mit
base_model: intfloat/multilingual-e5-small
pipeline_tag: feature-extraction
---
# multilingual-e5-small - LiteRT
This is a [LiteRT](https://ai.google.dev/edge/litert) (formerly TensorFlow Lite) conversion of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) for efficient on-device inference.
## Model Details
| Property | Value |
|----------|-------|
| **Original Model** | [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) |
| **Format** | LiteRT (.tflite) |
| **File Size** | 449.0 MB |
| **Task** | Multilingual Sentence Embeddings (100 languages) |
| **Max Sequence Length** | 512 |
| **Output Dimension** | 384 |
| **Pooling Mode** | Mean Pooling |
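For reference, mean pooling averages the token embeddings, weighted by the attention mask so that padding tokens are ignored. The exported `.tflite` is assumed to bake this step into the graph (its output is already a single 384-dim vector per sequence, as the Quick Start below shows); a minimal NumPy sketch of the operation:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Attention-mask-weighted mean over the token axis.

    token_embeddings: (batch, seq_len, 384) float32
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(np.float32)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)             # (batch, 384)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)             # avoid divide-by-zero
    return summed / counts
```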
## Performance
Benchmarked on an AMD CPU under WSL2:
| Metric | Value |
|--------|-------|
| **Inference Latency** | 91.9 ms |
| **Throughput** | 10.9 sentences/sec |
| **Cosine Similarity vs Original** | 1.0000 ✅ |
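Numbers will vary with hardware and thread count. A minimal sketch of how the latency figure can be reproduced, assuming the `get_embedding` helper from the Quick Start below:

```python
import time

def benchmark(fn, text: str = "query: hello world", warmup: int = 5, runs: int = 50) -> float:
    """Mean per-inference latency in milliseconds."""
    for _ in range(warmup):  # let caches and the CPU thread pool warm up
        fn(text)
    start = time.perf_counter()
    for _ in range(runs):
        fn(text)
    return (time.perf_counter() - start) / runs * 1000.0

# print(f"{benchmark(get_embedding):.1f} ms / inference")
```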
## Quick Start
```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter
from transformers import AutoTokenizer

# Load model and tokenizer
interpreter = Interpreter(model_path="intfloat_multilingual-e5-small.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")

def get_embedding(text: str) -> np.ndarray:
    """Get the sentence embedding for the input text.

    Note: E5 models expect a task prefix such as "query: " or "passage: ".
    """
    encoded = tokenizer(
        text,
        padding="max_length",  # the model expects a fixed 512-token input
        max_length=512,
        truncation=True,
        return_tensors="np",
    )
    # Input order follows get_input_details(); check each entry's "name" if unsure.
    interpreter.set_tensor(input_details[0]["index"], encoded["input_ids"].astype(np.int64))
    interpreter.set_tensor(input_details[1]["index"], encoded["attention_mask"].astype(np.int64))
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])[0]

# Example
embedding = get_embedding("query: Hello, world!")
print(f"Embedding shape: {embedding.shape}")  # (384,)
```
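A typical downstream step is comparing embeddings with cosine similarity. A minimal sketch building on the snippet above (the output vectors are not assumed to be pre-normalized, so it normalizes explicitly):

```python
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

query = get_embedding("query: how do I bake bread?")
passage = get_embedding("passage: Mix flour, water, salt and yeast, then bake at 230 °C.")
print(f"similarity: {cosine_similarity(query, passage):.4f}")
```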
## Files
- `intfloat_multilingual-e5-small.tflite` - The LiteRT model file
## Conversion Details
- **Conversion Tool**: [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch) (see the sketch after this list)
- **Conversion Date**: 2026-01-12
- **Source Framework**: PyTorch → LiteRT
- **Validation**: Cosine similarity 1.0000 vs original
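For reproducibility, the conversion presumably followed the standard ai-edge-torch flow: wrap the encoder plus mean pooling in a `torch.nn.Module`, trace it with fixed-shape sample inputs, and export. A hedged sketch (the wrapper and shapes here are illustrative, not the exact script used):

```python
import ai_edge_torch
import torch
from transformers import AutoModel

class E5Embedder(torch.nn.Module):
    """Encoder plus attention-mask mean pooling, fused for export."""
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-small")

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

sample = (
    torch.zeros((1, 512), dtype=torch.long),  # input_ids
    torch.ones((1, 512), dtype=torch.long),   # attention_mask
)
edge_model = ai_edge_torch.convert(E5Embedder().eval(), sample)
edge_model.export("intfloat_multilingual-e5-small.tflite")
```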
## Intended Use
- **Mobile Applications**: On-device semantic search and RAG systems (see the sketch after this list)
- **Edge Devices**: IoT, embedded systems, Raspberry Pi
- **Offline Processing**: Privacy-preserving inference
- **Low-latency Applications**: Real-time processing
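To make the semantic-search use case concrete, a minimal in-memory ranking sketch built on the Quick Start helpers (the corpus and `search` helper are illustrative):

```python
corpus = [
    "passage: LiteRT runs TensorFlow Lite models on-device.",
    "passage: Paris is the capital of France.",
    "passage: Mean pooling averages token embeddings.",
]
corpus_vecs = np.stack([get_embedding(p) for p in corpus])
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)

def search(query: str, top_k: int = 2):
    """Rank corpus passages by cosine similarity to the query."""
    q = get_embedding(f"query: {query}")
    q /= np.linalg.norm(q)
    scores = corpus_vecs @ q
    best = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

print(search("what does mean pooling do?"))
```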
## Limitations
- Fixed sequence length: inputs are always padded or truncated to 512 tokens
- CPU inference by default; a GPU delegate requires extra setup, though CPU threading can be tuned (see the sketch after this list)
- The tokenizer is not bundled; load it separately from the original model
- Float32 precision (no quantization applied)
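On CPU, throughput can often be improved by giving the interpreter more threads. A minimal sketch, assuming the `ai_edge_litert` `Interpreter` accepts the same `num_threads` argument as the `tf.lite.Interpreter` it replaces:

```python
from ai_edge_litert.interpreter import Interpreter

# Assumption: num_threads is passed through to the CPU (XNNPACK) backend.
interpreter = Interpreter(
    model_path="intfloat_multilingual-e5-small.tflite",
    num_threads=4,
)
interpreter.allocate_tensors()
```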
## License
This model inherits the license from the original:
- **License**: MIT ([source](https://huggingface.co/intfloat/multilingual-e5-small))
## Citation
```bibtex
@article{wang2024multilingual,
  title   = {Multilingual E5 Text Embeddings: A Technical Report},
  author  = {Wang, Liang and Yang, Nan and Huang, Xiaolong and others},
  journal = {arXiv preprint arXiv:2402.05672},
  year    = {2024}
}
```
## Acknowledgments
- Original model by [intfloat](https://huggingface.co/intfloat)
- Conversion using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch)
---
*Converted by [Bombek1](https://huggingface.co/Bombek1)*