---
tags:
- sentence-transformers
- embeddings
- litert
- tflite
- edge
- on-device
license: mit
base_model: intfloat/multilingual-e5-small
pipeline_tag: feature-extraction
---

# multilingual-e5-small - LiteRT

This is a [LiteRT](https://ai.google.dev/edge/litert) (formerly TensorFlow Lite) conversion of [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) for efficient on-device inference.

## Model Details

| Property | Value |
|----------|-------|
| **Original Model** | [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) |
| **Format** | LiteRT (.tflite) |
| **File Size** | 449.0 MB |
| **Task** | Multilingual Sentence Embeddings (100 languages) |
| **Max Sequence Length** | 512 tokens |
| **Output Dimension** | 384 |
| **Pooling Mode** | Mean Pooling |

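The table lists mean pooling, but this card does not say whether the pooling step is baked into the exported graph (the Quick Start comment suggests the output is already a 384-dimensional vector). If `output_details[0]["shape"]` from the Quick Start code instead reports per-token hidden states such as `[1, 512, 384]`, a mean over the attention mask, the same pooling the original E5 model card uses, recovers the sentence embedding. A minimal NumPy sketch under that assumption; `masked_mean_pool` is a hypothetical helper name:

```python
import numpy as np


def masked_mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) positions.

    token_embeddings: (batch, seq_len, hidden), e.g. (1, 512, 384)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    Returns:          (batch, hidden)
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)  # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # number of real tokens, never zero
    return summed / counts


# Toy check: 2 real tokens followed by 1 padding token
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # [[2. 3.]]
```

Masking matters because every input is padded to 512 tokens; an unmasked mean would average in hundreds of padding positions.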
## Performance

Benchmarked on an AMD CPU under WSL2 (fixed 512-token input):

| Metric | Value |
|--------|-------|
| **Inference Latency** | 91.9 ms per inference |
| **Throughput** | 10.9 inferences/sec |
| **Cosine Similarity vs Original** | 1.0000 ✅ |

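For back-to-back single-input runs, throughput is the reciprocal of latency: 1000 / 91.9 ms ≈ 10.9 inferences per second. The benchmark script itself is not published, so the following is a hypothetical, standard-library-only timing harness, assuming you pass it a zero-argument callable such as a wrapper around `get_embedding` from the Quick Start. Warm-up runs are discarded so one-time allocation costs do not skew the median.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> dict:
    """Time a zero-argument callable; report median latency and throughput."""
    for _ in range(warmup):  # first calls are often slower, so discard them
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(latencies_ms)
    return {
        "median_latency_ms": median_ms,
        # Sequential runs: throughput is simply 1 / latency
        "throughput_per_sec": 1000.0 / median_ms,
    }


def throughput_from_latency(latency_ms: float) -> float:
    """Inferences per second for back-to-back single-input runs."""
    return 1000.0 / latency_ms


print(round(throughput_from_latency(91.9), 1))  # 10.9
```

Usage with the Quick Start code loaded: `benchmark(lambda: get_embedding("query: Hello, world!"))`.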
## Quick Start

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter
from transformers import AutoTokenizer

# Load the LiteRT model (path to the downloaded .tflite file) and allocate its tensors
interpreter = Interpreter(model_path="intfloat_multilingual-e5-small.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The tokenizer is not bundled in the .tflite file; load it from the original model
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")

def get_embedding(text: str) -> np.ndarray:
    """Get sentence embedding for input text."""
    # The model has a fixed input shape, so always pad/truncate to 512 tokens
    encoded = tokenizer(
        text,
        padding="max_length",
        max_length=512,
        truncation=True,
        return_tensors="np",
    )

    # Assumes input 0 is input_ids and input 1 is attention_mask;
    # cast to whatever integer dtype the converted model declares
    input_ids = encoded["input_ids"].astype(input_details[0]["dtype"])
    attention_mask = encoded["attention_mask"].astype(input_details[1]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], input_ids)
    interpreter.set_tensor(input_details[1]["index"], attention_mask)
    interpreter.invoke()

    return interpreter.get_tensor(output_details[0]["index"])[0]

# Example (E5 models expect a "query: " or "passage: " prefix on every input)
embedding = get_embedding("query: Hello, world!")
print(f"Embedding shape: {embedding.shape}")  # (384,)
```

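E5 models are trained with input prefixes: the original model card asks for `query: ` on search queries and `passage: ` on documents (and `query: ` for symmetric tasks such as similarity). For semantic search or RAG, embeddings are usually L2-normalized so that a dot product equals cosine similarity. The sketch below shows that ranking step on plain NumPy arrays; `with_prefix`, `l2_normalize`, and `rank_passages` are hypothetical helpers, and in practice the vectors would come from `get_embedding` above.

```python
import numpy as np


def with_prefix(text: str, kind: str) -> str:
    """Prepend the E5 input prefix ("query" or "passage")."""
    if kind not in ("query", "passage"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{kind}: {text}"


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def rank_passages(query_emb: np.ndarray, passage_embs: np.ndarray) -> list:
    """Return (passage index, cosine similarity) pairs, best match first."""
    q = l2_normalize(query_emb)     # (hidden,)
    p = l2_normalize(passage_embs)  # (n_passages, hidden)
    scores = p @ q                  # dot product of unit vectors = cosine similarity
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


# Toy 3-d vectors standing in for 384-d embeddings
query = np.array([1.0, 0.0, 0.0])
passages = np.array([[0.0, 1.0, 0.0], [2.0, 0.1, 0.0], [-1.0, 0.0, 0.0]])
print(with_prefix("how do I reset my router?", "query"))
print([i for i, _ in rank_passages(query, passages)])  # [1, 0, 2]
```

With the Quick Start code loaded, a real query would be embedded as `get_embedding(with_prefix(question, "query"))`, and each document as `get_embedding(with_prefix(doc, "passage"))` stacked with `np.stack`, before calling `rank_passages`.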
## Files

- `intfloat_multilingual-e5-small.tflite` - The LiteRT model file

## Conversion Details

- **Conversion Tool**: [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch)
- **Conversion Date**: 2026-01-12
- **Source Framework**: PyTorch → LiteRT
- **Validation**: Cosine similarity 1.0000 vs original

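The validation script behind the 1.0000 figure is not published. A minimal sketch of such a check, assuming you already have embeddings for the same inputs from the original PyTorch model (`reference`) and from the LiteRT model (`converted`), computes the row-wise cosine similarity and reports the worst case. `compare_embeddings` and its 0.999 threshold are hypothetical choices, not the author's.

```python
import numpy as np


def compare_embeddings(reference: np.ndarray, converted: np.ndarray, threshold: float = 0.999) -> dict:
    """Row-wise cosine similarity between two (n, hidden) embedding matrices."""
    if reference.shape != converted.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {converted.shape}")
    num = np.sum(reference * converted, axis=1)
    den = np.linalg.norm(reference, axis=1) * np.linalg.norm(converted, axis=1)
    cos = num / np.clip(den, 1e-12, None)
    return {
        "min_cosine": float(cos.min()),   # worst-case agreement across inputs
        "mean_cosine": float(cos.mean()),
        "passed": bool(cos.min() >= threshold),
    }


# Toy data: the "converted" output differs from the reference by tiny float noise
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 384)).astype(np.float32)
conv = ref + 1e-6 * rng.standard_normal((4, 384)).astype(np.float32)
report = compare_embeddings(ref, conv)
print(f"min cosine: {report['min_cosine']:.4f}, passed: {report['passed']}")
```

Reporting the minimum rather than only the mean catches a single input that the conversion handles badly.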
## Intended Use

- **Mobile Applications**: On-device semantic search, RAG systems
- **Edge Devices**: IoT, embedded systems, Raspberry Pi
- **Offline Processing**: Privacy-preserving inference without sending text to a server
- **Low-latency Applications**: Real-time processing

## Limitations

- Fixed input sequence length of 512 tokens: shorter inputs must be padded to 512, and longer texts must be truncated (the Quick Start does this with `truncation=True`)
- CPU inference by default; GPU acceleration requires setting up a LiteRT GPU delegate
- The tokenizer is not bundled in the `.tflite` file and must be loaded from the original model
- Float32 precision (not quantized)

## License

This model inherits the license from the original:
- **License**: MIT ([source](https://huggingface.co/intfloat/multilingual-e5-small))

## Citation

```bibtex
@article{wang2024multilingual,
  title={Multilingual E5 Text Embeddings: A Technical Report},
  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and others},
  journal={arXiv preprint arXiv:2402.05672},
  year={2024}
}
```

## Acknowledgments

- Original model by [intfloat](https://huggingface.co/intfloat)
- Conversion using [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch)

---

*Converted by [Bombek1](https://huggingface.co/Bombek1)*