---
license: mit
base_model:
- Xenova/distiluse-base-multilingual-cased-v2
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-embeddings
- sentence-transformers
- sentence-similarity
- semantic-search
- vector-search
- retrieval-augmented-generation
- multilingual
- cross-lingual
- low-resource
- merged-model
- combined-model
- tokenizer-embedded
- tokenizer-integrated
- standalone
- all-in-one
- quantized
- int8
- int8-quantization
- optimized
- efficient
- fast-inference
- low-latency
- lightweight
- small-model
- edge-ready
- arm64
- edge-device
- mobile-device
- on-device
- mobile-inference
- tablet
- smartphone
- embedded-ai
- onnx
- onnx-runtime
- onnx-model
- transformers
- MiniLM
- MiniLM-L12-v2
- paraphrase
- usecase-ready
- plug-and-play
- production-ready
- deployment-ready
- real-time
- fasttext
- distiluse
---
# 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)

This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.

Built upon `distiluse-base-multilingual-cased-v2`, the model has been:

- **Merged with its tokenizer** into a single ONNX file
- ⚙️ **Extended with a custom preprocessing layer**
- ⚡ **Quantized to INT8** and made ARM64-ready
- 🧪 **Extensively tested across real-world NLP tasks**
- 🛠️ **Bug-fixed** relative to the original quantized `sentence-transformers` version, which produced inaccurate cosine similarity scores
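
The card does not state which INT8 scheme was applied. As a generic illustration only (an assumption, not this model's documented recipe), symmetric per-tensor quantization stores each float weight as an 8-bit integer times one shared scale. The NumPy sketch below shows the round-trip error and why the direction of a vector, and therefore cosine similarity, is almost unchanged by it; the random vector is a stand-in, not a real model weight.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: w ≈ q * scale, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(768,)).astype(np.float32)  # stand-in for a float32 weight vector
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# Rounding error per element is at most scale / 2
print("max abs error:", float(np.max(np.abs(w - w_hat))), "<= scale / 2 =", scale / 2)

# The vector's direction is nearly preserved, so cosine similarity stays close to 1
cos = float(np.dot(w, w_hat) / (np.linalg.norm(w) * np.linalg.norm(w_hat)))
print("cosine(original, dequantized):", cos)
```

Production toolchains, such as ONNX Runtime's quantization tools, may instead use per-channel scales or unsigned INT8 with zero points.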

---

## Key Features

- 🧩 **Single-file architecture**: no need for an external tokenizer, vocab files, or the `transformers` library.
- ⚡ **93% faster inference** on mobile compared to the original model.
- 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
- 🧠 **Output = pure embeddings**: pass a string, get a 768-dim vector. That's it.
- 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
- 📱 **Ideal for edge AI, mobile, and offline scenarios.**
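
Because the model returns plain vectors, semantic search reduces to ranking stored embeddings by cosine similarity to a query embedding. A minimal NumPy sketch, not part of the model itself: `top_k` is a hypothetical helper, and the random vectors stand in for real 768-dim outputs.

```python
import numpy as np


def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, cosine score) pairs for the k corpus rows closest to query.

    query: one embedding, shape (dim,) or (1, dim); corpus: shape (n_docs, dim).
    All vectors are assumed to be non-zero.
    """
    q = np.asarray(query, dtype=np.float32).ravel()
    c = np.asarray(corpus, dtype=np.float32)
    # L2-normalise so that a plain dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q                   # one score per document, shape (n_docs,)
    k = min(k, scores.shape[0])
    best = np.argsort(-scores)[:k]   # indices of the highest scores first
    return [(int(i), float(scores[i])) for i in best]


# Stand-in data: random vectors in place of real 768-dim model outputs
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 768))
query = docs[2] + 0.01 * rng.normal(size=768)  # a slightly perturbed copy of doc 2
print(top_k(query, docs, k=2))
```

With real data, `docs` would hold the embeddings of your corpus and `query` the embedding of the search string. Normalising once up front keeps each lookup a single matrix-vector product.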

---

## 👤 Author

@vlad-m-dev. Built for offline edge AI on phones and tablets.

Telegram: https://t.me/dwight_schrute_engineer

---

## Python Example

```python
# pip install numpy onnxruntime onnxruntime-extensions
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# The merged tokenizer relies on onnxruntime-extensions custom ops,
# so register that library before creating the session.
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider'],
)

# Raw strings go straight in; no separate tokenization step is needed.
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)  # None = return all outputs
embedding = outputs[0]  # the sentence embedding
```
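
If you embed many strings, the call above can be wrapped in small helpers. A minimal sketch, assuming (as the example does) that the model's single input is named `text` and that the first output is the embedding; `embed_text` and `similarity` are hypothetical helper names, not part of onnxruntime or this model.

```python
import numpy as np


def embed_text(session, text: str) -> np.ndarray:
    """Run one string through the merged model and return a flat float32 vector.

    Assumes the graph input is named "text" and the first output is the
    sentence embedding, as in the example above.
    """
    outputs = session.run(None, {"text": np.asarray([text])})
    return np.asarray(outputs[0], dtype=np.float32).ravel()


def similarity(session, text_a: str, text_b: str) -> float:
    """Cosine similarity between the embeddings of two strings."""
    a = embed_text(session, text_a)
    b = embed_text(session, text_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, with the `session` created above, `similarity(session, "Привіт, світе!", "Hello, world!")` scores a Ukrainian/English sentence pair.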
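
---

## JS Example

```javascript
// Assumes a runtime with onnxruntime-extensions enabled (the merged tokenizer uses its custom ops)
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

// EMBEDDING_FULL_MODEL_PATH: local path to model.onnx, defined by your app
const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
// A single raw string, shape [1]; tokenization happens inside the model
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data; // embedding as a typed array
```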