| --- |
| license: apache-2.0 |
| tags: |
| - onnx |
| - int8 |
| - quantized |
| - sentence-similarity |
| - embeddings |
| - justembed |
| base_model: sentence-transformers/all-mpnet-base-v2 |
| library_name: onnxruntime |
| pipeline_tag: feature-extraction |
| --- |
| |
| # MPNet INT8 - ONNX Quantized |
|
|
| ONNX INT8 quantized version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) for efficient general-purpose sentence embeddings. |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |----------|-------| |
| | Base Model | [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | |
| | Format | ONNX | |
| | Quantization | INT8 (dynamic quantization) | |
| | Embedding Dimension | 768 | |
| | Quantized by | [JustEmbed](https://pypi.org/project/justembed/) | |
|
|
| ## What is this? |
|
|
| This is a quantized ONNX export of all-mpnet-base-v2, a widely used general-purpose sentence embedding model from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. The INT8 quantization reduces model size and improves inference speed while aiming to keep accuracy close to that of the FP32 original. |
|
|
| ## Use Cases |
|
|
| - Semantic text search |
| - Sentence similarity |
| - Clustering and topic modeling |
| - Paraphrase detection |
| - General-purpose text embeddings |
|
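Most of the use cases above reduce to comparing embeddings by cosine similarity. The following is a minimal sketch of that comparison, not part of this repository: `cosine_rank` is a hypothetical helper name, and the toy 3-dimensional vectors stand in for the 768-dimensional embeddings the model produces.

```python
import numpy as np


def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity, plus the scores."""
    query = np.asarray(query_vec, dtype=np.float32)
    docs = np.asarray(doc_vecs, dtype=np.float32)
    # L2-normalize so that the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    order = np.argsort(-scores)
    return order, scores


# Toy vectors; real inputs would be embeddings of a query and candidate texts.
query = [1.0, 0.0, 0.0]
documents = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.1, 0.0],  # nearly parallel to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
order, scores = cosine_rank(query, documents)
print(order.tolist())  # [1, 2, 0]
```

For semantic search, the documents would be embedded once and stored, and only the query embedded at request time.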
|
| ## Files |
|
|
| - `model_quantized.onnx`: INT8 quantized ONNX model |
| - `tokenizer.json`: Fast tokenizer |
| - `vocab.txt`: Vocabulary file |
| - `config.json`: Model configuration |
|
|
| ## Usage with JustEmbed |
|
|
| ```python |
| from justembed import Embedder |
| |
| embedder = Embedder("mpnet-int8") |
| vectors = embedder.embed(["This is a sentence", "This is another sentence"]) |
| ``` |
|
|
| ## Usage with ONNX Runtime |
|
|
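| ```python |
| import onnxruntime as ort |
| from transformers import AutoTokenizer |
| |
| # Assumes the repository files were downloaded to the current directory. |
| tokenizer = AutoTokenizer.from_pretrained(".") |
| session = ort.InferenceSession("model_quantized.onnx") |
| |
| inputs = tokenizer("This is a sentence", return_tensors="np") |
| # Feed only the inputs the ONNX graph declares, in case the tokenizer returns extras. |
| feed = {node.name: inputs[node.name] for node in session.get_inputs()} |
| outputs = session.run(None, feed) |
| ``` |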
|
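The base model, all-mpnet-base-v2, turns token-level outputs into a sentence embedding by mean pooling over non-padding tokens and then L2-normalizing. The sketch below shows that step on stand-in data. It assumes the first ONNX output is the token-level hidden state with shape `(batch, sequence_length, hidden_size)`; this card does not state the output layout, so check `session.get_outputs()` before relying on it. `mean_pool` is a hypothetical helper name.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Stand-in data: batch of 2 sequences, 3 token positions, hidden size 4.
# The last position of each sequence is padding and must be ignored.
token_embeddings = np.array(
    [
        [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],
        [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    ],
    dtype=np.float32,
)
attention_mask = np.array([[1, 1, 0], [1, 1, 0]], dtype=np.int64)

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)  # (2, 4)
```

Under the assumption above, the call for the ONNX Runtime snippet would be `mean_pool(outputs[0], inputs["attention_mask"])`. If the exported graph already returns pooled sentence vectors, this step is unnecessary.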
|
| ## Quantization Details |
|
|
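| - Method: Dynamic INT8 quantization via ONNX Runtime (weights are quantized ahead of time; activations are quantized at inference time) |
| - Source: Original PyTorch weights converted to ONNX, then quantized |
| - Speed: ~2-3x faster inference than FP32 (varies with hardware and batch size) |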
| - Size: ~4x smaller than FP32 |
|
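The size figure follows from storage width: an INT8 weight takes 1 byte where an FP32 weight takes 4. The sketch below illustrates this with a simplified symmetric per-tensor scheme on a random matrix. It is an illustration of the arithmetic only; the exact scheme ONNX Runtime applied to this model (per-tensor or per-channel, symmetric or asymmetric) is not stated here, and the random matrix is a stand-in for real weights.

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((768, 768)).astype(np.float32)

# Symmetric per-tensor quantization: one scale maps the largest magnitude to 127.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# Dequantize to measure how much precision the round trip loses.
weights_restored = weights_int8.astype(np.float32) * scale
max_error = np.abs(weights_fp32 - weights_restored).max()

size_ratio = weights_fp32.nbytes / weights_int8.nbytes
print(size_ratio)  # 4.0
print(f"max round-trip error: {max_error:.5f} (half a step is {scale / 2:.5f})")
```

The round-trip error per weight is bounded by half a quantization step. A full model file is usually somewhat less than 4x smaller, since tensors such as biases and normalization parameters typically stay in FP32.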
|
| ## License |
|
|
| This model is a derivative work of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). |
|
|
| The original model is licensed under **Apache License 2.0**. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{song2020mpnet, |
| title={MPNet: Masked and Permuted Pre-training for Language Understanding}, |
| author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan}, |
| booktitle={NeurIPS}, |
| year={2020} |
| } |
| ``` |
|
|
| ## Acknowledgments |
|
|
| - Original model by [UKP Lab / sentence-transformers](https://www.sbert.net/) |
| - Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/) |
|
|