---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- sentence-similarity
- embeddings
- justembed
base_model: sentence-transformers/all-mpnet-base-v2
library_name: onnxruntime
pipeline_tag: feature-extraction
---
# MPNet INT8 — ONNX Quantized
ONNX INT8 quantized version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) for efficient general-purpose sentence embeddings.
## Model Details
| Property | Value |
|----------|-------|
| Base Model | [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |
## What is this?
This is a quantized ONNX export of all-mpnet-base-v2, a widely used general-purpose sentence embedding model from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. INT8 quantization reduces model size and speeds up inference while keeping embedding quality close to that of the FP32 model.
## Use Cases
- Semantic text search
- Sentence similarity
- Clustering and topic modeling
- Paraphrase detection
- General-purpose text embeddings
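
As a rough illustration of the search and similarity use cases, the sketch below ranks documents against a query by cosine similarity. The three-dimensional vectors are toy stand-ins for the 768-dimensional embeddings this model produces, and `cosine_rank` is a hypothetical helper written for this example, not part of JustEmbed or ONNX Runtime.

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                  # cosine similarity of each document to the query
    order = np.argsort(-scores)     # highest similarity first
    return order, scores

# Toy vectors (assumption): real inputs would be embeddings from this model.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

order, scores = cosine_rank(query, docs)
print(order, scores)
```

If the embeddings are already L2-normalized, the normalization steps are redundant and a plain dot product gives the same ranking.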
## Files
- `model_quantized.onnx` — INT8 quantized ONNX model
- `tokenizer.json` — Fast tokenizer
- `vocab.txt` — Vocabulary file
- `config.json` — Model configuration
## Usage with JustEmbed
```python
from justembed import Embedder
embedder = Embedder("mpnet-int8")
vectors = embedder.embed(["This is a sentence", "This is another sentence"])
```
## Usage with ONNX Runtime
```python
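import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer and the quantized model from the current directory
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

# Tokenize to NumPy arrays and pass only the inputs the graph declares
inputs = tokenizer("This is a sentence", return_tensors="np")
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})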
```
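
The raw session outputs may still need pooling before they can be used as sentence embeddings. The sketch below assumes that `outputs[0]` holds per-token hidden states of shape `(batch, seq_len, 768)`, which is common for transformer exports but is not stated in this card. It applies attention-masked mean pooling followed by L2 normalization, the pooling scheme documented for the base model. `mean_pool` is a hypothetical helper, and the small arrays stand in for `outputs[0]` and `inputs["attention_mask"]`.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    pooled = summed / counts
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Stand-ins (assumption): one sentence, three token positions, the last one padding.
token_embeddings = np.array([[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]], dtype=np.float32)
attention_mask = np.array([[1, 1, 0]])

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding.shape)
```

If the exported graph already returns a pooled vector per sentence, this step is unnecessary; checking the shape of `outputs[0]` shows which case applies.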
## Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Speed: ~2-3x faster inference than FP32 (varies with hardware and input length)
- Size: ~4x smaller than FP32
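
To make the size figure concrete, here is a minimal NumPy sketch of the arithmetic behind INT8 weight quantization. It uses symmetric per-tensor scaling on a random matrix, which is an assumption chosen for simplicity: ONNX Runtime's dynamic quantization may use different scale and zero-point choices and also quantizes activations at run time. `quantize_int8` and `dequantize` are hypothetical helpers, not ONNX Runtime functions.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 array -> (int8 array, float scale)."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Random stand-in for one 768x768 weight matrix (assumption, not real model weights).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(768, 768)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = w.nbytes / q.nbytes             # 4 bytes per float32 vs 1 byte per int8
max_error = float(np.abs(w - w_hat).max())   # rounding error, bounded by about scale / 2
print(size_ratio, max_error)
```

Storing one byte per weight instead of four is where the roughly 4x size reduction comes from; any tensors left unquantized would make the overall ratio somewhat lower.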
## License
This model is a derivative work of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2).
The original model is licensed under **Apache License 2.0**. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.
## Citation
```bibtex
@inproceedings{song2020mpnet,
title={MPNet: Masked and Permuted Pre-training for Language Understanding},
author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
booktitle={NeurIPS},
year={2020}
}
```
## Acknowledgments
- Original model by [UKP Lab / sentence-transformers](https://www.sbert.net/)
- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)