sekarkrishna/mpnet-int8

	---
	license: apache-2.0
	tags:
	- onnx
	- int8
	- quantized
	- sentence-similarity
	- embeddings
	- justembed
	base_model: sentence-transformers/all-mpnet-base-v2
	library_name: onnxruntime
	pipeline_tag: feature-extraction
	---

	# MPNet INT8 — ONNX Quantized

	ONNX INT8 quantized version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) for efficient general-purpose sentence embeddings.

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) |
	| Format | ONNX |
	| Quantization | INT8 (dynamic quantization) |
	| Embedding Dimension | 768 |
	| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

	## What is this?

	This is a quantized ONNX export of all-mpnet-base-v2, a widely used general-purpose sentence embedding model from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. INT8 quantization reduces model size and speeds up inference; the embeddings stay close to those of the FP32 model, though small numerical differences are expected.

	## Use Cases

	- Semantic text search
	- Sentence similarity
	- Clustering and topic modeling
	- Paraphrase detection
	- General-purpose text embeddings
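
Most of these use cases reduce to comparing embeddings by cosine similarity. The sketch below ranks a few documents against a query; the 3-dimensional vectors are toy stand-ins for the model's 768-dimensional output, and `rank_by_cosine` is a hypothetical helper, not part of JustEmbed or ONNX Runtime.

```python
import numpy as np

def rank_by_cosine(query, documents):
    """Return document indices sorted by cosine similarity to the query, best first."""
    q = query / np.linalg.norm(query)
    d = documents / np.linalg.norm(documents, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity of each document to the query
    order = np.argsort(-scores)         # descending by score
    return order, scores[order]

# Toy vectors standing in for real sentence embeddings (assumption for illustration)
documents = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

order, scores = rank_by_cosine(query, documents)
print(order.tolist())  # document indices, most similar first
```

With real embeddings, `documents` would be the stacked vectors for a corpus and `query` the vector for the search text.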

	## Files

	- `model_quantized.onnx` — INT8 quantized ONNX model
	- `tokenizer.json` — Fast tokenizer
	- `vocab.txt` — Vocabulary file
	- `config.json` — Model configuration
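
Before loading the model from a local directory, it can help to confirm that all four files are present. The following is a minimal sketch: `missing_files` and `download_model_files` are hypothetical helper names, the repository id `sekarkrishna/mpnet-int8` and the target directory are assumptions, and the download step relies on `snapshot_download` from the `huggingface_hub` package.

```python
from pathlib import Path

# File names as listed in this model card
REQUIRED_FILES = ("model_quantized.onnx", "tokenizer.json", "vocab.txt", "config.json")

def missing_files(directory):
    """Return the required files that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

def download_model_files(repo_id="sekarkrishna/mpnet-int8", local_dir="mpnet-int8"):
    """Fetch the repository from the Hugging Face Hub (requires network access).

    The repo id and local directory are assumptions; adjust them as needed.
    """
    # Imported lazily so the file check above works without huggingface_hub installed
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `missing_files(".")` in the directory used by the ONNX Runtime example returns an empty list when everything is in place.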

	## Usage with JustEmbed

	```python
	from justembed import Embedder

	embedder = Embedder("mpnet-int8")
	vectors = embedder.embed(["This is a sentence", "This is another sentence"])
	```

	## Usage with ONNX Runtime

	```python
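	import onnxruntime as ort
	from transformers import AutoTokenizer

	# Run from a local copy of this repository (tokenizer files and ONNX model in ".")
	tokenizer = AutoTokenizer.from_pretrained(".")
	session = ort.InferenceSession("model_quantized.onnx")

	inputs = tokenizer("This is a sentence", return_tensors="np")
	# Pass only the tensors the ONNX graph declares as inputs
	expected = {i.name for i in session.get_inputs()}
	outputs = session.run(None, {k: v for k, v in inputs.items() if k in expected})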
	```
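
The raw session output is not necessarily a sentence embedding. The base model, all-mpnet-base-v2, applies mean pooling over token vectors followed by L2 normalization. The sketch below reproduces those two steps with NumPy, assuming the first ONNX output holds per-token embeddings of shape `(batch, seq_len, 768)`; if this export already pools internally, the step is unnecessary. The arrays here are small synthetic stand-ins, and `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

# Synthetic stand-ins for outputs[0] and inputs["attention_mask"] (assumption)
token_embeddings = np.array([[[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]], dtype=np.float32)
attention_mask = np.array([[1, 1, 0]], dtype=np.int64)  # last position is padding

sentence_embeddings = l2_normalize(mean_pool(token_embeddings, attention_mask))
print(sentence_embeddings.shape)  # one vector per input sentence
```

In the ONNX Runtime example above, the equivalent call would be `l2_normalize(mean_pool(outputs[0], inputs["attention_mask"]))`, under the same assumption about the output layout.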

	## Quantization Details

	- Method: Dynamic INT8 quantization via ONNX Runtime
	- Source: Original PyTorch weights converted to ONNX, then quantized
	- Speed: ~2-3x faster inference than FP32
	- Size: ~4x smaller than FP32
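
To give some intuition for the size figure, the sketch below quantizes a random FP32 weight matrix to INT8 with a simple symmetric per-tensor scheme. This is an illustration only, not ONNX Runtime's exact procedure: the function names, the random weights, and the choice of a symmetric scale are all assumptions.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, float scale)."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((768, 768)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

size_ratio = weights.nbytes / q.nbytes               # 4 bytes per float32 vs 1 per int8
max_error = float(np.abs(weights - restored).max())  # rounding error, at most about scale / 2
print(size_ratio, max_error)
```

Each quantized tensor shrinks by a factor of four, which is where a figure near 4x comes from; any tensors left in FP32 make the overall ratio approximate.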

	## License

	This model is a derivative work of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2).

	The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

	## Citation

	```bibtex
	@inproceedings{song2020mpnet,
	title={MPNet: Masked and Permuted Pre-training for Language Understanding},
	author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
	booktitle={NeurIPS},
	year={2020}
	}
	```

	## Acknowledgments

	- Original model by [UKP Lab / sentence-transformers](https://www.sbert.net/)
	- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)