---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- biomedical
- embeddings
- sentence-transformers
- justembed
base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
library_name: onnxruntime
pipeline_tag: feature-extraction
---

# SapBERT INT8: ONNX Quantized

ONNX INT8 quantized version of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) for efficient biomedical entity embeddings.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

## What is this?

This is a quantized ONNX export of SapBERT, a biomedical entity linking model trained on UMLS concepts. INT8 quantization reduces model size and speeds up inference while keeping accuracy close to that of the FP32 model for biomedical text embeddings.

SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.

## Use Cases

- Medical entity linking
- Biomedical concept matching
- Clinical terminology normalization
- Drug name standardization
- Disease concept mapping
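
Most of the use cases above come down to nearest-neighbour search: embed a mention, embed a dictionary of canonical concept names, and pick the closest concept. The sketch below illustrates that idea with toy 3-dimensional vectors in place of real 768-dimensional embeddings. The function names, concept IDs, vector values, and the `0.8` threshold are all hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link_entity(mention_vec, concept_vecs, threshold=0.8):
    """Return (concept_id, score) for the closest concept.

    concept_id is None when even the best match scores below the threshold.
    """
    best_id, best_score = None, -1.0
    for concept_id, vec in concept_vecs.items():
        score = cosine(mention_vec, vec)
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Toy vectors standing in for real embeddings; IDs and numbers are made up.
concepts = {
    "CONCEPT:aspirin": [0.9, 0.1, 0.0],
    "CONCEPT:ibuprofen": [0.1, 0.9, 0.1],
}
mention = [0.85, 0.15, 0.05]  # e.g. the embedding of "acetylsalicylic acid"

concept_id, score = link_entity(mention, concepts)
print(concept_id, round(score, 3))
```

A suitable threshold depends on the terminology and would need tuning on real data. For large dictionaries, an approximate nearest-neighbour index would likely replace the linear scan.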

## Files

- `model_quantized.onnx`: INT8 quantized ONNX model
- `tokenizer.json`: Fast tokenizer
- `config.json`: Model configuration

## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
```

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Run from a local copy of this repository so the relative paths resolve.
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

# Tokenize to NumPy arrays and pass them to the session as named inputs.
inputs = tokenizer("aspirin", return_tensors="np")
outputs = session.run(None, dict(inputs))
```
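
The ONNX Runtime call above returns raw model outputs, not one vector per input string. The original SapBERT model card uses the `[CLS]` token representation as the entity embedding, so a minimal sketch of that pooling step might look like the following. It assumes the first output of the exported graph is the last hidden state with shape `(batch, tokens, 768)`, which is not confirmed here. The helper names `cls_pool`, `l2_normalize`, and `embed` are hypothetical.

```python
import math

def cls_pool(last_hidden_state):
    """Keep the first ([CLS]) token vector of every sequence in the batch."""
    return [[float(x) for x in sequence[0]] for sequence in last_hidden_state]

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def embed(texts, model_path="model_quantized.onnx", tokenizer_dir="."):
    """Embed a list of strings with the quantized model (hypothetical helper)."""
    # Imported lazily so the pure-Python helpers above work on their own.
    import onnxruntime as ort
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    session = ort.InferenceSession(model_path)
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="np")

    # Feed only the inputs the exported graph declares; some exports
    # omit token_type_ids, for example.
    wanted = {node.name for node in session.get_inputs()}
    feeds = {name: array for name, array in encoded.items() if name in wanted}

    # Assumption: output 0 is the last hidden state, (batch, tokens, 768).
    last_hidden_state = session.run(None, feeds)[0]
    return [l2_normalize(vec) for vec in cls_pool(last_hidden_state)]
```

With the model files available locally, `embed(["aspirin", "acetylsalicylic acid"])` would return two unit-length vectors whose dot product is their cosine similarity.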

## Quantization Details

- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Accuracy: roughly 95% or more of FP32 performance on biomedical benchmarks
- Speed: roughly 2-3x faster inference than FP32
- Size: roughly 4x smaller than FP32
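
To give a feel for what INT8 quantization does to a weight tensor, here is a simplified sketch of symmetric per-tensor quantization in plain Python. It is an illustration of the general idea only, not ONNX Runtime's actual implementation, and the weight values are made up. In dynamic quantization the weights are converted ahead of time like this, while the quantization parameters for activations are computed at inference time.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.05, 0.90]  # made-up FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Each FP32 value takes 4 bytes and each INT8 value takes 1 byte, which is
# where the roughly 4x size reduction comes from.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
print(q, scale, max_error, fp32_bytes / int8_bytes)
```

The rounding error per weight is bounded by half the scale, which is why accuracy usually stays close to FP32. A file like `model_quantized.onnx` is typically produced with a tool such as `onnxruntime.quantization.quantize_dynamic`, though the exact settings used for this model are not stated here.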

## License

This model is a derivative work of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext).

The original model is licensed under **Apache License 2.0**. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

## Citation

```bibtex
@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of NAACL},
  year={2021}
}
```

## Acknowledgments

- Original model by the [Cambridge Language Technology Lab](https://github.com/cambridgeltl/sapbert)
- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)