---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- biomedical
- embeddings
- sentence-transformers
- justembed
base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
library_name: onnxruntime
pipeline_tag: feature-extraction
---

# SapBERT INT8 — ONNX Quantized

ONNX INT8 quantized version of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) for efficient biomedical entity embeddings.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

## What is this?

This is a quantized ONNX export of SapBERT, a biomedical entity linking model trained on UMLS concepts. The INT8 quantization reduces model size and improves inference speed while maintaining high accuracy for biomedical text embeddings.

SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.
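INT8 quantization saves space by storing each weight in one byte instead of four. "Dynamic" means the activation scales are computed at inference time from each input instead of being calibrated ahead of time. The NumPy sketch below illustrates that arithmetic for a single matrix multiply. It is a simplified model, not ONNX Runtime's implementation. The function names are hypothetical, and the scheme shown is an assumption chosen for illustration: symmetric per-tensor INT8 weights, asymmetric per-tensor UINT8 activations, and INT32 accumulation.

```python
import numpy as np


def quantize_weights_int8(w):
    """Symmetric per-tensor INT8 quantization, done once when the model is converted."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q_w = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q_w, scale


def quantize_activations_uint8(x):
    """Asymmetric per-tensor UINT8 quantization, computed from this input at run time."""
    lo = min(float(x.min()), 0.0)  # the range must contain 0 so that 0 is represented exactly
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q_x = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q_x, scale, zero_point


def dynamic_int8_matmul(x, q_w, w_scale):
    """Approximate x @ w using quantized operands and an INT32 accumulator."""
    q_x, x_scale, x_zero_point = quantize_activations_uint8(x)
    # Remove the activation zero point, multiply in integer arithmetic, then rescale to float.
    acc = (q_x.astype(np.int32) - x_zero_point) @ q_w.astype(np.int32)
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((768, 768)).astype(np.float32)  # a 768x768 layer, as in BERT-base
    x = rng.standard_normal((2, 768)).astype(np.float32)
    q_w, w_scale = quantize_weights_int8(w)
    print(w.nbytes // q_w.nbytes)  # 4
    approx = dynamic_int8_matmul(x, q_w, w_scale)
    exact = x @ w
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # small relative error
```

In this scheme only the weights are stored as integers, so they account for the size reduction. The activation scale and zero point are recomputed on every call, and that per-call computation is what makes the scheme dynamic.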
## Use Cases

- Medical entity linking
- Biomedical concept matching
- Clinical terminology normalization
- Drug name standardization
- Disease concept mapping

## Files

- `model_quantized.onnx`: INT8 quantized ONNX model
- `tokenizer.json`: Fast tokenizer
- `config.json`: Model configuration

## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
```

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".")  # loads tokenizer.json / config.json from this directory
session = ort.InferenceSession("model_quantized.onnx", providers=["CPUExecutionProvider"])

inputs = tokenizer(["aspirin", "acetylsalicylic acid"], padding=True, return_tensors="np")
# Feed only the inputs the exported graph declares (some exports omit token_type_ids)
feeds = {i.name: inputs[i.name] for i in session.get_inputs()}
outputs = session.run(None, feeds)
# SapBERT uses the [CLS] vector as the entity embedding (assumes outputs[0] is the
# last hidden state with shape (batch, seq_len, 768))
embeddings = outputs[0][:, 0, :]
```

## Quantization Details

- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Accuracy: retains roughly 95% or more of FP32 performance on biomedical benchmarks
- Speed: ~2-3x faster inference than FP32 (the actual speedup depends on hardware and batch size)
- Size: ~4x smaller than FP32, since each INT8 weight takes 1 byte instead of 4

## License

This model is a derivative work of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext).

The original model is licensed under **Apache License 2.0**. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

## Citation

```bibtex
@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of NAACL},
  year={2021}
}
```

## Acknowledgments

- Original model by the [Cambridge Language Technology Lab](https://github.com/cambridgeltl/sapbert)
- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)