metadata
license: apache-2.0
tags:
  - onnx
  - int8
  - quantized
  - biomedical
  - embeddings
  - sentence-transformers
  - justembed
base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
library_name: onnxruntime
pipeline_tag: feature-extraction

SapBERT INT8 — ONNX Quantized

ONNX INT8 quantized version of cambridgeltl/SapBERT-from-PubMedBERT-fulltext for efficient biomedical entity embeddings.

Model Details

| Property | Value |
|---|---|
| Base Model | cambridgeltl/SapBERT-from-PubMedBERT-fulltext |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |

What is this?

This is a quantized ONNX export of SapBERT, a biomedical entity representation model trained on UMLS synonym data and commonly used for entity linking. INT8 quantization shrinks the model and speeds up inference at a small cost in embedding accuracy (see Quantization Details).

SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.

Use Cases

  • Medical entity linking
  • Biomedical concept matching
  • Clinical terminology normalization
  • Drug name standardization
  • Disease concept mapping
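Each of these tasks usually reduces to a nearest-neighbour search: embed the query terms, embed a dictionary of reference concepts, and pick the most similar concept for each query. The sketch below shows that step with NumPy only. It assumes the embeddings have already been computed, and the function name `link_entities` and the concept IDs are placeholders for illustration, not part of this repository.

```python
import numpy as np

def link_entities(query_vecs, concept_vecs, concept_ids):
    """Map each query embedding to the ID of its most similar concept.

    Returns a list of (concept_id, cosine_similarity) pairs, one per query.
    """
    q = np.asarray(query_vecs, dtype=np.float32)
    c = np.asarray(concept_vecs, dtype=np.float32)
    # L2-normalize rows so that the dot product equals cosine similarity.
    q = q / np.clip(np.linalg.norm(q, axis=1, keepdims=True), 1e-12, None)
    c = c / np.clip(np.linalg.norm(c, axis=1, keepdims=True), 1e-12, None)
    sims = q @ c.T                       # shape: (n_queries, n_concepts)
    best = sims.argmax(axis=1)           # index of the closest concept per query
    return [(concept_ids[i], float(sims[row, i])) for row, i in enumerate(best)]
```

For a large concept dictionary, the exhaustive matrix product would likely be replaced by an approximate nearest-neighbour index; the brute-force version is kept here for clarity.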

Files

  • model_quantized.onnx — INT8 quantized ONNX model
  • tokenizer.json — Fast tokenizer
  • config.json — Model configuration

Usage with JustEmbed

from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])

Usage with ONNX Runtime

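import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer and model from the current directory (the downloaded repo).
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("aspirin", return_tensors="np")
# Pass only the inputs the exported graph declares (token_type_ids may be absent).
expected = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in expected})

# Assumption: outputs[0] is the last hidden state, shape (batch, seq_len, 768).
# SapBERT uses the [CLS] token (position 0) as the entity embedding.
embedding = outputs[0][:, 0, :]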
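
The single-string call above can be extended to a batch of entity names. The following is a minimal sketch, not the packaged JustEmbed implementation. It assumes that the first model output is the last hidden state with shape (batch, seq_len, 768), that the [CLS] vector is the embedding (as in the original SapBERT usage), and that a short `max_length` is adequate for entity names. The helper name `embed_batch` and the default of 25 tokens are choices made for this example.

```python
import numpy as np

def embed_batch(texts, tokenizer, session, max_length=25):
    """Return L2-normalized [CLS] embeddings, one row per input string."""
    enc = tokenizer(
        texts,
        padding=True,          # pad to the longest string in the batch
        truncation=True,
        max_length=max_length,
        return_tensors="np",
    )
    # Feed only the inputs the exported graph declares.
    wanted = {i.name for i in session.get_inputs()}
    feed = {k: v for k, v in enc.items() if k in wanted}
    # Assumption: the first output is the last hidden state.
    last_hidden = session.run(None, feed)[0]
    cls = last_hidden[:, 0, :]                      # [CLS] vector per string
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)
```

With the `tokenizer` and `session` objects created above, `embed_batch(["aspirin", "acetylsalicylic acid"], tokenizer, session)` would return an array with two rows, and the dot product of those rows gives their cosine similarity.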

Quantization Details

  • Method: Dynamic INT8 quantization via ONNX Runtime
  • Source: Original PyTorch weights converted to ONNX, then quantized
  • Accuracy: at least ~95% of FP32 performance on biomedical benchmarks
  • Speed: ~2-3x faster inference than FP32
  • Size: ~4x smaller than FP32
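
The size figure follows from storing each weight in one byte instead of four. The sketch below illustrates the idea with a symmetric per-tensor INT8 scheme in NumPy. It is an illustration only: the scheme that ONNX Runtime actually applied to this model (exposed there as `onnxruntime.quantization.quantize_dynamic`) may differ in detail, for example in zero-point handling, and dynamic quantization additionally quantizes activations at run time. The random weight matrix stands in for a real layer.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, float scale)."""
    w = np.asarray(weights, dtype=np.float32)
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

# Stand-in for one 768 x 768 weight matrix (synthetic values, not model weights).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(768, 768)).astype(np.float32)

q, scale = quantize_int8(w)
size_ratio = w.nbytes / q.nbytes          # 4 bytes vs 1 byte per weight
max_error = float(np.abs(w - dequantize(q, scale)).max())
```

The rounding error per weight is bounded by half the scale, which is why accuracy degrades only slightly. The whole-file ratio can fall below 4 if some tensors remain in FP32.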

License

This model is a derivative work of cambridgeltl/SapBERT-from-PubMedBERT-fulltext.

The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.

Citation

@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of NAACL},
  year={2021}
}

Acknowledgments