license: apache-2.0
tags:
  - onnx
  - int8
  - quantized
  - biomedical
  - embeddings
  - sentence-transformers
  - justembed
base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
library_name: onnxruntime
pipeline_tag: feature-extraction

SapBERT INT8 — ONNX Quantized

ONNX INT8 quantized version of cambridgeltl/SapBERT-from-PubMedBERT-fulltext for efficient biomedical entity embeddings.

Model Details

Property	Value
Base Model	cambridgeltl/SapBERT-from-PubMedBERT-fulltext
Format	ONNX
Quantization	INT8 (dynamic quantization)
Embedding Dimension	768
Quantized by	JustEmbed

What is this?

This is a quantized ONNX export of SapBERT, a biomedical entity linking model trained on UMLS concepts. The INT8 quantization reduces model size and improves inference speed while maintaining high accuracy for biomedical text embeddings.

SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.

Use Cases

Medical entity linking
Biomedical concept matching
Clinical terminology normalization
Drug name standardization
Disease concept mapping

Files

model_quantized.onnx — INT8 quantized ONNX model
tokenizer.json — Fast tokenizer
config.json — Model configuration

Usage with JustEmbed

from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
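Once embeddings are available, synonymous names such as the two above are typically compared by cosine similarity, and entity linking becomes a nearest-neighbor lookup over concept embeddings. The following is a minimal sketch of that idea, not part of the JustEmbed API. It assumes each embedding is a plain sequence of floats (the return type of `embed` is not documented here). The helper names `cosine_similarity` and `link_entity` are hypothetical, and the 3-dimensional vectors are toy values standing in for real 768-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def link_entity(query_vec, concept_vecs):
    """Return (concept_name, score) for the most similar concept."""
    name, vec = max(
        concept_vecs.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
    )
    return name, cosine_similarity(query_vec, vec)

# Toy vectors (made up for illustration; real embeddings have 768 dimensions)
concepts = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.0]  # stands in for the embedding of "acetylsalicylic acid"

name, score = link_entity(query, concepts)
print(name, round(score, 3))
```

For a large concept inventory, a linear scan like this would usually be replaced by a vector index, but the scoring logic stays the same.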

Usage with ONNX Runtime

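import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer and quantized model from the current directory
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

# Tokenize to NumPy arrays and keep only the inputs the graph declares
inputs = tokenizer("aspirin", return_tensors="np")
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})

# Assumption: the first output is the last hidden state (batch, seq_len, 768);
# SapBERT uses the [CLS] token (position 0) as the entity embedding
embedding = outputs[0][:, 0, :]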

Quantization Details

Method: Dynamic INT8 quantization via ONNX Runtime
Source: Original PyTorch weights converted to ONNX, then quantized
Accuracy: roughly 95% or more of FP32 performance on biomedical benchmarks
Speed: roughly 2-3x faster inference than FP32
Size: roughly 4x smaller than FP32 (8-bit weights instead of 32-bit)
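To make the size and accuracy trade-off concrete, here is a small illustrative sketch of symmetric INT8 quantization of a weight tensor. It is not the procedure used to produce this model: ONNX Runtime's dynamic quantization has its own scheme and also quantizes activations at run time. The function names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.50, -1.27, 0.03, 1.00]  # made-up FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage: 4 bytes per FP32 weight versus 1 byte per INT8 weight
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
size_ratio = fp32_bytes / int8_bytes

print(q, max_error, size_ratio)
```

The 4:1 storage ratio applies to the quantized weights only. Scale parameters and any tensors left in floating point are why the overall file size reduction is approximate.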

License

This model is a derivative work of cambridgeltl/SapBERT-from-PubMedBERT-fulltext.

The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.

Citation

@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of NAACL},
  year={2021}
}

Acknowledgments

Original model by the Cambridge Language Technology Lab
Quantization and packaging by JustEmbed