---
license: apache-2.0
tags:
  - onnx
  - int8
  - quantized
  - biomedical
  - embeddings
  - sentence-transformers
  - justembed
base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
library_name: onnxruntime
pipeline_tag: feature-extraction
---

# SapBERT INT8 — ONNX Quantized

ONNX INT8 quantized version of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) for efficient biomedical entity embeddings.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

## What is this?

This is a quantized ONNX export of SapBERT, a biomedical entity linking model trained on UMLS concepts. INT8 quantization makes the model roughly 4x smaller and 2-3x faster at inference than FP32 while retaining most of its accuracy on biomedical text embeddings (see Quantization Details).

SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.

## Use Cases

- Medical entity linking
- Biomedical concept matching
- Clinical terminology normalization
- Drug name standardization
- Disease concept mapping
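
Most of these use cases reduce to nearest-neighbour search: embed a mention, embed the candidate concept names, and pick the closest one. The following is a minimal sketch with NumPy, assuming the embeddings have already been computed. The helper name `link_mentions` is hypothetical, and the toy 3-dimensional vectors are stand-ins for real 768-dimensional model output.

```python
import numpy as np


def link_mentions(mention_vecs, concept_vecs, concept_names, top_k=1):
    """Return the top_k closest concept names (with cosine scores) per mention."""
    mentions = np.asarray(mention_vecs, dtype=np.float32)
    concepts = np.asarray(concept_vecs, dtype=np.float32)
    # L2-normalize so that the dot product equals cosine similarity.
    mentions = mentions / np.linalg.norm(mentions, axis=1, keepdims=True)
    concepts = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    scores = mentions @ concepts.T  # shape: (n_mentions, n_concepts)
    # Sort each row by descending similarity and keep the best top_k columns.
    order = np.argsort(-scores, axis=1)[:, :top_k]
    return [
        [(concept_names[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


# Toy vectors standing in for real embeddings (hypothetical values).
concept_names = ["aspirin", "ibuprofen", "myocardial infarction"]
concept_vecs = [[1.0, 0.1, 0.0], [0.2, 1.0, 0.0], [0.0, 0.1, 1.0]]
mention_vecs = [[0.9, 0.2, 0.1]]  # e.g. an embedding of "acetylsalicylic acid"

matches = link_mentions(mention_vecs, concept_vecs, concept_names, top_k=2)
print(matches[0][0][0])  # name of the closest concept
```

For large concept vocabularies, the brute-force matrix product would typically be replaced by an approximate nearest-neighbour index, but the scoring logic stays the same.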

## Files

- `model_quantized.onnx` — INT8 quantized ONNX model
- `tokenizer.json` — Fast tokenizer
- `config.json` — Model configuration

## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("sapbert-int8")
vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
```

## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

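# Assumes the repository files have been downloaded to the current directory.
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("aspirin", return_tensors="np")
# Pass only the inputs the ONNX graph declares (some exports omit token_type_ids).
input_names = {node.name for node in session.get_inputs()}
feed = {name: array for name, array in inputs.items() if name in input_names}
outputs = session.run(None, feed)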
```
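
The raw session output still has to be reduced to one vector per input. SapBERT is commonly used with the `[CLS]` token representation as the entity embedding. The sketch below assumes the first output of `session.run` is the last hidden state with shape `(batch, seq_len, 768)`, which should be checked against `session.get_outputs()` for this export. The helper name `cls_embedding` is hypothetical, and a random array stands in for real model output.

```python
import numpy as np


def cls_embedding(last_hidden_state, normalize=True):
    """Take the first-token ([CLS]) vector of each sequence as its embedding."""
    hidden = np.asarray(last_hidden_state, dtype=np.float32)
    if hidden.ndim != 3:
        raise ValueError(f"expected (batch, seq_len, hidden), got {hidden.shape}")
    vectors = hidden[:, 0, :]  # position 0 holds the [CLS] token
    if normalize:
        # Unit-length vectors make dot products equal to cosine similarity.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        vectors = vectors / np.clip(norms, 1e-12, None)
    return vectors


# Stand-in for outputs[0]: 2 sequences, 5 tokens, 768 hidden units.
rng = np.random.default_rng(0)
fake_hidden = rng.normal(size=(2, 5, 768)).astype(np.float32)

embeddings = cls_embedding(fake_hidden)
print(embeddings.shape)  # (2, 768)
```

With real output this would be called as `cls_embedding(outputs[0])`, provided the assumption about the output layout holds.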

## Quantization Details

- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Accuracy: roughly 95% or more of FP32 performance on biomedical benchmarks
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
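
To make the size figure more concrete, here is a simplified NumPy illustration of INT8 weight quantization: each float32 weight (4 bytes) is mapped to an int8 value (1 byte) plus a shared scale, which is where the roughly 4x reduction comes from. This is a symmetric per-tensor scheme chosen for clarity, not ONNX Runtime's actual implementation; the function names and random weights are illustrative assumptions.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, scale)."""
    weights = np.asarray(weights, dtype=np.float32)
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * np.float32(scale)


# Random matrix standing in for one 768x768 weight tensor (hypothetical values).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(768, 768)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(w.nbytes // q.nbytes)  # 4: one byte per weight instead of four
print(float(np.max(np.abs(w - w_hat))))  # worst-case rounding error, about scale / 2
```

The rounding error per weight is bounded by about half the scale, which is why accuracy usually degrades only slightly. In dynamic quantization, activation scales are additionally computed at inference time rather than calibrated in advance.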

## License

This model is a derivative work of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext).

The original model is licensed under **Apache License 2.0**. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

## Citation

```bibtex
@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
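  booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},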
  year={2021}
}
```

## Acknowledgments

- Original model by the [Cambridge Language Technology Lab](https://github.com/cambridgeltl/sapbert)
- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)