
Feature Extraction

sentence-transformers

Model card Files Files and versions

sapbert-int8 / README.md

sekarkrishna's picture

Upload folder using huggingface_hub

4a8a488 verified 3 months ago

|

history blame contribute delete

3.14 kB

	---
	license: apache-2.0
	tags:
	- onnx
	- int8
	- quantized
	- biomedical
	- embeddings
	- sentence-transformers
	- justembed
	base_model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
	library_name: onnxruntime
	pipeline_tag: feature-extraction
	---

	# SapBERT INT8 — ONNX Quantized

	An ONNX export of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext), quantized to INT8, for efficient biomedical entity embeddings.

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext) |
	| Format | ONNX |
	| Quantization | INT8 (dynamic quantization) |
	| Embedding Dimension | 768 |
	| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

	## What is this?

	This is an INT8-quantized ONNX export of SapBERT, a biomedical entity representation model trained on UMLS concepts and commonly used for entity linking. INT8 quantization reduces model size and speeds up inference while retaining most of the FP32 model's accuracy on biomedical text embeddings.

	SapBERT (Self-Alignment Pre-training for BERT) was developed by the Cambridge Language Technology Lab for biomedical entity representation learning.

	## Use Cases

	- Medical entity linking
	- Biomedical concept matching
	- Clinical terminology normalization
	- Drug name standardization
	- Disease concept mapping

	## Files

	- `model_quantized.onnx` — INT8 quantized ONNX model
	- `tokenizer.json` — Fast tokenizer
	- `config.json` — Model configuration

	## Usage with JustEmbed

	```python
	from justembed import Embedder

	embedder = Embedder("sapbert-int8")
	vectors = embedder.embed(["aspirin", "acetylsalicylic acid"])
	```

	## Usage with ONNX Runtime

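	```python
	import onnxruntime as ort
	from transformers import AutoTokenizer

	# Load the tokenizer and the quantized model from a local copy of this repo
	tokenizer = AutoTokenizer.from_pretrained(".")
	session = ort.InferenceSession("model_quantized.onnx", providers=["CPUExecutionProvider"])

	# Tokenize a batch, then feed only the inputs the ONNX graph declares
	inputs = tokenizer(["aspirin", "acetylsalicylic acid"], padding=True, return_tensors="np")
	feed = {i.name: inputs[i.name] for i in session.get_inputs()}
	outputs = session.run(None, feed)

	# SapBERT uses the [CLS] token as the entity embedding
	# (assumes the first graph output is the last hidden state)
	embeddings = outputs[0][:, 0, :]  # shape: (batch_size, 768)
	```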
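For the entity-linking and concept-matching use cases, a common approach is to embed a dictionary of canonical concept names once, then map each query mention to its nearest name by cosine similarity. The sketch below shows only that matching step on plain NumPy arrays; the function name `link_mentions` and the toy 3-dimensional vectors are illustrative assumptions. In practice each row would be a 768-dimensional embedding from this model (SapBERT conventionally uses the [CLS] token).

```python
import numpy as np


def link_mentions(query_emb: np.ndarray, concept_emb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: nearest concept index and cosine score for each query.

    query_emb:   (n_queries, dim) array of mention embeddings
    concept_emb: (n_concepts, dim) array of dictionary-name embeddings
    """
    # L2-normalize rows so that a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    sims = q @ c.T                 # (n_queries, n_concepts) cosine similarities
    best = sims.argmax(axis=1)     # index of the nearest concept per query
    return best, sims[np.arange(len(best)), best]


# Toy vectors standing in for real model embeddings
concepts = np.array([[1.0, 0.0, 0.0],   # e.g. "aspirin"
                     [0.0, 1.0, 0.0]])  # e.g. "ibuprofen"
queries = np.array([[0.9, 0.1, 0.0],    # e.g. "acetylsalicylic acid"
                    [0.1, 0.8, 0.1]])
idx, scores = link_mentions(queries, concepts)
print(idx)  # [0 1]
```

For large dictionaries, the same normalized dot product is usually delegated to an approximate nearest-neighbor index rather than a dense similarity matrix.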

	## Quantization Details

	- Method: Dynamic INT8 quantization via ONNX Runtime
	- Source: Original PyTorch weights converted to ONNX, then quantized
	- Accuracy: ~95%+ of FP32 performance on biomedical benchmarks
	- Speed: ~2-3x faster inference than FP32
	- Size: ~4x smaller than FP32
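The size reduction follows directly from storing each weight in 1 byte instead of 4. The sketch below illustrates the core arithmetic with symmetric per-tensor INT8 weight quantization in NumPy: a single scale maps the largest weight magnitude to 127, and dequantizing recovers each weight to within half a quantization step. This is a simplified illustration, not ONNX Runtime's actual implementation (which, in dynamic mode, also quantizes activations at runtime and can use per-channel scales or UINT8 types); `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Hypothetical helper: symmetric per-tensor quantization of FP32 weights to INT8."""
    # One scale for the whole tensor: the largest magnitude maps to 127
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Hypothetical helper: map INT8 values back to approximate FP32 weights."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)  # BERT-base-sized weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(w.nbytes / q.nbytes)  # 4.0: 1 byte per weight instead of 4
# Rounding error per weight is bounded by half a quantization step
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-5)
```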

	## License

	This model is a derivative work of [cambridgeltl/SapBERT-from-PubMedBERT-fulltext](https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext).

	The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

	## Citation

	```bibtex
	@inproceedings{liu2021self,
	title={Self-Alignment Pretraining for Biomedical Entity Representations},
	author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
	booktitle={Proceedings of NAACL},
	year={2021}
	}
	```

	## Acknowledgments

	- Original model by the [Cambridge Language Technology Lab](https://github.com/cambridgeltl/sapbert)
	- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)