	---
	license: apache-2.0
	tags:
	- onnx
	- int8
	- quantized
	- finance
	- embeddings
	- justembed
	base_model: ProsusAI/finbert
	library_name: onnxruntime
	pipeline_tag: feature-extraction
	---

	# FinBERT INT8 — ONNX Quantized

	ONNX INT8 quantized version of [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) for efficient financial text embeddings.

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base Model | [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert) |
	| Format | ONNX |
	| Quantization | INT8 (dynamic quantization) |
	| Embedding Dimension | 768 |
	| Quantized by | [JustEmbed](https://pypi.org/project/justembed/) |

	## What is this?

	This is a quantized ONNX export of FinBERT, a BERT model further pre-trained on financial text by Prosus AI. INT8 quantization reduces model size and speeds up CPU inference, usually at the cost of a small loss in embedding fidelity relative to the FP32 model.
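
To make the size and accuracy trade-off concrete, here is a minimal sketch of INT8 weight quantization in plain Python. It uses a simplified symmetric per-tensor scheme chosen for illustration; it is not ONNX Runtime's exact algorithm, and the sample weights and the helper names `quantize_int8` and `dequantize` are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights only; real model tensors hold millions of values.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Each weight shrinks from 4 bytes (FP32) to 1 byte (INT8).
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
```

The per-weight storage drops by a factor of four, and the reconstruction error stays within `scale / 2`. A full model file may shrink somewhat less than that if some tensors are left in FP32.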

	## Use Cases

	- Financial document search and retrieval
	- Banking text analysis
	- Financial sentiment embeddings
	- SEC filing analysis
	- Financial news similarity
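
Most of these use cases reduce to comparing embedding vectors. The following is a minimal sketch of cosine-similarity ranking in plain Python; the 3-dimensional toy vectors stand in for the model's 768-dimensional embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors; in practice these would come from the embedding model.
query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, docs)
```

Here the second document points in the same direction as the query, so it ranks first regardless of its larger magnitude.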

	## Files

	- `model_quantized.onnx` — INT8 quantized ONNX model
	- `tokenizer.json` — Fast tokenizer
	- `vocab.txt` — Vocabulary file
	- `config.json` — Model configuration
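
One way to fetch these files locally is `hf_hub_download` from the `huggingface_hub` package. The sketch below assumes the repository id is `sekarkrishna/finbert-int8` (the namespace this card is published under) and uses a hypothetical target directory name; adjust both as needed.

```python
from pathlib import Path

# Assumption: repository id taken from the page this card is published under.
REPO_ID = "sekarkrishna/finbert-int8"
# File names as listed in this card.
FILES = ["model_quantized.onnx", "tokenizer.json", "vocab.txt", "config.json"]


def download_model_files(repo_id=REPO_ID, files=FILES, local_dir="finbert-int8"):
    """Download each listed file and return a mapping of file name to local path."""
    # Imported lazily so the module loads even when huggingface_hub is not installed.
    from huggingface_hub import hf_hub_download

    paths = {}
    for name in files:
        paths[name] = Path(
            hf_hub_download(repo_id=repo_id, filename=name, local_dir=local_dir)
        )
    return paths
```

After `download_model_files()` returns, the tokenizer and ONNX session can be pointed at the `local_dir` directory instead of the current directory.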

	## Usage with JustEmbed

	```python
	from justembed import Embedder

	embedder = Embedder("finbert-int8")
	vectors = embedder.embed(["quarterly earnings exceeded expectations"])
	```

	## Usage with ONNX Runtime

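	```python
	import onnxruntime as ort
	from transformers import AutoTokenizer

	# Assumes the repository files have been downloaded to the current directory.
	tokenizer = AutoTokenizer.from_pretrained(".")
	session = ort.InferenceSession("model_quantized.onnx", providers=["CPUExecutionProvider"])

	inputs = tokenizer("quarterly earnings exceeded expectations", return_tensors="np")
	# Feed only the tensors the exported graph declares; some exports omit token_type_ids.
	input_names = {node.name for node in session.get_inputs()}
	feed = {name: array for name, array in inputs.items() if name in input_names}
	outputs = session.run(None, feed)
	```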
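
The session returns raw model outputs, not a single sentence vector. A common way to get one is attention-masked mean pooling, sketched below with NumPy. This assumes `outputs[0]` holds token-level hidden states of shape `(batch, seq_len, 768)`, which this card does not state, and the card does not say which pooling JustEmbed applies; `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


# Toy input: batch of 1, three tokens (the last is padding), hidden size 2.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]], dtype=np.float32)
mask = np.array([[1, 1, 0]], dtype=np.int64)
pooled = mean_pool(tokens, mask)
unit = l2_normalize(pooled)
```

With the example above, the equivalent call would be `mean_pool(outputs[0], inputs["attention_mask"])`, again under the assumption about the output layout.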

	## Quantization Details

	- Method: Dynamic INT8 quantization via ONNX Runtime
	- Source: Original PyTorch weights converted to ONNX, then quantized
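	- Speed: roughly 2-3x faster inference than FP32 (varies with hardware and sequence length)
	- Size: roughly 4x smaller than FP32, since weights are stored in 8 bits instead of 32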
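
For reference, dynamic INT8 quantization with ONNX Runtime is typically a single call to `quantize_dynamic`. The sketch below shows that call plus a helper to check the size reduction on disk. The input file name `model.onnx` is a placeholder, and the exact options used to produce this model are not documented here, so treat the arguments as assumptions.

```python
import os


def quantize_to_int8(fp32_path="model.onnx", int8_path="model_quantized.onnx"):
    """Write a dynamically quantized INT8 copy of an FP32 ONNX model."""
    # Imported lazily so the size helper below works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        weight_type=QuantType.QInt8,  # assumption: signed 8-bit weights
    )
    return int8_path


def size_ratio(fp32_path, int8_path):
    """How many times smaller the quantized file is than the original."""
    return os.path.getsize(fp32_path) / os.path.getsize(int8_path)
```

Running `size_ratio` on the FP32 export and the quantized file is a quick way to verify the size figure on your own copy.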

	## License

	This model is a derivative work of [ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert).

	The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the [LICENSE](LICENSE) file for the full text.

	## Citation

	```bibtex
	@article{araci2019finbert,
	title={FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models},
	author={Araci, Dogu},
	journal={arXiv preprint arXiv:1908.10063},
	year={2019}
	}
	```

	## Acknowledgments

	- Original model by [Prosus AI](https://github.com/ProsusAI/finBERT)
	- Quantization and packaging by [JustEmbed](https://pypi.org/project/justembed/)