---
license: apache-2.0
tags:
- onnx
- int8
- quantized
- finance
- embeddings
- justembed
base_model: ProsusAI/finbert
library_name: onnxruntime
pipeline_tag: feature-extraction
---
FinBERT INT8 — ONNX Quantized
ONNX INT8 quantized version of ProsusAI/finbert for efficient financial text embeddings.
Model Details
| Property | Value |
|---|---|
| Base Model | ProsusAI/finbert |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |
What is this?
This is a quantized ONNX export of FinBERT, a BERT model further pre-trained on financial text by Prosus AI. INT8 quantization stores the model weights as 8-bit integers instead of 32-bit floats, which reduces model size and speeds up inference while aiming to keep the financial-domain embeddings close to those of the FP32 model.
Use Cases
- Financial document search and retrieval
- Banking text analysis
- Financial sentiment embeddings
- SEC filing analysis
- Financial news similarity
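For the search and similarity use cases above, embeddings are usually compared with cosine similarity. The following is a minimal sketch, not part of this repository: `rank_by_cosine` is a hypothetical helper, and toy 3-dimensional vectors stand in for the 768-dimensional embeddings the model would produce.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar, plus the scores."""
    q = np.asarray(query_vec, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)  # descending by similarity
    return order, scores

# Toy vectors (assumption): real inputs would be 768-dimensional embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.1, 0.0], [1.0, 1.0, 0.0]]
order, scores = rank_by_cosine(query, docs)
```

Here `order` lists the document indices with the closest match first, and `scores` holds the cosine similarity for each document in its original position.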
Files
- model_quantized.onnx: INT8 quantized ONNX model
- tokenizer.json: Fast tokenizer
- vocab.txt: Vocabulary file
- config.json: Model configuration
Usage with JustEmbed
from justembed import Embedder
embedder = Embedder("finbert-int8")
vectors = embedder.embed(["quarterly earnings exceeded expectations"])
Usage with ONNX Runtime
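import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer files (tokenizer.json, vocab.txt) from the current directory.
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

# Tokenize to NumPy arrays and pass only the inputs the ONNX graph declares.
inputs = tokenizer("quarterly earnings exceeded expectations", return_tensors="np")
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in inputs.items() if k in input_names})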
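The raw ONNX Runtime outputs are not yet one vector per sentence. A common way to get a single 768-dimensional embedding is masked mean pooling over token vectors. The sketch below assumes the first model output is a token-level hidden state of shape (batch, sequence, 768); that is an assumption about this export, so check `session.get_outputs()` before relying on it. `mean_pool` is a hypothetical helper, and random arrays stand in for real model output.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padding positions."""
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)       # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)        # avoid division by zero
    return summed / counts

# Stand-in data (assumption): batch of 2, sequence length 4, hidden size 768.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 4, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.int64)
embeddings = mean_pool(hidden, mask)  # shape (2, 768)
```

With the session above, and if the assumption about the output shape holds, the call would be `mean_pool(outputs[0], inputs["attention_mask"])`. If the first output is already a pooled vector or a set of logits, this step does not apply.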
Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
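The size figure follows from storage width: an INT8 weight takes 1 byte where an FP32 weight takes 4. The sketch below illustrates the idea with symmetric per-tensor quantization on a random matrix. It is a simplified illustration, not the exact scheme ONNX Runtime applies, and `quantize_int8` and `dequantize` are hypothetical helpers.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Random matrix (assumption) standing in for one 768x768 weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)
q, scale = quantize_int8(w)

size_ratio = w.nbytes / q.nbytes                    # 4 bytes vs 1 byte per weight
max_error = np.abs(w - dequantize(q, scale)).max()  # bounded by about scale / 2
```

The whole-file reduction is approximate rather than exactly 4x, since a quantized model also stores scales and may keep some tensors in floating point.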
License
This model is a derivative work of ProsusAI/finbert.
The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.
Citation
@article{araci2019finbert,
title={FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models},
author={Araci, Dogu},
journal={arXiv preprint arXiv:1908.10063},
year={2019}
}