Florence-2 base-ft – ONNX (INT8 dynamic-quantized)
ONNX export of microsoft/Florence-2-base-ft with post-export INT8 dynamic quantization applied to all four sub-models. Roughly half the disk footprint and inference RAM of the fp16 variant.
Converted artifact. Training credit: Microsoft Research.
What this repo contains
- config.json
- generation_config.json
- preprocessor_config.json
- tokenizer.json
- tokenizer_config.json
- vocab.json
- merges.txt
- special_tokens_map.json
- vision_encoder_quantized.onnx
- encoder_model_quantized.onnx
- decoder_model_quantized.onnx
- embed_tokens_quantized.onnx
Total: ~270 MB. All four ONNX files are required at inference.
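Because inference fails if any sub-model is missing, a quick sanity check after downloading can save a confusing stack trace later. A minimal sketch; the `missing_files` helper and the directory layout are illustrative assumptions, not part of this repo:

```python
# Sketch: verify a local snapshot contains all four quantized sub-models
# before wiring them into an inference pipeline.
from pathlib import Path

REQUIRED = [
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
]

def missing_files(model_dir: str) -> list[str]:
    """Return the names of required ONNX files absent from model_dir."""
    present = {p.name for p in Path(model_dir).glob("*.onnx")}
    return [name for name in REQUIRED if name not in present]
```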
How it was produced
- Export to fp32 ONNX:
```shell
optimum-cli export onnx \
  --model microsoft/Florence-2-base-ft \
  --task image-to-text \
  --trust-remote-code \
  <fp32-output>
```
- Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` (weight-only, per-channel).
Toolchain: optimum 1.24.0, transformers 4.45.2, onnxruntime 1.19.x.
When to pick quantized vs fp16
- This repo (INT8): CPU, NPU (OpenVINO EP), mobile, browser (transformers.js). ~270 MB.
- Heliosoph/florence-2-base-ft-fp16-onnx: GPU, maximum quality. ~520 MB.
Known degradation: Dense OCR over small text loses noticeable accuracy at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
Task prompts
Identical to the fp16 variant – see Heliosoph/florence-2-base-ft-fp16-onnx for the full list.
License
MIT – same as upstream. LICENSE file included.