Florence-2 is a unified vision foundation model that leverages prompt-based learning to perform a wide range of vision and vision-language tasks using a single architecture and training framework.

Original paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Florence-2-base

This model uses the Florence-2 Base variant, which balances accuracy against computational efficiency and selects among its supported tasks through natural language prompts. It is well suited to applications such as image captioning, visual question answering, object detection, grounding, and general-purpose vision understanding.
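As an illustration of prompt-based task selection, the following minimal sketch builds the text prompt for a few tasks. The task tokens are those documented for the upstream Florence-2 checkpoints; whether the Cooper deployment of this model accepts the same strings is an assumption. The names `TASK_PROMPTS` and `build_prompt` are hypothetical helpers, not part of any SDK.

```python
from typing import Optional

# Task tokens as documented for the upstream Florence-2 checkpoints.
# Assumption: the deployed model expects the same prompt strings.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "ocr": "<OCR>",
}

# Tasks whose prompt is the task token followed by user-supplied text.
TASKS_WITH_TEXT = {"phrase_grounding"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for a task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} requires accompanying text")
        return token + text
    if text:
        raise ValueError(f"task {task!r} does not take accompanying text")
    return token
```

For example, `build_prompt("object_detection")` returns `<OD>`, and `build_prompt("phrase_grounding", "a green car")` returns `<CAPTION_TO_PHRASE_GROUNDING>a green car`. The same image can then be passed with different prompts to obtain different outputs from the single model.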

Model Configuration:

Reference implementation: Florence-2
Original weights: Florence-2-base
Resolution: 3x768x768 (3x384x384 on CV75)
Supported Cooper versions:
- Cooper SDK: [2.5.4]
- Cooper Foundry: [2.3]
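The per-device input resolution listed in the configuration can be captured in a small lookup, which is useful when sizing input buffers before deployment. This is a sketch only: the device names and shapes come from this model card, while `input_shape_for` and `input_element_count` are hypothetical helpers and do not correspond to any Cooper SDK call.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# Devices listed in this model card.
SUPPORTED_DEVICES = ("N1-655", "CV7", "CV72", "CV75")

# Resolution from the model configuration: 3x768x768, except 3x384x384 on CV75.
DEFAULT_INPUT_SHAPE: Shape = (3, 768, 768)
DEVICE_INPUT_SHAPES = {"CV75": (3, 384, 384)}


def input_shape_for(device: str) -> Shape:
    """Return the (C, H, W) input shape for a listed device (hypothetical helper)."""
    name = device.upper()
    if name not in SUPPORTED_DEVICES:
        raise ValueError(f"device {device!r} is not listed in the model card")
    return DEVICE_INPUT_SHAPES.get(name, DEFAULT_INPUT_SHAPE)


def input_element_count(device: str) -> int:
    """Number of scalar values in one input image tensor for the device."""
    channels, height, width = input_shape_for(device)
    return channels * height * width
```

On CV75 the input holds a quarter as many values as on the other devices, since both spatial dimensions are halved.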

Model	Device	Compression	Model Link
Florence-2-base	N1-655	8-bit weights	Model_Link
Florence-2-base	CV7	8-bit weights	Model_Link
Florence-2-base	CV72	8-bit weights	Model_Link
Florence-2-base	CV75	8-bit weights	Model_Link

Paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks (arXiv:2311.06242, published Nov 10, 2023)