---
title: Florence-2 Vision Tasks Demo
emoji: π
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: true
short_description: This is a Gradio-based demo showcasing Florence-2
license: mit
---
# Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This is a Gradio-based demo of **Florence-2**, a unified vision foundation model that advances the state of the art across a variety of computer vision tasks with a single, versatile architecture.
## Demo Preview

[Image: demo preview]
## About Florence-2

Florence-2 provides a unified representation that handles a diverse range of vision tasks, including:

- Object detection
- Image captioning
- Visual question answering
- OCR (Optical Character Recognition)
- Region proposal
- Segmentation
- Other vision and vision-language tasks

The model shows that a single architecture can be applied across multiple vision domains, removing the need for a separate model per task.
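
To make the "one model, many tasks" idea concrete, here is a minimal sketch of how a task name could be turned into a Florence-2 prompt. It assumes the task tokens listed on the public `microsoft/Florence-2` model cards (`<OD>`, `<CAPTION>`, and so on); the `TASK_PROMPTS` mapping, the `NEEDS_TEXT` set, and the `build_prompt` helper are hypothetical names for illustration and are not taken from this demo's `app.py`. Visual question answering is left out of the sketch because no token for it is assumed here.

```python
from typing import Optional

# Task tokens as listed on the microsoft/Florence-2 model cards (assumed here).
# The pairing with this README's task names is illustrative only.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Image captioning": "<CAPTION>",
    "OCR": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    "Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose token must be followed by free text (e.g. the phrase to segment).
NEEDS_TEXT = {"Segmentation"}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the prompt for a task: the task token plus optional text."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"Unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in NEEDS_TEXT:
        if not text_input:
            raise ValueError(f"Task {task!r} requires a text input")
        return token + text_input
    return token


print(build_prompt("Object detection"))
print(build_prompt("Segmentation", "a green car"))
```

The same image can then be sent to the model with different prompts, which is what lets a single checkpoint replace several task-specific models.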
## Paper & Resources

📄 **CVPR 2024 Paper**: [Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Florence-2_Advancing_a_Unified_Representation_for_a_Variety_of_Vision_CVPR_2024_paper.pdf)

🎥 **CVPR Virtual Presentation**: [https://cvpr.thecvf.com/virtual/2024/poster/30529](https://cvpr.thecvf.com/virtual/2024/poster/30529)

🖼️ **Research Poster**: [Poster.png](./Poster.png)
## Demo Features

This Gradio demo allows you to:

- Upload images and interact with Florence-2's various capabilities
- Test different vision tasks on your own images
- Compare the unified model's results across multiple tasks and domains
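
The features above roughly correspond to an image upload plus a task selector. The following is a minimal sketch of that wiring in Gradio, not the actual contents of `app.py`: `TASKS`, `run_task`, and `build_demo` are hypothetical names, and `run_task` is a placeholder that only echoes the request instead of calling Florence-2.

```python
# Hypothetical task list for the dropdown; the real demo may expose more tasks.
TASKS = ["Image captioning", "Object detection", "OCR"]


def run_task(image, task):
    """Placeholder handler: describes the request instead of running the model."""
    if image is None:
        return "Please upload an image first."
    width, height = image.size  # PIL images expose (width, height)
    return f"Would run '{task}' on a {width}x{height} image."


def build_demo():
    """Wire the handler into a Gradio interface (gradio is imported lazily)."""
    import gradio as gr

    return gr.Interface(
        fn=run_task,
        inputs=[
            gr.Image(type="pil", label="Input image"),
            gr.Dropdown(choices=TASKS, value=TASKS[0], label="Task"),
        ],
        outputs=gr.Textbox(label="Result"),
        title="Florence-2 demo (sketch)",
    )
```

Calling `build_demo().launch()` would start the local server. In a real app, `run_task` would pass the image and the selected task to the model and return its parsed output.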
## Getting Started

1. Install the required dependencies:

```bash
pip install -r requirements.txt
```

2. Run the demo:

```bash
python app.py
```

3. Open the local URL printed in the terminal in your browser to start using the demo (Gradio serves on http://127.0.0.1:7860 by default, unless `app.py` configures a different address).
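
If `python app.py` fails with an import error, a quick check like the one below can show which packages are missing before you retry the install step. This is a sketch only: the package list in `ASSUMED_PACKAGES` is an assumption, since `requirements.txt` is not reproduced here, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Assumed package list; adjust it to match the actual requirements.txt.
ASSUMED_PACKAGES = ["gradio", "torch", "transformers"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed packages are importable.")
```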
## References

**Hugging Face Spaces**:

- [Florence-2 Demo by gokaygokay](https://huggingface.co/spaces/gokaygokay/Florence-2)
- [Florence-SAM Integration by SkalskiP](https://huggingface.co/spaces/SkalskiP/florence-sam)
## Citation

If you use this demo or find Florence-2 useful in your research, please cite:

```bibtex
@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}
```