
Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This is a Gradio-based demo of Florence-2, a unified vision foundation model that advances the state of the art across a range of computer vision tasks with a single architecture.

Demo Preview

Demo Screenshot

About Florence-2

Florence-2 provides a unified representation that handles a diverse range of vision tasks, including:

  • Object detection
  • Image captioning
  • Visual question answering
  • OCR (Optical Character Recognition)
  • Region proposal
  • Segmentation
  • And many more vision tasks

The model shows that a single architecture can be applied across these tasks: each task is selected by a text prompt and answered as generated text, which removes the need for separate task-specific models.
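As a rough illustration of what "one model, many tasks" can look like in code, here is a minimal sketch. It assumes the public `microsoft/Florence-2-large` checkpoint on the Hugging Face Hub and the task prompt tokens listed on its model card. The friendly task names, `build_prompt`, and `run_task` are hypothetical helpers written for this example and are not taken from this demo's `app.py`. For brevity, `run_task` reloads the model on every call, which a real app would avoid.

```python
# Task prompt tokens as listed on the public Florence-2 model card.
# The friendly task names on the left are made up for this sketch.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "region_proposal": "<REGION_PROPOSAL>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose prompt is the task token followed by free text.
TASKS_WITH_TEXT = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task, text=""):
    """Return the prompt string for a task, appending text when the task takes it."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text
    return token


def run_task(image, task, text="", model_id="microsoft/Florence-2-large"):
    """Run one task on a PIL image and return the parsed result.

    Heavy imports stay local so that importing this module does not
    require torch or transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # post_process_generation comes from the checkpoint's remote processor code.
    return processor.post_process_generation(
        generated_text,
        task=TASK_PROMPTS[task],
        image_size=(image.width, image.height),
    )
```

Only the prompt changes between tasks; the model, processor, and generation call stay the same. For example, `build_prompt("object_detection")` gives `<OD>`, while `build_prompt("phrase_grounding", "a green car")` gives the task token followed by the phrase to ground.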

Paper & Resources

📄 CVPR 2024 Paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

🎥 CVPR Virtual Presentation: https://cvpr.thecvf.com/virtual/2024/poster/30529

🖼️ Research Poster: Poster.png

Demo Features

This Gradio demo allows you to:

  • Upload images and interact with Florence-2's various capabilities
  • Test different vision tasks on your own images
  • Experience the unified model's performance across multiple domains
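The features above could be wired up in Gradio roughly as follows. This is a sketch under stated assumptions, not the contents of this demo's `app.py`: `make_predict`, `build_demo`, and the `backend` callable are hypothetical names, and the layout (one image input, one task dropdown, JSON output) is only one plausible design.

```python
def make_predict(backend, tasks):
    """Wrap a backend callable (image, task) -> result with input checks for the UI."""
    def predict(image, task):
        if image is None:
            return {"error": "please upload an image"}
        if task not in tasks:
            return {"error": f"unsupported task: {task}"}
        return {"task": task, "result": backend(image, task)}
    return predict


def build_demo(backend, tasks):
    """Build the Gradio interface.

    gradio is imported lazily so make_predict can be used without it.
    """
    import gradio as gr

    return gr.Interface(
        fn=make_predict(backend, tasks),
        inputs=[
            gr.Image(type="pil", label="Input image"),
            gr.Dropdown(choices=list(tasks), value=tasks[0], label="Task"),
        ],
        outputs=gr.JSON(label="Model output"),
        title="Florence-2 Vision Tasks Demo",
    )
```

Calling `build_demo(backend, ["caption", "object_detection", "ocr"]).launch()` would start a local server and print its URL. Keeping the model call behind a plain `backend` callable means the input checks can be exercised with a stand-in function, without downloading any weights.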

Getting Started

  1. Install the required dependencies:

    pip install -r requirements.txt
    
  2. Run the demo:

    python app.py
    
  3. Open your browser and navigate to the provided local URL to start using the demo.

References

Hugging Face Spaces:

Citation

If you use this demo or find Florence-2 useful in your research, please cite:

@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}

    title: Florence-2 Vision Tasks Demo
    emoji: 🧠
    colorFrom: indigo
    colorTo: violet
    sdk: gradio
    sdk_version: "4.25.0"
    app_file: app.py
    pinned: true