metadata

title: Florence-2 Vision Tasks Demo
emoji: 🚀
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: true
short_description: This is a Gradio-based demo showcasing Florence-2
license: mit

Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This is a Gradio-based demo of Florence-2, a vision foundation model that handles many computer vision tasks with a single prompt-driven architecture instead of one model per task.

Demo Preview

About Florence-2

Florence-2 uses one sequence-to-sequence model for a diverse range of vision tasks: it takes a text prompt as the task instruction and returns the result as text. Supported tasks include:

Object detection
Image captioning
Visual question answering
OCR (Optical Character Recognition)
Region proposal
Segmentation
Other prompt-driven tasks such as phrase grounding and dense region captioning

A single architecture thus serves multiple vision domains, which reduces the need for separate task-specific models.
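To make the "one model, many tasks" idea concrete, here is a minimal sketch of how a task could be selected by prompt alone. It is not taken from this demo's app.py. The prompt tokens and the `transformers` calls follow the public Florence-2 model card, the checkpoint name `microsoft/Florence-2-base` is an assumption, and `TASK_PROMPTS`, `build_prompt`, and `run_florence2` are hypothetical names. Visual question answering is left out because no prompt token for it is assumed here.

```python
# Prompt tokens as listed on the public Florence-2 model card (an assumption
# about the checkpoint, not code from this demo).
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Image captioning": "<CAPTION>",
    "OCR": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    "Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose prompt token must be followed by user text (the phrase to segment).
NEEDS_TEXT = {"Segmentation"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task name from the list above."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"unsupported task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in NEEDS_TEXT:
        if not text:
            raise ValueError(f"{task} needs a text input")
        return token + text
    return token


def run_florence2(image, task: str, text: str = "",
                  model_id: str = "microsoft/Florence-2-base"):
    """Run one task on a PIL image; downloads the checkpoint on first use."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # The same weights serve every task; only the prompt changes.
    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the generated text into boxes, labels, or plain text per task.
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

Under these assumptions, `run_florence2(Image.open("photo.jpg"), "Object detection")` would return a dictionary keyed by the prompt token. A real app would load the model once at startup rather than on every call.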

Paper & Resources

📄 CVPR 2024 Paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

🎥 CVPR Virtual Presentation: https://cvpr.thecvf.com/virtual/2024/poster/30529

🖼️ Research Poster: Poster.png

Demo Features

This Gradio demo allows you to:

Upload images and interact with Florence-2's various capabilities
Test different vision tasks on your own images
Compare how the one model performs across different tasks on the same image
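The features above amount to an image upload, a task selector, and a result panel. The following is a hedged sketch of how such an interface could be wired in Gradio. It is an illustration, not the contents of app.py. `TASKS`, `make_handler`, `build_demo`, and the injected `runner` callable are hypothetical names, and the task list is a subset chosen for the example.

```python
from typing import Any, Callable

# Illustrative subset of tasks offered in the dropdown (an assumption).
TASKS = ["Object detection", "Image captioning", "OCR", "Region proposal"]


def make_handler(runner: Callable[[Any, str], Any]) -> Callable[[Any, str], dict]:
    """Wrap a model runner so the UI always receives a JSON-friendly dict."""

    def handle(image: Any, task: str) -> dict:
        # Gradio passes None when no image has been uploaded yet.
        if image is None:
            return {"error": "please upload an image first"}
        if task not in TASKS:
            return {"error": f"unknown task: {task}"}
        return {"task": task, "result": runner(image, task)}

    return handle


def build_demo(runner: Callable[[Any, str], Any]):
    """Build the UI; gradio is imported lazily so make_handler has no dependency on it."""
    import gradio as gr

    return gr.Interface(
        fn=make_handler(runner),
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Dropdown(choices=TASKS, value=TASKS[0], label="Task"),
        ],
        outputs=gr.JSON(label="Result"),
        title="Florence-2 Vision Tasks Demo",
    )
```

Calling `build_demo(runner).launch()` with a function that runs the model would start the local server. Passing the runner in as an argument keeps the input checks separate from model loading, so they can be exercised without downloading weights.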

Getting Started

Install the required dependencies:
```
pip install -r requirements.txt
```
Run the demo:
```
python app.py
```
Open your browser and navigate to the local URL that Gradio prints in the terminal (by default http://127.0.0.1:7860) to start using the demo.

References

Hugging Face Spaces:

Citation

If you use this demo or find Florence-2 useful in your research, please cite:

@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}