---
title: Florence-2 Vision Tasks Demo
emoji: 🚀
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: true
short_description: This is a Gradio-based demo showcasing Florence-2
license: mit
---

# Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This is a Gradio-based demo showcasing **Florence-2**, a unified vision foundation model that advances the state-of-the-art in various computer vision tasks through a single, versatile architecture.

## Demo Preview

![Demo Screenshot](./image-demo.png)

## About Florence-2

Florence-2 represents a significant breakthrough in computer vision by providing a unified representation that can handle a diverse range of vision tasks including:

- Object detection
- Image captioning
- Visual question answering
- OCR (Optical Character Recognition)
- Region proposal
- Segmentation
- And many more vision tasks

The model demonstrates how a single architecture can be effectively applied across multiple vision domains, eliminating the need for task-specific models.

## Paper & Resources

📄 **CVPR 2024 Paper**: [Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Florence-2_Advancing_a_Unified_Representation_for_a_Variety_of_Vision_CVPR_2024_paper.pdf)

🎥 **CVPR Virtual Presentation**: [https://cvpr.thecvf.com/virtual/2024/poster/30529](https://cvpr.thecvf.com/virtual/2024/poster/30529)

🖼️ **Research Poster**: [Poster.png](./Poster.png)

## Demo Features

This Gradio demo allows you to:

- Upload images and interact with Florence-2's various capabilities
- Test different vision tasks on your own images
- Experience the unified model's performance across multiple domains

## Getting Started

1. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the demo:
   ```bash
   python app.py
   ```

3. Open your browser and navigate to the provided local URL to start using the demo.

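
To try the model outside the Gradio UI, the sketch below shows roughly how a single task could be run from Python. It is not taken from this repository's `app.py`. It assumes the `microsoft/Florence-2-base` checkpoint and the prompt tokens documented on the public Florence-2 model card, and it covers only a subset of the tasks listed above. The names `TASK_PROMPTS`, `build_prompt`, `load_florence`, and `run_task` are hypothetical helpers introduced for illustration.

```python
from functools import lru_cache
from typing import Optional

# Subset of the tasks listed above, mapped to Florence-2 prompt tokens.
# Assumption: tokens follow the public model card; app.py may use others.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Image captioning": "<CAPTION>",
    "OCR": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    "Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the prompt string: the task token plus optional extra text."""
    token = TASK_PROMPTS[task]
    return token if text_input is None else token + text_input


@lru_cache(maxsize=1)
def load_florence(model_id: str = "microsoft/Florence-2-base"):
    """Load model and processor once; requires `transformers` and network access."""
    # Imported lazily so the helpers above work without heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True
    ).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def run_task(image, task: str, text_input: Optional[str] = None):
    """Run one task on a PIL image and return the parsed result."""
    import torch

    model, processor = load_florence()
    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3,
        )
    # Keep special tokens: the post-processor uses them to parse locations.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw,
        task=TASK_PROMPTS[task],
        image_size=(image.width, image.height),
    )
```

With Pillow installed, a call such as `run_task(Image.open("photo.jpg").convert("RGB"), "Object detection")` would be the typical entry point, where `photo.jpg` is a placeholder path. Caching the loaded model in `load_florence` avoids reloading the weights on every request, which matters when switching between tasks on the same image.
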
## References

**Hugging Face Spaces**:

- [Florence-2 Demo by gokaygokay](https://huggingface.co/spaces/gokaygokay/Florence-2)
- [Florence-SAM Integration by SkalskiP](https://huggingface.co/spaces/SkalskiP/florence-sam)

## Citation

If you use this demo or find Florence-2 useful in your research, please cite:

```bibtex
@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}
```