metadata
title: Florence-2 Vision Tasks Demo
emoji: 🚀
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: true
short_description: This is a Gradio-based demo showcasing Florence-2
license: mit

Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This Gradio-based demo showcases Florence-2, a unified vision foundation model that handles a wide range of computer vision tasks with a single, versatile architecture.

Demo Preview

Demo Screenshot

About Florence-2

Florence-2 provides a unified representation that handles a diverse range of vision tasks, including:

  • Object detection
  • Image captioning
  • Visual question answering
  • OCR (Optical Character Recognition)
  • Region proposal
  • Segmentation
  • And many more vision tasks

The model demonstrates how a single architecture can be effectively applied across multiple vision domains, eliminating the need for task-specific models.
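To make the "single architecture" idea concrete, here is a minimal sketch of how one checkpoint could serve several of the tasks listed above by switching only the text prompt. The task tokens, the `microsoft/Florence-2-base` model id, and the processor calls follow the public Florence-2 model card on Hugging Face, not this README, so treat them as assumptions; `TASK_PROMPTS`, `build_prompt`, and `run_task` are hypothetical names, and the demo's own `app.py` may be organized differently.

```python
# Task prompt tokens as listed on the public Florence-2 model card.
# Assumption: this README does not name them, and app.py may use others.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Image captioning": "<CAPTION>",
    "OCR": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    # This segmentation variant expects a text phrase describing the region.
    "Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task_name, text_input=None):
    """Return the prompt for a task, appending optional free-text input."""
    token = TASK_PROMPTS[task_name]
    return token if text_input is None else token + text_input


def run_task(image, task_name, text_input=None,
             model_id="microsoft/Florence-2-base"):
    """Run one task on a PIL image and return the parsed result.

    Heavy imports are local so build_prompt works without transformers.
    The model is loaded on every call for brevity; a real app would
    load it once at startup.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    token = TASK_PROMPTS[task_name]
    prompt = build_prompt(task_name, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    # Special tokens are kept because the post-processor parses them.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=token, image_size=(image.width, image.height)
    )
```

The same `run_task` function covers every entry in the table: only the prompt token changes, while the model weights and the generation call stay identical.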

Paper & Resources

📄 CVPR 2024 Paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

🎥 CVPR Virtual Presentation: https://cvpr.thecvf.com/virtual/2024/poster/30529

🖼️ Research Poster: Poster.png

Demo Features

This Gradio demo allows you to:

  • Upload images and interact with Florence-2's various capabilities
  • Test different vision tasks on your own images
  • Experience the unified model's performance across multiple domains
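As a rough illustration of the upload-and-pick-a-task flow described above, the sketch below wires an image input and a task dropdown to a callback using Gradio's `Interface`. It is not the demo's actual `app.py`: `TASKS`, `describe_request`, and `build_demo` are hypothetical names, and the callback is a placeholder that only echoes the request instead of calling Florence-2.

```python
# Hypothetical subset of tasks offered in the dropdown.
TASKS = ["Object detection", "Image captioning", "OCR"]


def describe_request(image, task):
    """Placeholder callback: reports what would be sent to the model.

    Replace the return value with a real Florence-2 inference call.
    """
    if image is None:
        return {"error": "Please upload an image first."}
    return {"task": task, "image_size": [image.width, image.height]}


def build_demo():
    """Build the Gradio interface (gradio is imported only when needed)."""
    import gradio as gr

    return gr.Interface(
        fn=describe_request,
        inputs=[
            gr.Image(type="pil", label="Input image"),
            gr.Dropdown(choices=TASKS, value=TASKS[0], label="Task"),
        ],
        outputs=gr.JSON(label="Result"),
        title="Florence-2 Vision Tasks Demo",
    )


# To serve the interface locally:
# build_demo().launch()
```

Keeping the callback separate from the interface definition means the model call can be swapped in or tested without starting a web server.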

Getting Started

  1. Install the required dependencies:

    pip install -r requirements.txt
    
  2. Run the demo:

    python app.py
    
  3. Open your browser and navigate to the provided local URL to start using the demo.

References

Hugging Face Spaces:

Citation

If you use this demo or find Florence-2 useful in your research, please cite:

@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}