---
title: Florence-2 Vision Tasks Demo
emoji: 🚀
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.39.0
app_file: app.py
pinned: true
short_description: This is a Gradio-based demo showcasing Florence-2
license: mit
---

# Florence-2 Demo: Advancing a Unified Representation for a Variety of Vision Tasks

This Gradio-based demo showcases **Florence-2**, a vision foundation model that handles a wide range of computer vision tasks with a single, prompt-driven architecture.

## Demo Preview

![Demo Screenshot](./image-demo.png)

## About Florence-2

Florence-2 provides a unified representation that handles a diverse range of vision tasks, including:

- Object detection
- Image captioning
- Visual question answering
- OCR (Optical Character Recognition)
- Region proposal
- Segmentation
- And many more vision tasks

The model shows that a single architecture can be applied across multiple vision domains, reducing the need for separate task-specific models.
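
In practice, Florence-2 selects a task through a short prompt token passed alongside the image. The sketch below maps some of the tasks listed above to the prompt tokens given on the public Florence-2 model card. The dictionary, the `NEEDS_TEXT_INPUT` set, and the `build_prompt` helper are hypothetical names introduced for illustration; the `app.py` in this Space may expose a different subset of tasks or organize them differently.

```python
# Task prompt tokens as listed on the public Florence-2 model card.
# Assumption: this demo's app.py may use a different subset or naming.
TASK_PROMPTS = {
    "Object detection": "<OD>",
    "Image captioning": "<CAPTION>",
    "OCR": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    "Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose prompt token must be followed by free text,
# for example the phrase describing the object to segment.
NEEDS_TEXT_INPUT = {"<REFERRING_EXPRESSION_SEGMENTATION>"}


def build_prompt(task_name: str, text_input: str = "") -> str:
    """Return the text prompt for a task (hypothetical helper)."""
    token = TASK_PROMPTS[task_name]
    if token in NEEDS_TEXT_INPUT:
        if not text_input:
            raise ValueError(f"{task_name} requires a text input")
        # The token and the user text are simply concatenated.
        return token + text_input
    return token
```

For example, `build_prompt("Object detection")` yields `"<OD>"`, while `build_prompt("Segmentation", "a green car")` yields the segmentation token followed by the phrase. The same image and the same weights serve every task; only the prompt changes.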

## Paper & Resources

📄 **CVPR 2024 Paper**: [Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://openaccess.thecvf.com/content/CVPR2024/papers/Xiao_Florence-2_Advancing_a_Unified_Representation_for_a_Variety_of_Vision_CVPR_2024_paper.pdf)

🎥 **CVPR Virtual Presentation**: [https://cvpr.thecvf.com/virtual/2024/poster/30529](https://cvpr.thecvf.com/virtual/2024/poster/30529)

🖼️ **Research Poster**: [Poster.png](./Poster.png)

## Demo Features

This Gradio demo allows you to:
- Upload images and interact with Florence-2's various capabilities
- Test different vision tasks on your own images
- Experience the unified model's performance across multiple domains

## Getting Started

1. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the demo:
   ```bash
   python app.py
   ```

3. Open your browser and navigate to the provided local URL to start using the demo.
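
To check the installation outside the Gradio UI, the model can also be called directly from Python. The following is a minimal sketch based on the usage pattern shown on the public Florence-2 model card, not on this Space's `app.py`. It assumes that `torch`, `transformers`, and `Pillow` are installed (which `requirements.txt` may or may not cover), that the `microsoft/Florence-2-base` checkpoint is acceptable, and that the installed `transformers` version is compatible with the model's remote code. The function name `run_florence_task` is hypothetical.

```python
def run_florence_task(image_path, task_prompt="<CAPTION>",
                      model_id="microsoft/Florence-2-base"):
    """Run one Florence-2 task on one image and return the parsed result.

    Sketch only: model_id and the default task are assumptions, and the
    demo itself may load a different checkpoint.
    """
    # Heavy imports are deferred so that defining the function stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=task_prompt, images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Special tokens are kept because the post-processor parses them.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=task_prompt, image_size=(image.width, image.height)
    )
```

Calling `run_florence_task("image-demo.png")` would download the checkpoint on first use and return a dictionary keyed by the task prompt. Passing `task_prompt="<OD>"` switches the same model to object detection.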

## References

**Hugging Face Spaces**:
- [Florence-2 Demo by gokaygokay](https://huggingface.co/spaces/gokaygokay/Florence-2)
- [Florence-SAM Integration by SkalskiP](https://huggingface.co/spaces/SkalskiP/florence-sam)

## Citation

If you use this demo or find Florence-2 useful in your research, please cite:

```bibtex
@inproceedings{xiao2024florence,
  title={Florence-2: Advancing a unified representation for a variety of vision tasks},
  author={Xiao, Bin and Wu, Haiping and Xu, Weijian and Dai, Xiyang and Hu, Houdong and Lu, Yumao and Zeng, Michael and Liu, Ce and Yuan, Lu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4818--4829},
  year={2024}
}
```