---
tags:
- object-detection
- transformers
- detr
- dab-detr
- biomedical-image-processing
license: apache-2.0
library_name: transformers
pipeline_tag: object-detection
---

# DAB-DETR for Biomedical Subfigure Extraction

A transformer-based object detection model designed to detect and extract subfigures (panels) from compound figures in biomedical literature.


<p align="center">
  <strong>Paper</strong>: <a href="https://arxiv.org/abs/2506.02738" target="_blank">arXiv</a>
  |
  <strong>Code</strong>: <a href="https://github.com/vectorInstitute/pmc-data-extraction" target="_blank">GitHub</a>
</p>

## Background & Motivation

- **DAB-DETR** ("Dynamic Anchor Boxes are Better Queries for DETR") improves on the original DETR framework by replacing learned positional queries with dynamic anchor boxes. Each anchor is a box `(x, y, w, h)` that is updated layer by layer to guide cross-attention and speed up convergence. This gives every query an explicit positional prior, which improves feature matching, and the approach demonstrated strong detection performance.

- In the **Open-PMC-18M** study, this model was adapted and trained on a synthetic dataset of 500,000 biomedical compound figures to extract subfigures at scale. It reached an mAP of 98.58% and an F1 of 99.96% on a synthetic holdout set, and an mAP of 36.88% and an F1 of 73.55% on the ImageCLEF 2016 benchmark.
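
To make the layer-by-layer anchor update from the first bullet concrete, here is a minimal sketch in plain Python. It assumes the anchor is stored in normalized `[0, 1]` coordinates and that each decoder layer adds a predicted offset in logit space before squashing back with a sigmoid. That update rule, the helper names (`inverse_sigmoid`, `refine_anchor`), and the offset values are illustrative assumptions, not code or numbers taken from this model.

```python
import math


def inverse_sigmoid(p, eps=1e-5):
    """Map a value in (0, 1) to logit space, clamping for numerical stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def refine_anchor(anchor, delta):
    """Apply one layer's offsets to a normalized (x, y, w, h) anchor.

    Assumed update rule: new = sigmoid(inverse_sigmoid(old) + delta).
    """
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


# Hypothetical starting anchor and made-up per-layer offsets.
anchor = (0.5, 0.5, 0.2, 0.2)
layer_deltas = [
    (0.4, -0.2, 0.3, 0.1),
    (0.1, -0.1, 0.1, 0.0),
    (0.0, 0.0, 0.0, 0.0),
]

for layer, delta in enumerate(layer_deltas, start=1):
    anchor = refine_anchor(anchor, delta)
    print(f"layer {layer}: {[round(v, 3) for v in anchor]}")
```

Working in logit space keeps every refined coordinate inside `(0, 1)`, so the anchor always remains a valid normalized box regardless of the size of the predicted offsets.
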

---

## Usage Example

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
from PIL import Image
import torch

# Load the image processor and the detection model from the Hub
model_name = "vector-institute/pmc-18m-dab-detr"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)

image = Image.open("compound_figure.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Post-process detections; adjust thresholding and formatting as needed.
# PIL reports size as (width, height); target_sizes expects (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.5
)

# One result dict per input image; boxes are in pixel coordinates
for res in results:
    for score, label, box in zip(res["scores"], res["labels"], res["boxes"]):
        print(f"Label {label.item()}: {score.item():.2f}, Box: {box.tolist()}")
```
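
The example above only prints the detections. Since the goal is to extract each panel, a small helper that turns a predicted box into a safe integer crop region may be useful. The sketch below assumes boxes arrive as `(x_min, y_min, x_max, y_max)` in pixel coordinates, as returned by `post_process_object_detection` when `target_sizes` is given. The helper name `to_crop_box` and the sample values are hypothetical.

```python
import math


def to_crop_box(box, image_width, image_height):
    """Convert a float (x_min, y_min, x_max, y_max) box into an integer
    (left, top, right, bottom) tuple clamped to the image bounds.

    Returns None if the clamped box has no area.
    """
    x_min, y_min, x_max, y_max = box
    # Round outward so the crop does not clip the panel border
    left = max(0, math.floor(x_min))
    top = max(0, math.floor(y_min))
    right = min(image_width, math.ceil(x_max))
    bottom = min(image_height, math.ceil(y_max))
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


# Hypothetical detection that spills past the edges of a 640x480 figure
print(to_crop_box([10.4, -3.2, 650.7, 200.1], 640, 480))  # (10, 0, 640, 201)
```

With the `image` and `results` objects from the usage example, each panel could then be cut out with `image.crop(to_crop_box(box.tolist(), *image.size))`, skipping any box for which the helper returns `None`.
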

## Citation

If you find this model or code useful, please consider citing:

```bib
@article{baghbanzadeh2025openpmc18m,
  title   = {Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning},
  author  = {Baghbanzadeh, Negin and Ashkezari, Sajad and Dolatabadi, Elham and Afkanpour, Arash},
  journal = {arXiv preprint arXiv:2506.02738},
  year    = {2025}
}
```