---
language:
- en
base_model:
- ds4sd/docling-models
pipeline_tag: object-detection
---
# Docling Model for Layout

This is the **Docling model for layout detection**, packaged so that it can be loaded and used like any other Hugging Face model.

This model is part of the [Docling repository](https://huggingface.co/ds4sd/docling-models), which provides document layout analysis tools.

## **Usage Example**
Here's how you can load and use the model:

```python
import torch
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

# Load the model and processor
image_processor = RTDetrImageProcessor.from_pretrained("HuggingPanda/docling-layout")
model = RTDetrForObjectDetection.from_pretrained("HuggingPanda/docling-layout")

# Load an image (converted to RGB so grayscale or RGBA files are also handled)
image = Image.open("hocr_output_page-0001.jpg").convert("RGB")

# Preprocess the image
resize = {"height":640, "width":640}
inputs = image_processor(
    images=image,
    return_tensors="pt",
    size=resize,
)

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
results = image_processor.post_process_object_detection(
    outputs, 
    target_sizes=torch.tensor([image.size[::-1]]), 
    threshold=0.3
)

# Print detected objects
# Note: predicted label ids are looked up in id2label with a +1 offset
for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label+1]}: {score:.2f} {box}")
```
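
The printed detections can also be collected into plain Python records for filtering or grouping. The following is a minimal sketch, not part of the model's API: the helper names `to_records` and `group_by_label`, the `min_score` parameter, and the sample label map are hypothetical, and the default `label_offset=1` is an assumption that mirrors the `label+1` lookup in the example above.

```python
from collections import defaultdict


def to_records(scores, labels, boxes, id2label, label_offset=1, min_score=0.3):
    """Turn parallel lists of scores, label ids, and boxes into dict records.

    label_offset=1 mirrors the `label+1` lookup in the usage example; it is an
    assumption carried over from that snippet, not a documented guarantee.
    """
    records = []
    for score, label_id, box in zip(scores, labels, boxes):
        if score < min_score:
            continue  # drop low-confidence detections
        records.append({
            "label": id2label.get(label_id + label_offset, f"unknown-{label_id}"),
            "score": round(score, 2),
            "box": [round(v, 2) for v in box],  # [x_min, y_min, x_max, y_max]
        })
    # Sort top-to-bottom, then left-to-right: a rough order for single-column pages
    records.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return records


def group_by_label(records):
    """Group records by their label name."""
    groups = defaultdict(list)
    for record in records:
        groups[record["label"]].append(record)
    return dict(groups)


# Hypothetical values standing in for result["scores"].tolist() and friends
example_id2label = {1: "Text", 2: "Title", 3: "Table"}
example = to_records(
    scores=[0.91, 0.25, 0.80],
    labels=[0, 1, 2],
    boxes=[
        [50.0, 200.0, 500.0, 260.0],
        [50.0, 40.0, 500.0, 90.0],
        [60.0, 100.0, 480.0, 180.0],
    ],
    id2label=example_id2label,
)
for record in example:
    print(record)
```

With real model output, the same helper could be called as `to_records(result["scores"].tolist(), result["labels"].tolist(), result["boxes"].tolist(), model.config.id2label)`.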


## **Model Information**
- **Base Model:** RT-DETR (Real-Time Detection Transformer)
- **Intended Use:** Layout detection for documents
- **Framework:** [Hugging Face Transformers](https://huggingface.co/docs/transformers/index)
- **Dataset Used:** Internal dataset for document structure recognition
- **License:** Apache 2.0

## **Citing This Model**
If you use this model in your work, please cite the main **Docling repository**:

```
@misc{docling2024,
  title={Docling Models for Document Layout Analysis},
  author={DS4SD Team},
  year={2024},
  howpublished={Hugging Face Repository},
  url={https://huggingface.co/ds4sd/docling-models}
}
```

For more details, visit the main repo: [ds4sd/docling-models](https://huggingface.co/ds4sd/docling-models).

## **Contact**
For questions or issues, please open a discussion on Hugging Face or contact [pandahd75@gmail.com](mailto:pandahd75@gmail.com).