Spaces:

SFEREWQW
/

114514

Runtime error

App Files Files Community

114514 / docs /en /algorithm /layout_detection.rst

SFEREWQW

Upload 395 files

18e4106 verified 20 days ago

raw

history blame contribute delete

8.11 kB

	.. _algorithm_layout_detection:

	=================
	Layout Detection Algorithm
	=================

	Introduction
	=================

	Layout detection is a fundamental task in document content extraction, aiming to locate different types of regions on a page, such as images, tables, text, and headings, to facilitate high-quality content extraction. For text and heading regions, OCR models can be used for text recognition, while table regions can be converted using table recognition models.

	Model Usage
	=================

	Layout detection supports following models：

	.. raw:: html

	<style type="text/css">
	.tg {border-collapse:collapse;border-color:#9ABAD9;border-spacing:0;}
	.tg td{background-color:#EBF5FF;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#444;
	font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
	.tg th{background-color:#409cff;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#fff;
	font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
	.tg .tg-f8tz{background-color:#409cff;border-color:inherit;text-align:left;vertical-align:top}
	.tg .tg-0lax{text-align:left;vertical-align:top}
	.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
	</style>
	<table class="tg"><thead>
	<tr>
	<th class="tg-0lax">Model</th>
	<th class="tg-f8tz">Description</th>
	<th class="tg-f8tz">Characteristics</th>
	<th class="tg-f8tz">Model weight</th>
	<th class="tg-f8tz">Config file</th>
	</tr></thead>
	<tbody>
	<tr>
	<td class="tg-0lax">DocLayout-YOLO</td>
	<td class="tg-0pky">Improved based on YOLO-v10：<br>1. Generate diverse pre-training data，enhance generalization ability across multiple document types<br>2. Model architecture improvement, improve perception ability on scale-varing instances<br>Details in <a href="https://github.com/opendatalab/DocLayout-YOLO" target="_blank" rel="noopener noreferrer">DocLayout-YOLO</a></td>
	<td class="tg-0pky">Speed:Fast, Accuracy:High</td>
	<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/blob/main/models/Layout/YOLO/doclayout_yolo_ft.pt" target="_blank" rel="noopener noreferrer">doclayout_yolo_ft.pt</a></td>
	<td class="tg-0pky">layout_detection.yaml</td>
	</tr>
	<tr>
	<td class="tg-0lax">YOLO-v10</td>
	<td class="tg-0pky">Base YOLO-v10 model</td>
	<td class="tg-0pky">Speed:Fast, Accuracy:Moderate</td>
	<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/blob/main/models/Layout/YOLO/yolov10l_ft.pt" target="_blank" rel="noopener noreferrer">yolov10l_ft.pt</a></td>
	<td class="tg-0pky">layout_detection_yolo.yaml</td>
	</tr>
	<tr>
	<td class="tg-0lax">LayoutLMv3</td>
	<td class="tg-0pky">Base LayoutLMv3 model</td>
	<td class="tg-0pky">Speed:Slow, Accuracy:High</td>
	<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/tree/main/models/Layout/LayoutLMv3" target="_blank" rel="noopener noreferrer">layoutlmv3_ft</a></td>
	<td class="tg-0pky">layout_detection_layoutlmv3.yaml</td>
	</tr>
	</tbody></table>

	Once enciroment is setup, you can perform layout detection by executing ``scripts/layout_detection.py`` directly.

	Run demo

	.. code:: shell

	$ python scripts/layout_detection.py --config configs/layout_detection.yaml

	Model Configuration
	-----------------

	1. DocLayout-YOLO / YOLO-v10

	.. code:: yaml

	inputs: assets/demo/layout_detection
	outputs: outputs/layout_detection
	tasks:
	layout_detection:
	model: layout_detection_yolo
	model_config:
	img_size: 1024
	conf_thres: 0.25
	iou_thres: 0.45
	model_path: path/to/doclayout_yolo_model
	visualize: True

	- inputs/outputs: Define the input file path and the directory for visualization output.
	- tasks: Define the task type, currently only a layout detection task is included.
	- model: Specify the specific model type, e.g., layout_detection_yolo.
	- model_config: Define the model configuration.
	- img_size: Define the image long edge size; the short edge will be scaled proportionally based on the long edge, with the default long edge being 1024.
	- conf_thres: Define the confidence threshold, detecting only targets above this threshold.
	- iou_thres: Define the IoU threshold, removing targets with an overlap greater than this threshold.
	- model_path: Path to the model weights.
	- visualize: Whether to visualize the model results; visualized results will be saved in the outputs directory.


	2. layoutlmv3

	.. note::

	LayoutLMv3 cannot run directly by default. Please follow the steps below to modify the configuration:

	1. Detectron2 Environment Setup

	.. code-block:: bash

	# For Linux
	pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-linux_x86_64.whl

	# For macOS
	pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl

	# For Windows
	pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-win_amd64.whl

	2. Enable LayoutLMv3 Registration Code

	Uncomment the lines at the following links:

	- `line 2 <https://github.com/opendatalab/PDF-Extract-Kit/blob/main/pdf_extract_kit/tasks/layout_detection/__init__.py#L2>`_
	- `line 8 <https://github.com/opendatalab/PDF-Extract-Kit/blob/main/pdf_extract_kit/tasks/layout_detection/__init__.py#L8>`_

	.. code-block:: python

	from pdf_extract_kit.tasks.layout_detection.models.yolo import LayoutDetectionYOLO
	from pdf_extract_kit.tasks.layout_detection.models.layoutlmv3 import LayoutDetectionLayoutlmv3
	from pdf_extract_kit.registry.registry import MODEL_REGISTRY

	__all__ = [
	"LayoutDetectionYOLO",
	"LayoutDetectionLayoutlmv3",
	]


	.. code:: yaml

	inputs: assets/demo/layout_detection
	outputs: outputs/layout_detection
	tasks:
	layout_detection:
	model: layout_detection_layoutlmv3
	model_config:
	model_path: path/to/layoutlmv3_model

	- inputs/outputs: Define the input file path and the directory for visualization output.
	- tasks: Define the task type, currently only a layout detection task is included.
	- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
	- model_config: Define the model configuration.
	- model_path: Path to the model weights.



	Diverse Input Support
	-----------------

	The layout detection script in PDF-Extract-Kit supports input formats such as a ``single image``, a ``directory containing only image files``, a ``single PDF file``, and a ``directory containing only PDF files``.

	.. note::

	Modify the path to inputs in configs/layout_detection.yaml according to your actual data format:
	- Single image: path/to/image
	- Image directory: path/to/images
	- Single PDF file: path/to/pdf
	- PDF directory: path/to/pdfs

	.. note::
	When using PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``layout_detection.py``.

	.. code:: python

	# for image detection
	detection_results = model_layout_detection.predict_images(input_data, result_path)

	Change to:

	.. code:: python

	# for pdf detection
	detection_results = model_layout_detection.predict_pdfs(input_data, result_path)

	Viewing Visualization Results
	-----------------

	When ``visualize`` is set to ``True`` in the config file, the visualization results will be saved in the ``outputs`` directory.

	.. note::

	Visualization is helpful for analyzing model results, but for large-scale tasks, it is recommended to turn off visualization (set ``visualize`` to ``False`` ) to reduce memory and disk usage.