.. _algorithm_formula_detection:

===========================
Formula Detection Algorithm
===========================
Introduction
====================
Formula detection identifies the positions of all formulas, both inline and block, in a given input image.

.. note::

   Formula detection is technically a subtask of layout detection. However, because detecting formulas is complex, we recommend decoupling it from layout detection by using a dedicated formula detection model. This approach typically makes data annotation easier and improves detection performance.
Model Usage
====================
With the environment properly set up, run the formula detection script ``scripts/formula_detection.py``:

.. code:: shell

   $ python scripts/formula_detection.py --config configs/formula_detection.yaml
Model Configuration
--------------------
.. code:: yaml

   inputs: assets/demo/formula_detection
   outputs: outputs/formula_detection
   tasks:
     formula_detection:
       model: formula_detection_yolo
       model_config:
         img_size: 1280
         conf_thres: 0.25
         iou_thres: 0.45
         batch_size: 1
         model_path: models/MFD/yolov8/weights.pt
         visualize: True
- inputs/outputs: Define the input file path and the visualization output directory, respectively.
- tasks: Define the task type. Currently, only a formula detection task is included.

  - model: Define the specific model type. Currently, only the YOLO formula detection model is available.
  - model_config: Define the model configuration.

    - img_size: Define the size of the image's longer side; the shorter side is scaled proportionally.
    - conf_thres: Define the confidence threshold; only detections with a confidence above this threshold are kept.
    - iou_thres: Define the IoU threshold used to remove duplicate detections: a box that overlaps a higher-confidence box by more than this value is discarded.
    - batch_size: Define the batch size, that is, the number of images inferred simultaneously. Generally, a larger batch size gives faster inference, and a GPU with more memory allows a larger batch size.
    - model_path: Path to the model weights.
    - visualize: Whether to visualize the model results. Visualized results are saved in the outputs directory.
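To make the ``img_size``, ``conf_thres``, and ``iou_thres`` options described above more concrete, the following is a minimal pure-Python sketch of what they control. It is not the PDF-Extract-Kit or YOLO implementation: the function names ``scaled_size``, ``iou``, and ``filter_detections`` are hypothetical, boxes are assumed to be ``(x1, y1, x2, y2)`` tuples, and the suppression step is a simplified greedy, class-agnostic version of non-maximum suppression.

```python
def scaled_size(width, height, img_size=1280):
    """Scale so the longer side equals img_size, keeping the aspect ratio."""
    scale = img_size / max(width, height)
    return round(width * scale), round(height * scale)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_thres=0.25, iou_thres=0.45):
    """Keep confident, non-duplicate detections.

    detections: list of (box, score) pairs (an assumed format).
    Returns the kept pairs, highest score first.
    """
    # Step 1: drop everything at or below the confidence threshold.
    candidates = sorted(
        (d for d in detections if d[1] > conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    # Step 2: greedily keep a box only if it does not overlap an
    # already kept (higher-scoring) box by more than iou_thres.
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    detections = [
        ((0, 0, 100, 100), 0.9),      # kept
        ((10, 10, 110, 110), 0.8),    # duplicate of the first box
        ((200, 200, 300, 300), 0.6),  # kept
        ((400, 400, 500, 500), 0.1),  # below conf_thres
    ]
    print(scaled_size(2000, 1000))
    print(filter_detections(detections))
```

Raising ``conf_thres`` trades recall for precision, while lowering ``iou_thres`` suppresses overlapping boxes more aggressively, which may merge formulas that sit close together.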
Diverse Input Support
---------------------
The formula detection script in PDF-Extract-Kit supports various input formats such as ``a single image``, ``a directory of image files``, ``a single PDF file``, and ``a directory of PDF files``.
.. note::

   Modify the ``inputs`` path in ``configs/formula_detection.yaml`` according to your actual data format:

   - Single image: path/to/image
   - Image directory: path/to/images
   - Single PDF file: path/to/pdf
   - PDF directory: path/to/pdfs

.. note::

   When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``formula_detection.py``.
.. code:: python

   # for image detection
   detection_results = model_formula_detection.predict_images(input_data, result_path)

Change to:

.. code:: python

   # for pdf detection
   detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
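Instead of editing the script by hand each time, the choice between ``predict_images`` and ``predict_pdfs`` could be derived from the input path. The sketch below is only an illustration under stated assumptions: ``pick_predict_method`` and ``IMAGE_SUFFIXES`` are hypothetical names that do not exist in PDF-Extract-Kit, and the set of image extensions is a guess rather than the list the toolkit actually accepts.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pick_predict_method(input_path):
    """Return "predict_pdfs" or "predict_images" for a file or directory."""
    path = Path(input_path)
    if path.is_dir():
        # For a directory, look at the extensions of the files it contains.
        suffixes = {p.suffix.lower() for p in path.iterdir() if p.is_file()}
    else:
        suffixes = {path.suffix.lower()}
    if ".pdf" in suffixes:
        return "predict_pdfs"
    if suffixes & IMAGE_SUFFIXES:
        return "predict_images"
    raise ValueError(f"No supported image or PDF files found at {input_path}")
```

With such a helper, the call in ``formula_detection.py`` could become ``getattr(model_formula_detection, pick_predict_method(input_data))(input_data, result_path)``, assuming both methods keep the signature shown above.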
Viewing Visualization Results
-----------------------------
When the ``visualize`` option in the config file is set to ``True``, visualization results will be saved in the ``outputs/formula_detection`` directory.
.. note::

   Visualization facilitates the analysis of model results. However, for large-scale tasks, it is recommended to disable visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.