..  _algorithm_formula_detection:

===========================
Formula Detection Algorithm
===========================

Introduction
====================

Formula detection involves identifying the positions of all formulas (including inline and block formulas) in a given input image.

.. note::

   Formula detection is technically a subtask of layout detection. However, due to its complexity, we recommend using a dedicated formula detection model to decouple it. This approach typically makes data annotation easier and improves detection performance.

Model Usage
====================

Once the environment is set up, run the formula detection script ``scripts/formula_detection.py`` with the corresponding config file:

.. code:: shell

   $ python scripts/formula_detection.py --config configs/formula_detection.yaml

Model Configuration
--------------------

.. code:: yaml

   inputs: assets/demo/formula_detection
   outputs: outputs/formula_detection
   tasks:
      formula_detection:
         model: formula_detection_yolo
         model_config:
            img_size: 1280
            conf_thres: 0.25
            iou_thres: 0.45
            batch_size: 1
            model_path: models/MFD/yolov8/weights.pt
            visualize: True

- ``inputs`` / ``outputs``: The input file path and the output directory for visualization results, respectively.
- ``tasks``: The task type. Currently, only the formula detection task is included.
- ``model``: The specific model type. Currently, only the YOLO formula detection model is available.
- ``model_config``: The model configuration.
- ``img_size``: The size of the image's longer side; the shorter side is scaled proportionally.
- ``conf_thres``: The confidence threshold; only targets with a score above this threshold are kept.
- ``iou_thres``: The IoU threshold; targets that overlap a higher-scoring target by more than this value are removed.
- ``batch_size``: The number of images inferred simultaneously. A larger batch size generally gives faster inference, and a GPU with more memory allows a larger batch size.
- ``model_path``: Path to the model weights.
- ``visualize``: Whether to visualize the model results. Visualized results are saved in the ``outputs`` directory.
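
The roles of ``img_size``, ``conf_thres``, and ``iou_thres`` described above can be illustrated with a minimal pure-Python sketch. It is not the code the YOLO model runs internally (real implementations may also pad or align sizes). The helper names ``scaled_size``, ``iou``, and ``filter_detections`` are hypothetical and exist only for this illustration.

.. code:: python

   from typing import List, Sequence, Tuple

   Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


   def scaled_size(width: int, height: int, img_size: int = 1280) -> Tuple[int, int]:
       """Scale so the longer side equals img_size, keeping the aspect ratio."""
       ratio = img_size / max(width, height)
       return round(width * ratio), round(height * ratio)


   def iou(a: Box, b: Box) -> float:
       """Intersection over union of two axis-aligned boxes."""
       inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
       inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
       inter = inter_w * inter_h
       union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
       return inter / union if union > 0 else 0.0


   def filter_detections(
       boxes: Sequence[Box],
       scores: Sequence[float],
       conf_thres: float = 0.25,
       iou_thres: float = 0.45,
   ) -> List[int]:
       """Return indices kept after confidence filtering and greedy suppression."""
       # Drop low-confidence boxes, then visit the rest from highest score down.
       order = sorted(
           (i for i, s in enumerate(scores) if s > conf_thres),
           key=lambda i: scores[i],
           reverse=True,
       )
       kept: List[int] = []
       for i in order:
           # Keep a box only if it does not overlap an already kept box too much.
           if all(iou(boxes[i], boxes[j]) <= iou_thres for j in kept):
               kept.append(i)
       return kept

For example, ``scaled_size(2000, 1000)`` gives ``(1280, 640)``. Raising ``conf_thres`` removes more low-score detections, and lowering ``iou_thres`` suppresses overlapping boxes more aggressively.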

Diverse Input Support
---------------------

The formula detection script in PDF-Extract-Kit supports several input types: a single image, a directory of image files, a single PDF file, and a directory of PDF files.

.. note:: 

   Modify the ``inputs`` path in ``configs/formula_detection.yaml`` according to your actual data format:

   - Single image: ``path/to/image``
   - Image directory: ``path/to/images``
   - Single PDF file: ``path/to/pdf``
   - PDF directory: ``path/to/pdfs``

.. note::

   When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``formula_detection.py``.

   .. code:: python

      # for image detection
      detection_results = model_formula_detection.predict_images(input_data, result_path)
   
   Change to:

   .. code:: python

      # for pdf detection
      detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
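
As an alternative to editing the script by hand, the choice between the two calls could be made from the input path. The following is a minimal sketch under assumptions, not part of PDF-Extract-Kit: ``is_pdf_input`` and ``run_detection`` are hypothetical helpers, PDF input is assumed to be recognizable by a ``.pdf`` suffix, and ``model`` is assumed to be any object exposing the ``predict_images`` and ``predict_pdfs`` methods shown above.

.. code:: python

   from pathlib import Path


   def is_pdf_input(input_path: str) -> bool:
       """True for a .pdf file or a directory that contains .pdf files."""
       path = Path(input_path)
       if path.is_dir():
           return any(p.suffix.lower() == ".pdf" for p in path.iterdir())
       return path.suffix.lower() == ".pdf"


   def run_detection(model, input_data: str, result_path: str):
       """Call predict_pdfs for PDF input, otherwise predict_images."""
       if is_pdf_input(input_data):
           return model.predict_pdfs(input_data, result_path)
       return model.predict_images(input_data, result_path)

With this helper, the line in the script would read ``detection_results = run_detection(model_formula_detection, input_data, result_path)``. A directory that mixes images and PDFs is treated as PDF input in this sketch.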


Viewing Visualization Results
-----------------------------

When the ``visualize`` option in the config file is set to ``True``, visualization results will be saved in the ``outputs/formula_detection`` directory.

.. note::

   Visualization makes it easier to analyze model results. For large-scale tasks, however, it is recommended to disable visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.