.. _algorithm_formula_detection:

===========================
Formula Detection Algorithm
===========================

Introduction
====================

Formula detection involves identifying the positions of all formulas (including inline and block formulas) in a given input image.

.. note::

   Formula detection is technically a subtask of layout detection. However, due to its complexity, we recommend using a dedicated formula detection model to decouple it. This approach typically makes data annotation easier and improves detection performance.

Model Usage
====================

With the environment properly set up, run the formula detection script ``scripts/formula_detection.py``:

.. code:: shell

   $ python scripts/formula_detection.py --config configs/formula_detection.yaml

Model Configuration
--------------------

.. code:: yaml

   inputs: assets/demo/formula_detection
   outputs: outputs/formula_detection
   tasks:
     formula_detection:
       model: formula_detection_yolo
       model_config:
         img_size: 1280
         conf_thres: 0.25
         iou_thres: 0.45
         batch_size: 1
         model_path: models/MFD/yolov8/weights.pt
         visualize: True

- inputs/outputs: The input file path and the directory where visualization results are saved, respectively.
- tasks: The task types to run. Currently, only the formula detection task is included.
- model: The specific model type. Currently, only the YOLO formula detection model is available.
- model_config: The model configuration.

  - img_size: The target size of the image's longer side; the shorter side is scaled proportionally.
  - conf_thres: The confidence threshold. Only detections scoring above this threshold are kept.
  - iou_thres: The IoU threshold for removing duplicates. Detections that overlap another detection by more than this value are removed.
  - batch_size: The number of images inferred simultaneously. A larger batch size generally gives faster inference, and a GPU with more memory allows a larger batch size.
  - model_path: Path to the model weights.
  - visualize: Whether to visualize the model results. Visualized results are saved in the directory given by ``outputs``.

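
To make the roles of ``img_size``, ``conf_thres``, and ``iou_thres`` more concrete, the following pure-Python sketch mimics what such parameters typically do in a YOLO-style pipeline. It is not the code used by the model in PDF-Extract-Kit: the function names (``scaled_size``, ``box_iou``, ``filter_detections``) and the ``(x1, y1, x2, y2, score)`` box layout are assumptions made for illustration, and real implementations usually add padding and run on tensors.

.. code:: python

   def scaled_size(width, height, img_size=1280):
       """Scale so the longer side equals img_size, keeping the aspect ratio."""
       scale = img_size / max(width, height)
       return round(width * scale), round(height * scale)


   def box_iou(a, b):
       """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
       inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
       inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
       inter = inter_w * inter_h
       area_a = (a[2] - a[0]) * (a[3] - a[1])
       area_b = (b[2] - b[0]) * (b[3] - b[1])
       union = area_a + area_b - inter
       return inter / union if union > 0 else 0.0


   def filter_detections(boxes, conf_thres=0.25, iou_thres=0.45):
       """Confidence filtering followed by greedy non-maximum suppression."""
       # Keep only boxes scoring above conf_thres, highest score first.
       candidates = sorted(
           (b for b in boxes if b[4] > conf_thres),
           key=lambda b: b[4],
           reverse=True,
       )
       kept = []
       for box in candidates:
           # Drop a box if it overlaps an already kept box by more than iou_thres.
           if all(box_iou(box, k) <= iou_thres for k in kept):
               kept.append(box)
       return kept


   # Hypothetical detections: (x1, y1, x2, y2, score)
   detections = [
       (100, 100, 300, 150, 0.90),  # formula A
       (105, 102, 305, 152, 0.60),  # near-duplicate of A
       (400, 400, 500, 430, 0.80),  # formula B
       (50, 600, 90, 620, 0.10),    # below the confidence threshold
   ]
   print(scaled_size(2480, 3508))        # size after scaling the longer side to 1280
   print(filter_detections(detections))  # formula A and formula B remain

With the default thresholds, the low-confidence box is discarded by ``conf_thres`` and the near-duplicate of formula A is discarded by ``iou_thres``. Raising ``iou_thres`` keeps more overlapping boxes; raising ``conf_thres`` keeps fewer boxes overall.
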

Diverse Input Support
---------------------

The formula detection script in PDF-Extract-Kit supports several input formats: a single image, a directory of image files, a single PDF file, and a directory of PDF files.

.. note::

   Modify the ``inputs`` path in ``configs/formula_detection.yaml`` according to your actual data format:

   - Single image: path/to/image
   - Image directory: path/to/images
   - Single PDF file: path/to/pdf
   - PDF directory: path/to/pdfs

.. note::

   When using a PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``scripts/formula_detection.py``.

   .. code:: python

      # for image detection
      detection_results = model_formula_detection.predict_images(input_data, result_path)

   Change to:

   .. code:: python

      # for pdf detection
      detection_results = model_formula_detection.predict_pdfs(input_data, result_path)
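
If you switch between image and PDF inputs often, the choice between the two calls could be made from the ``inputs`` path instead of by editing the script. The following is only a sketch under stated assumptions: ``input_kind`` and ``run_detection`` are hypothetical helpers that are not part of PDF-Extract-Kit, and a directory is treated as PDF input as soon as it contains one ``.pdf`` file.

.. code:: python

   from pathlib import Path


   def input_kind(inputs):
       """Return "pdf" or "image" for an ``inputs`` path (hypothetical helper)."""
       path = Path(inputs)
       if path.is_dir():
           # Assumption: a directory counts as PDF input if it holds any PDF file.
           files = [p for p in path.iterdir() if p.is_file()]
           has_pdf = any(p.suffix.lower() == ".pdf" for p in files)
           return "pdf" if has_pdf else "image"
       return "pdf" if path.suffix.lower() == ".pdf" else "image"


   def run_detection(model, inputs, result_path):
       """Call predict_pdfs or predict_images depending on the input type."""
       if input_kind(inputs) == "pdf":
           return model.predict_pdfs(inputs, result_path)
       return model.predict_images(inputs, result_path)

In the script, this would be used as ``detection_results = run_detection(model_formula_detection, input_data, result_path)``.
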

Viewing Visualization Results
-----------------------------

When the ``visualize`` option in the config file is set to ``True``, visualization results are saved in the ``outputs/formula_detection`` directory.

.. note::

   Visualization facilitates the analysis of model results. However, for large-scale tasks, we recommend disabling visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.