
# Contributing

## Installation

Install pixi for pulling conda/pip packages:

```bash
curl -fsSL https://pixi.sh/install.sh | sh
```

Create the pixi environment and enter an activated shell:

```bash
pixi shell
```

Create a virtualenv and install nemotron-ocr into it via uv:

```bash
uv venv \
&& uv pip install -e ./nemotron-ocr -v
```

Verify that the OCR inference libraries can now be imported successfully:

```bash
uv run python -c "import nemotron_ocr; import nemotron_ocr_cpp"
```

## Usage

`nemotron_ocr.inference.pipeline.NemotronOCR` is the main entry point for OCR inference; calling an instance on an input image returns predictions that can be iterated over:

```python
from nemotron_ocr.inference.pipeline import NemotronOCR

ocr = NemotronOCR()

predictions = ocr("ocr-example-input-1.png")

for pred in predictions:
    print(
        f"  - Text: '{pred['text']}', "
        f"Confidence: {pred['confidence']:.2f}, "
        f"Bbox: [left={pred['left']:.4f}, upper={pred['upper']:.4f}, right={pred['right']:.4f}, lower={pred['lower']:.4f}]"
    )
```
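
If the results need to be persisted rather than printed, the same fields can be written out directly. Below is a minimal sketch that dumps them to JSON; it assumes only the prediction keys shown above, and the output filename is an arbitrary choice:

```python
import json

from nemotron_ocr.inference.pipeline import NemotronOCR

ocr = NemotronOCR()
predictions = ocr("ocr-example-input-1.png")

# Collect the fields demonstrated above into plain dictionaries.
records = [
    {
        "text": pred["text"],
        "confidence": pred["confidence"],
        "bbox": [pred["left"], pred["upper"], pred["right"], pred["lower"]],
    }
    for pred in predictions
]

# Hypothetical output path; choose whatever suits your pipeline.
with open("ocr-example-input-1.json", "w") as f:
    json.dump(records, f, indent=2)
```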

Alternatively, predictions can be superimposed on the input image for visualization:

```python
ocr(image_path, visualize=True)
```
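
If the built-in visualization is not flexible enough (for example, to control styling or the output path), boxes can also be drawn manually from the prediction coordinates. The sketch below uses Pillow and assumes the `left`/`upper`/`right`/`lower` values are normalized to [0, 1], as the four-decimal formatting above suggests; the color and output filename are arbitrary:

```python
from PIL import Image, ImageDraw

from nemotron_ocr.inference.pipeline import NemotronOCR

image_path = "ocr-example-input-1.png"
ocr = NemotronOCR()
predictions = ocr(image_path)

image = Image.open(image_path).convert("RGB")
draw = ImageDraw.Draw(image)
width, height = image.size

for pred in predictions:
    # Scale the (assumed) normalized coordinates back to pixel space.
    box = (
        int(pred["left"] * width),
        int(pred["upper"] * height),
        int(pred["right"] * width),
        int(pred["lower"] * height),
    )
    draw.rectangle(box, outline="red", width=2)

image.save("ocr-example-output-1.png")
```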

The level of detection merging can be adjusted via the `merge_level` argument, which defaults to `"paragraph"`:

```python
ocr(image_path, merge_level="word")      # leave detected words unmerged
ocr(image_path, merge_level="sentence")  # merge detected words into sentences
```
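
To get a feel for how the merge level affects granularity, the three levels can be compared side by side. The sketch below assumes only the `merge_level` values shown above and that the returned predictions are iterable:

```python
from nemotron_ocr.inference.pipeline import NemotronOCR

ocr = NemotronOCR()
image_path = "ocr-example-input-1.png"

# Count how many detections survive at each merge level.
for level in ("word", "sentence", "paragraph"):
    count = sum(1 for _ in ocr(image_path, merge_level=level))
    print(f"{level}: {count} detections")
```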

An example script, `example.py`, is provided for convenience:

```bash
uv run python example.py ocr-example-input-1.png
```

Detection merging can be adjusted with the `--merge-level` option:

```bash
uv run python example.py ocr-example-input-1.png --merge-level word
uv run python example.py ocr-example-input-1.png --merge-level sentence
```