MikeTrizna's picture
Upload folder using huggingface_hub
f3270e6 verified
doctr.io
========
.. currentmodule:: doctr.io
The io module enables users to easily access content from documents and export analysis
results to structured formats.
.. _document_structure:
Document structure
------------------
Structural organization of the documents.
Word
^^^^
A Word is an uninterrupted sequence of characters.
.. autoclass:: Word
Line
^^^^
A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
.. autoclass:: Line
Artefact
^^^^^^^^
An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
.. autoclass:: Artefact
Block
^^^^^
A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
.. autoclass:: Block
Page
^^^^
A Page is a collection of Blocks that were on the same physical page.
.. autoclass:: Page
.. automethod:: show
Document
^^^^^^^^
A Document is a collection of Pages.
.. autoclass:: Document
.. automethod:: show
File reading
------------
High-performance file reading and conversion to processable structured data.
.. autofunction:: read_pdf
.. autofunction:: read_img_as_numpy
.. autofunction:: read_img_as_tensor
.. autofunction:: decode_img_as_tensor
.. autofunction:: read_html
.. autoclass:: DocumentFile
.. automethod:: from_pdf
.. automethod:: from_url
.. automethod:: from_images