doctr.io
========

.. currentmodule:: doctr.io

The io module enables users to easily access content from documents and export analysis
results to structured formats.

.. _document_structure:

Document structure
------------------

Structural organization of the documents.

Word
^^^^
A Word is an uninterrupted sequence of characters.

.. autoclass:: Word

Line
^^^^
A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, a single horizontal row of text therefore contains two Lines, one per column).

.. autoclass:: Line
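
The two-column rule can be illustrated with a simple geometric heuristic. This is a hypothetical sketch, not how doctr builds Lines: it assumes each Word has a box in relative ``((xmin, ymin), (xmax, ymax))`` page coordinates, and the ``WordBox`` class, the ``group_into_lines`` function and its ``min_overlap``/``max_gap`` thresholds are illustrative names chosen here. Two Words join the same Line only if they share a horizontal band and sit close together.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical geometry: ((xmin, ymin), (xmax, ymax)) in relative [0, 1] page coordinates.
Box = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class WordBox:
    value: str
    geometry: Box


def vertical_overlap(a: Box, b: Box) -> float:
    """Share of the shorter box's height that both boxes have in common."""
    top = max(a[0][1], b[0][1])
    bottom = min(a[1][1], b[1][1])
    shorter = min(a[1][1] - a[0][1], b[1][1] - b[0][1])
    return max(0.0, bottom - top) / shorter if shorter > 0 else 0.0


def group_into_lines(words: List[WordBox], min_overlap: float = 0.5,
                     max_gap: float = 0.05) -> List[str]:
    lines: List[List[WordBox]] = []
    # Scan words left to right so each line only ever grows rightwards
    for word in sorted(words, key=lambda w: w.geometry[0][0]):
        for line in lines:
            last = line[-1]
            gap = word.geometry[0][0] - last.geometry[1][0]
            # Same horizontal band AND close enough to the previous word
            if vertical_overlap(last.geometry, word.geometry) >= min_overlap and gap <= max_gap:
                line.append(word)
                break
        else:
            # No compatible line found: this word starts a new one
            lines.append([word])
    # Order lines top to bottom, then left to right
    lines.sort(key=lambda line: (line[0].geometry[0][1], line[0].geometry[0][0]))
    return [" ".join(w.value for w in line) for line in lines]


# A two-column page: the top row holds text in both columns
words = [
    WordBox("Left", ((0.05, 0.10), (0.15, 0.12))),
    WordBox("column", ((0.16, 0.10), (0.28, 0.12))),
    WordBox("Right", ((0.55, 0.10), (0.65, 0.12))),
    WordBox("column", ((0.66, 0.10), (0.78, 0.12))),
    WordBox("below", ((0.05, 0.20), (0.15, 0.22))),
]
print(group_into_lines(words))  # ['Left column', 'Right column', 'below']
```

The wide gap between the columns keeps their Words apart, so the shared top row yields two Lines rather than one.
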

Artefact
^^^^^^^^
An Artefact is a non-textual element (e.g. a QR code, picture, chart, signature or logo).

.. autoclass:: Artefact

Block
^^^^^
A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).

.. autoclass:: Block

Page
^^^^
A Page is a collection of Blocks that were on the same physical page.

.. autoclass:: Page

   .. automethod:: show

Document
^^^^^^^^
A Document is a collection of Pages.

.. autoclass:: Document

   .. automethod:: show
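
The containment rules above (Words inside Lines, Lines and Artefacts inside Blocks, Blocks inside Pages, Pages inside a Document) can be mirrored with plain dataclasses. This is a minimal, hypothetical sketch rather than the doctr.io classes documented here: the fields (``value``, ``geometry``, ``kind``, ``page_idx``), the ``render`` and ``to_json`` helpers and the geometry convention are assumptions made for illustration. It also shows one way nested results can be exported to a structured format such as JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# Hypothetical geometry: ((xmin, ymin), (xmax, ymax)) in relative [0, 1] page coordinates.
Box = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class Word:
    value: str  # the uninterrupted sequence of characters
    geometry: Box


@dataclass
class Artefact:
    kind: str  # hypothetical label for the non-textual element, e.g. "logo"
    geometry: Box


@dataclass
class Line:
    words: List[Word]

    def render(self) -> str:
        # Words of a Line are read together, separated by single spaces
        return " ".join(word.value for word in self.words)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)
    artefacts: List[Artefact] = field(default_factory=list)

    def render(self) -> str:
        # Artefacts carry no text, so only the Lines contribute to the output
        return "\n".join(line.render() for line in self.lines)


@dataclass
class Page:
    page_idx: int
    blocks: List[Block] = field(default_factory=list)

    def render(self) -> str:
        return "\n\n".join(block.render() for block in self.blocks)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    def render(self) -> str:
        return "\n\n".join(page.render() for page in self.pages)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses; json turns tuples into lists
        return json.dumps(asdict(self))


# One page holding a Block: an address on two Lines plus a logo Artefact
doc = Document(pages=[
    Page(page_idx=0, blocks=[
        Block(
            lines=[
                Line(words=[Word("221B", ((0.10, 0.10), (0.18, 0.13))),
                            Word("Baker", ((0.19, 0.10), (0.28, 0.13))),
                            Word("Street", ((0.29, 0.10), (0.40, 0.13)))]),
                Line(words=[Word("London", ((0.10, 0.14), (0.22, 0.17)))]),
            ],
            artefacts=[Artefact("logo", ((0.70, 0.05), (0.90, 0.20)))],
        )
    ])
])

print(doc.render())
print(doc.to_json())
```

Here ``doc.render()`` returns the address as two lines of text, while the logo only shows up in the JSON export because it carries no text.
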

File reading
------------

High-performance file reading and conversion to processable structured data.

.. autofunction:: read_pdf

.. autofunction:: read_img_as_numpy

.. autofunction:: read_img_as_tensor

.. autofunction:: decode_img_as_tensor

.. autofunction:: read_html

.. autoclass:: DocumentFile

   .. automethod:: from_pdf

   .. automethod:: from_url

   .. automethod:: from_images