| Metadata-Version: 2.4 | |
| Name: python-doctr | |
| Version: 1.0.1a0 | |
| Summary: Document Text Recognition (docTR): deep Learning for high-performance OCR on documents. | |
| Author-email: Mindee <contact@mindee.com> | |
| Maintainer: François-Guillaume Fernandez, Charles Gaillard, Olivier Dulcy, Felix Dittrich | |
| License: Apache License | |
| Version 2.0, January 2004 | |
| http://www.apache.org/licenses/ | |
| TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION | |
| 1. Definitions. | |
| "License" shall mean the terms and conditions for use, reproduction, | |
| and distribution as defined by Sections 1 through 9 of this document. | |
| "Licensor" shall mean the copyright owner or entity authorized by | |
| the copyright owner that is granting the License. | |
| "Legal Entity" shall mean the union of the acting entity and all | |
| other entities that control, are controlled by, or are under common | |
| control with that entity. For the purposes of this definition, | |
| "control" means (i) the power, direct or indirect, to cause the | |
| direction or management of such entity, whether by contract or | |
| otherwise, or (ii) ownership of fifty percent (50%) or more of the | |
| outstanding shares, or (iii) beneficial ownership of such entity. | |
| "You" (or "Your") shall mean an individual or Legal Entity | |
| exercising permissions granted by this License. | |
| "Source" form shall mean the preferred form for making modifications, | |
| including but not limited to software source code, documentation | |
| source, and configuration files. | |
| "Object" form shall mean any form resulting from mechanical | |
| transformation or translation of a Source form, including but | |
| not limited to compiled object code, generated documentation, | |
| and conversions to other media types. | |
| "Work" shall mean the work of authorship, whether in Source or | |
| Object form, made available under the License, as indicated by a | |
| copyright notice that is included in or attached to the work | |
| (an example is provided in the Appendix below). | |
| "Derivative Works" shall mean any work, whether in Source or Object | |
| form, that is based on (or derived from) the Work and for which the | |
| editorial revisions, annotations, elaborations, or other modifications | |
| represent, as a whole, an original work of authorship. For the purposes | |
| of this License, Derivative Works shall not include works that remain | |
| separable from, or merely link (or bind by name) to the interfaces of, | |
| the Work and Derivative Works thereof. | |
| "Contribution" shall mean any work of authorship, including | |
| the original version of the Work and any modifications or additions | |
| to that Work or Derivative Works thereof, that is intentionally | |
| submitted to Licensor for inclusion in the Work by the copyright owner | |
| or by an individual or Legal Entity authorized to submit on behalf of | |
| the copyright owner. For the purposes of this definition, "submitted" | |
| means any form of electronic, verbal, or written communication sent | |
| to the Licensor or its representatives, including but not limited to | |
| communication on electronic mailing lists, source code control systems, | |
| and issue tracking systems that are managed by, or on behalf of, the | |
| Licensor for the purpose of discussing and improving the Work, but | |
| excluding communication that is conspicuously marked or otherwise | |
| designated in writing by the copyright owner as "Not a Contribution." | |
| "Contributor" shall mean Licensor and any individual or Legal Entity | |
| on behalf of whom a Contribution has been received by Licensor and | |
| subsequently incorporated within the Work. | |
| 2. Grant of Copyright License. Subject to the terms and conditions of | |
| this License, each Contributor hereby grants to You a perpetual, | |
| worldwide, non-exclusive, no-charge, royalty-free, irrevocable | |
| copyright license to reproduce, prepare Derivative Works of, | |
| publicly display, publicly perform, sublicense, and distribute the | |
| Work and such Derivative Works in Source or Object form. | |
| 3. Grant of Patent License. Subject to the terms and conditions of | |
| this License, each Contributor hereby grants to You a perpetual, | |
| worldwide, non-exclusive, no-charge, royalty-free, irrevocable | |
| (except as stated in this section) patent license to make, have made, | |
| use, offer to sell, sell, import, and otherwise transfer the Work, | |
| where such license applies only to those patent claims licensable | |
| by such Contributor that are necessarily infringed by their | |
| Contribution(s) alone or by combination of their Contribution(s) | |
| with the Work to which such Contribution(s) was submitted. If You | |
| institute patent litigation against any entity (including a | |
| cross-claim or counterclaim in a lawsuit) alleging that the Work | |
| or a Contribution incorporated within the Work constitutes direct | |
| or contributory patent infringement, then any patent licenses | |
| granted to You under this License for that Work shall terminate | |
| as of the date such litigation is filed. | |
| 4. Redistribution. You may reproduce and distribute copies of the | |
| Work or Derivative Works thereof in any medium, with or without | |
| modifications, and in Source or Object form, provided that You | |
| meet the following conditions: | |
| (a) You must give any other recipients of the Work or | |
| Derivative Works a copy of this License; and | |
| (b) You must cause any modified files to carry prominent notices | |
| stating that You changed the files; and | |
| (c) You must retain, in the Source form of any Derivative Works | |
| that You distribute, all copyright, patent, trademark, and | |
| attribution notices from the Source form of the Work, | |
| excluding those notices that do not pertain to any part of | |
| the Derivative Works; and | |
| (d) If the Work includes a "NOTICE" text file as part of its | |
| distribution, then any Derivative Works that You distribute must | |
| include a readable copy of the attribution notices contained | |
| within such NOTICE file, excluding those notices that do not | |
| pertain to any part of the Derivative Works, in at least one | |
| of the following places: within a NOTICE text file distributed | |
| as part of the Derivative Works; within the Source form or | |
| documentation, if provided along with the Derivative Works; or, | |
| within a display generated by the Derivative Works, if and | |
| wherever such third-party notices normally appear. The contents | |
| of the NOTICE file are for informational purposes only and | |
| do not modify the License. You may add Your own attribution | |
| notices within Derivative Works that You distribute, alongside | |
| or as an addendum to the NOTICE text from the Work, provided | |
| that such additional attribution notices cannot be construed | |
| as modifying the License. | |
| You may add Your own copyright statement to Your modifications and | |
| may provide additional or different license terms and conditions | |
| for use, reproduction, or distribution of Your modifications, or | |
| for any such Derivative Works as a whole, provided Your use, | |
| reproduction, and distribution of the Work otherwise complies with | |
| the conditions stated in this License. | |
| 5. Submission of Contributions. Unless You explicitly state otherwise, | |
| any Contribution intentionally submitted for inclusion in the Work | |
| by You to the Licensor shall be under the terms and conditions of | |
| this License, without any additional terms or conditions. | |
| Notwithstanding the above, nothing herein shall supersede or modify | |
| the terms of any separate license agreement you may have executed | |
| with Licensor regarding such Contributions. | |
| 6. Trademarks. This License does not grant permission to use the trade | |
| names, trademarks, service marks, or product names of the Licensor, | |
| except as required for reasonable and customary use in describing the | |
| origin of the Work and reproducing the content of the NOTICE file. | |
| 7. Disclaimer of Warranty. Unless required by applicable law or | |
| agreed to in writing, Licensor provides the Work (and each | |
| Contributor provides its Contributions) on an "AS IS" BASIS, | |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | |
| implied, including, without limitation, any warranties or conditions | |
| of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A | |
| PARTICULAR PURPOSE. You are solely responsible for determining the | |
| appropriateness of using or redistributing the Work and assume any | |
| risks associated with Your exercise of permissions under this License. | |
| 8. Limitation of Liability. In no event and under no legal theory, | |
| whether in tort (including negligence), contract, or otherwise, | |
| unless required by applicable law (such as deliberate and grossly | |
| negligent acts) or agreed to in writing, shall any Contributor be | |
| liable to You for damages, including any direct, indirect, special, | |
| incidental, or consequential damages of any character arising as a | |
| result of this License or out of the use or inability to use the | |
| Work (including but not limited to damages for loss of goodwill, | |
| work stoppage, computer failure or malfunction, or any and all | |
| other commercial damages or losses), even if such Contributor | |
| has been advised of the possibility of such damages. | |
| 9. Accepting Warranty or Additional Liability. While redistributing | |
| the Work or Derivative Works thereof, You may choose to offer, | |
| and charge a fee for, acceptance of support, warranty, indemnity, | |
| or other liability obligations and/or rights consistent with this | |
| License. However, in accepting such obligations, You may act only | |
| on Your own behalf and on Your sole responsibility, not on behalf | |
| of any other Contributor, and only if You agree to indemnify, | |
| defend, and hold each Contributor harmless for any liability | |
| incurred by, or claims asserted against, such Contributor by reason | |
| of your accepting any such warranty or additional liability. | |
| END OF TERMS AND CONDITIONS | |
| APPENDIX: How to apply the Apache License to your work. | |
| To apply the Apache License to your work, attach the following | |
| boilerplate notice, with the fields enclosed by brackets "[]" | |
| replaced with your own identifying information. (Don't include | |
| the brackets!) The text should be enclosed in the appropriate | |
| comment syntax for the file format. We also recommend that a | |
| file or class name and description of purpose be included on the | |
| same "printed page" as the copyright notice for easier | |
| identification within third-party archives. | |
| Copyright 2022 Mindee | |
| Licensed under the Apache License, Version 2.0 (the "License"); | |
| you may not use this file except in compliance with the License. | |
| You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software | |
| distributed under the License is distributed on an "AS IS" BASIS, | |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| See the License for the specific language governing permissions and | |
| limitations under the License. | |
| Project-URL: documentation, https://mindee.github.io/doctr | |
| Project-URL: repository, https://github.com/mindee/doctr | |
| Project-URL: tracker, https://github.com/mindee/doctr/issues | |
| Project-URL: changelog, https://mindee.github.io/doctr/changelog.html | |
| Keywords: OCR,deep learning,computer vision,pytorch,text detection,text recognition | |
| Classifier: Development Status :: 4 - Beta | |
| Classifier: Intended Audience :: Developers | |
| Classifier: Intended Audience :: Education | |
| Classifier: Intended Audience :: Science/Research | |
| Classifier: License :: OSI Approved :: Apache Software License | |
| Classifier: Natural Language :: English | |
| Classifier: Operating System :: OS Independent | |
| Classifier: Programming Language :: Python :: 3 | |
| Classifier: Programming Language :: Python :: 3.10 | |
| Classifier: Programming Language :: Python :: 3.11 | |
| Classifier: Programming Language :: Python :: 3.12 | |
| Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence | |
| Requires-Python: <4,>=3.10.0 | |
| Description-Content-Type: text/markdown | |
| License-File: LICENSE | |
| Requires-Dist: torch<3.0.0,>=2.0.0 | |
| Requires-Dist: torchvision>=0.15.0 | |
| Requires-Dist: onnx<3.0.0,>=1.12.0 | |
| Requires-Dist: numpy<3.0.0,>=1.16.0 | |
| Requires-Dist: scipy<2.0.0,>=1.4.0 | |
| Requires-Dist: h5py<4.0.0,>=3.1.0 | |
| Requires-Dist: opencv-python<5.0.0,>=4.5.0 | |
| Requires-Dist: pypdfium2<5.0.0,>=4.11.0 | |
| Requires-Dist: pyclipper<2.0.0,>=1.2.0 | |
| Requires-Dist: shapely<3.0.0,>=1.6.0 | |
| Requires-Dist: langdetect<2.0.0,>=1.0.9 | |
| Requires-Dist: rapidfuzz<4.0.0,>=3.0.0 | |
| Requires-Dist: huggingface-hub<1.0.0,>=0.20.0 | |
| Requires-Dist: Pillow>=9.2.0 | |
| Requires-Dist: defusedxml>=0.7.0 | |
| Requires-Dist: anyascii>=0.3.2 | |
| Requires-Dist: validators>=0.18.0 | |
| Requires-Dist: tqdm>=4.30.0 | |
| Provides-Extra: html | |
| Requires-Dist: weasyprint>=55.0; extra == "html" | |
| Provides-Extra: viz | |
| Requires-Dist: matplotlib>=3.1.0; extra == "viz" | |
| Requires-Dist: mplcursors>=0.3; extra == "viz" | |
| Provides-Extra: contrib | |
| Requires-Dist: onnxruntime>=1.11.0; extra == "contrib" | |
| Provides-Extra: testing | |
| Requires-Dist: pytest>=5.3.2; extra == "testing" | |
| Requires-Dist: coverage[toml]>=4.5.4; extra == "testing" | |
| Requires-Dist: onnxruntime>=1.11.0; extra == "testing" | |
| Requires-Dist: requests>=2.20.0; extra == "testing" | |
| Requires-Dist: psutil>=5.9.5; extra == "testing" | |
| Provides-Extra: quality | |
| Requires-Dist: ruff>=0.1.5; extra == "quality" | |
| Requires-Dist: mypy>=0.812; extra == "quality" | |
| Requires-Dist: pre-commit>=2.17.0; extra == "quality" | |
| Provides-Extra: docs | |
| Requires-Dist: sphinx!=3.5.0,>=3.0.0; extra == "docs" | |
| Requires-Dist: sphinxemoji>=0.1.8; extra == "docs" | |
| Requires-Dist: sphinx-copybutton>=0.3.1; extra == "docs" | |
| Requires-Dist: docutils<0.23; extra == "docs" | |
| Requires-Dist: recommonmark>=0.7.1; extra == "docs" | |
| Requires-Dist: sphinx-markdown-tables>=0.0.15; extra == "docs" | |
| Requires-Dist: sphinx-tabs>=3.3.0; extra == "docs" | |
| Requires-Dist: furo>=2022.3.4; extra == "docs" | |
| Provides-Extra: dev | |
| Requires-Dist: torch<3.0.0,>=2.0.0; extra == "dev" | |
| Requires-Dist: torchvision>=0.15.0; extra == "dev" | |
| Requires-Dist: onnx<3.0.0,>=1.12.0; extra == "dev" | |
| Requires-Dist: weasyprint>=55.0; extra == "dev" | |
| Requires-Dist: matplotlib>=3.1.0; extra == "dev" | |
| Requires-Dist: mplcursors>=0.3; extra == "dev" | |
| Requires-Dist: pytest>=5.3.2; extra == "dev" | |
| Requires-Dist: coverage[toml]>=4.5.4; extra == "dev" | |
| Requires-Dist: onnxruntime>=1.11.0; extra == "dev" | |
| Requires-Dist: requests>=2.20.0; extra == "dev" | |
| Requires-Dist: psutil>=5.9.5; extra == "dev" | |
| Requires-Dist: ruff>=0.3.0; extra == "dev" | |
| Requires-Dist: mypy>=1.0; extra == "dev" | |
| Requires-Dist: pre-commit>=3.0.0; extra == "dev" | |
| Requires-Dist: sphinx!=3.5.0,>=3.0.0; extra == "dev" | |
| Requires-Dist: sphinxemoji>=0.1.8; extra == "dev" | |
| Requires-Dist: sphinx-copybutton>=0.3.1; extra == "dev" | |
| Requires-Dist: docutils<0.23; extra == "dev" | |
| Requires-Dist: recommonmark>=0.7.1; extra == "dev" | |
| Requires-Dist: sphinx-markdown-tables>=0.0.15; extra == "dev" | |
| Requires-Dist: sphinx-tabs>=3.3.0; extra == "dev" | |
| Requires-Dist: furo>=2022.3.4; extra == "dev" | |
| Dynamic: license-file | |
| <p align="center"> | |
| <img src="https://github.com/mindee/doctr/raw/main/docs/images/Logo_doctr.gif" width="40%"> | |
| </p> | |
| **Optical Character Recognition made seamless & accessible to anyone, powered by PyTorch** | |
| What you can expect from this repository: | |
| - efficient ways to parse textual information (localize and identify each word) from your documents | |
| - guidance on how to integrate this in your current architecture | |
| ## Quick Tour | |
| ### Getting your pretrained model | |
| End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). | |
| As such, you can select the architecture used for [text detection](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-detection), and the one for [text recognition](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-recognition) from the list of available implementations. | |
| ```python | |
| from doctr.models import ocr_predictor | |
| model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True) | |
| ``` | |
| ### Reading files | |
| Documents can be interpreted from PDF or images: | |
| ```python | |
| from doctr.io import DocumentFile | |
| # PDF | |
| pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf") | |
| # Image | |
| single_img_doc = DocumentFile.from_images("path/to/your/img.jpg") | |
| # Webpage (requires `weasyprint` to be installed) | |
| webpage_doc = DocumentFile.from_url("https://www.yoursite.com") | |
| # Multiple page images | |
| multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"]) | |
| ``` | |
| ### Putting it together | |
| Let's use the default pretrained model for an example: | |
| ```python | |
| from doctr.io import DocumentFile | |
| from doctr.models import ocr_predictor | |
| model = ocr_predictor(pretrained=True) | |
| doc = DocumentFile.from_pdf("path/to/your/doc.pdf") | |
| # Analyze | |
| result = model(doc) | |
| ``` | |
| ### Dealing with rotated documents | |
| Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, | |
| you have several options to handle them: | |
| - If you only use straight document pages with straight words (horizontal, same reading direction), | |
| consider passing `assume_straight_pages=True` to the `ocr_predictor`. It will directly fit straight boxes | |
| on your page and return straight boxes, which makes it the fastest option. | |
| - If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations | |
| will be converted to straight boxes), you need to pass `export_as_straight_boxes=True` to the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°). | |
| If both options are set to `False`, the predictor will always fit and return rotated boxes. | |
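To make the difference between the two box types concrete, here is a minimal sketch of how a rotated box could be reduced to its enclosing straight box. It assumes a rotated box is given as four `(x, y)` corner points in relative page coordinates and a straight box as `((xmin, ymin), (xmax, ymax))`. The helper name `to_straight_box` is hypothetical, and this is an illustration of the idea rather than docTR's internal implementation.

```python
from typing import Sequence

Point = tuple[float, float]


def to_straight_box(polygon: Sequence[Point]) -> tuple[Point, Point]:
    """Return the axis-aligned box enclosing a polygon (hypothetical helper)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys)), (max(xs), max(ys))


# A slightly rotated word box, corners in relative page coordinates
rotated = [(0.10, 0.20), (0.30, 0.22), (0.29, 0.27), (0.09, 0.25)]
straight = to_straight_box(rotated)
print(straight)
```

The enclosing box is always at least as large as the rotated one, which is why straight boxes are a looser fit on tilted text.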
| To interpret your model's predictions, you can visualize them interactively as follows: | |
| ```python | |
| # Display the result (requires matplotlib & mplcursors to be installed) | |
| result.show() | |
| ``` | |
| Or even rebuild the original document from its predictions: | |
| ```python | |
| import matplotlib.pyplot as plt | |
| synthetic_pages = result.synthesize() | |
| plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show() | |
| ``` | |
| The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`). | |
| To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure). | |
| You can also export the result as a nested dict, which is better suited to the JSON format: | |
| ```python | |
| json_output = result.export() | |
| ``` | |
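As a sketch of how the exported dict might be consumed, the snippet below collects the recognized words page by page. It assumes the export mirrors the `Page` → `Block` → `Line` → `Word` nesting of the document model, using the keys `pages`, `blocks`, `lines` and `words`, with each word carrying a `value` key. The `sample_export` dict is hand-written for illustration and much smaller than a real export, and `words_per_page` is a hypothetical helper.

```python
def words_per_page(export: dict) -> list[list[str]]:
    """Collect word values page by page from an exported document dict."""
    pages = []
    for page in export.get("pages", []):
        words = [
            word["value"]
            for block in page.get("blocks", [])
            for line in block.get("lines", [])
            for word in line.get("words", [])
        ]
        pages.append(words)
    return pages


# Hand-written stand-in for `result.export()` (assumed structure)
sample_export = {
    "pages": [
        {"blocks": [{"lines": [{"words": [{"value": "Hello"}, {"value": "world"}]}]}]},
        {"blocks": []},
    ]
}
print(words_per_page(sample_export))
```

With a real prediction, `words_per_page(result.export())` would be the corresponding call.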
| ### Use the KIE predictor | |
| The KIE predictor is more flexible than the OCR predictor because its detection model can detect multiple classes in a document. For example, you can have a detection model that detects only dates and addresses in a document. | |
| The KIE predictor makes it possible to combine a multi-class detector with a recognition model, with the whole pipeline already set up for you. | |
| ```python | |
| from doctr.io import DocumentFile | |
| from doctr.models import kie_predictor | |
| # Model | |
| model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True) | |
| doc = DocumentFile.from_pdf("path/to/your/doc.pdf") | |
| # Analyze | |
| result = model(doc) | |
| predictions = result.pages[0].predictions | |
| for class_name in predictions.keys(): | |
|     list_predictions = predictions[class_name] | |
|     for prediction in list_predictions: | |
|         print(f"Prediction for {class_name}: {prediction}") | |
| ``` | |
| The KIE predictor returns its results per page as a dictionary: each key is a class name, and its value is the list of predictions for that class. | |
| ### If you are looking for support from the Mindee team | |
| [Get support from Mindee](https://mindee.com/product/doctr) | |
| ## Installation | |
| ### Prerequisites | |
| Python 3.10 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install docTR. | |
| ### Latest release | |
| You can then install the latest release of the package from [PyPI](https://pypi.org/project/python-doctr/) as follows: | |
| ```shell | |
| pip install python-doctr | |
| ``` | |
| We try to keep extra dependencies to a minimum. You can install specific builds as follows: | |
| ```shell | |
| # standard build | |
| pip install python-doctr | |
| # optional dependencies for visualization, html, and contrib modules can be installed as follows: | |
| pip install "python-doctr[viz,html,contrib]" | |
| ``` | |
| ### Developer mode | |
| Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). | |
| First clone the project repository, then install it in editable mode: | |
| ```shell | |
| git clone https://github.com/mindee/doctr.git | |
| pip install -e doctr/. | |
| ``` | |
| Again, if you prefer to avoid the risk of missing dependencies, you can install the build: | |
| ```shell | |
| pip install -e doctr/. | |
| ``` | |
| ## Model architectures | |
| Credit where it's due: this repository implements, among others, architectures from published research papers. | |
| ### Text Detection | |
| - DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf). | |
| - LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf). | |
| - FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf). | |
| ### Text Recognition | |
| - CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf). | |
| - SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf). | |
| - MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf). | |
| - ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf). | |
| - PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966). | |
| - VIPTR: [A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2401.10110). | |
| ## More goodies | |
| ### Documentation | |
| The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications. | |
| ### Demo app | |
| A minimal demo app is provided for you to play with our end-to-end OCR models! | |
| #### Live demo | |
| Courtesy of :hugs: [Hugging Face](https://huggingface.co/) :hugs:, docTR now has a fully deployed version available on [Spaces](https://huggingface.co/spaces)! | |
| Check it out [here](https://huggingface.co/spaces/mindee/doctr). | |
| #### Running it locally | |
| If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required. | |
| ```shell | |
| pip install -r demo/pt-requirements.txt | |
| ``` | |
| Then run your app in your default browser with: | |
| ```shell | |
| streamlit run demo/app.py | |
| ``` | |
| ### Docker container | |
| We offer Docker container support for easy testing and deployment. [Here are the available Docker tags](https://github.com/mindee/doctr/pkgs/container/doctr). | |
| #### Using GPU with docTR Docker Images | |
| The docTR Docker images are GPU-ready and based on CUDA `12.2`. Make sure the CUDA version on your host is **at least `12.2`**, otherwise Torch won't be able to initialize the GPU. | |
| Please ensure that Docker is configured to use your GPU. | |
| To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html). | |
| Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support: | |
| ```shell | |
| docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash | |
| ``` | |
| #### Available Tags | |
| The Docker images for docTR follow a specific tag nomenclature: `<deps>-py<python_version>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure: | |
| - `<deps>`: `torch`, `torch-viz-html-contrib`. | |
| - `<python_version>`: `3.9.18`, `3.10.13` or `3.11.8`. | |
| - `<doctr_version>`: a tag >= `v0.11.0` | |
| - `<YYYY-MM>`: e.g. `2024-10` | |
| Here are examples of different image tags: | |
| | Tag | Description | | |
| |----------------------------|---------------------------------------------------| | |
| | `torch-viz-html-contrib-py3.11.8-2024-10` | PyTorch with extra dependencies, Python version `3.11.8`, from the latest commit on `main` in `2024-10`. | | |
| | `torch-py3.11.8-2024-10` | PyTorch, Python version `3.11.8`, from the latest commit on `main` in `2024-10`. | | |
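The nomenclature above can be checked mechanically. The following sketch splits a tag into its three parts with a regular expression. The pattern and the `parse_tag` helper are assumptions derived only from the tag structure described here, not an official docTR utility.

```python
import re

# <deps>-py<python_version>-<doctr_version|YYYY-MM>
TAG_PATTERN = re.compile(
    r"^(?P<deps>torch(?:-viz-html-contrib)?)"
    r"-py(?P<python>\d+\.\d+\.\d+)"
    r"-(?P<version>v\d+\.\d+\.\d+|\d{4}-\d{2})$"
)


def parse_tag(tag: str) -> dict[str, str]:
    """Split an image tag into deps, Python version, and docTR version or date."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognized tag: {tag!r}")
    return match.groupdict()


print(parse_tag("torch-viz-html-contrib-py3.11.8-2024-10"))
```

A tag that does not follow the scheme raises a `ValueError`, which makes the helper usable as a quick sanity check before pulling an image.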
| #### Building Docker Images Locally | |
| You can also build docTR Docker images locally on your computer. | |
| ```shell | |
| docker build -t doctr . | |
| ``` | |
| You can specify custom Python versions and docTR versions using build arguments. For example, to build a docTR image with PyTorch, Python version `3.9.10`, and docTR version `v0.7.0`, run the following command: | |
| ```shell | |
| docker build -t doctr --build-arg FRAMEWORK=torch --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 . | |
| ``` | |
| ### Example script | |
| An example script is provided for a simple document analysis of a PDF or image file: | |
| ```shell | |
| python scripts/analyze.py path/to/your/doc.pdf | |
| ``` | |
| All script arguments can be checked using `python scripts/analyze.py --help`. | |
| ### Minimal API integration | |
| Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework. | |
| #### Deploy your API locally | |
| Specific dependencies are required to run the API template, which you can install as follows: | |
| ```shell | |
| cd api/ | |
| pip install poetry | |
| make lock | |
| pip install -r requirements.txt | |
| ``` | |
| You can now run your API locally: | |
| ```shell | |
| uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app | |
| ``` | |
| Alternatively, if you prefer, you can run the same server in a Docker container: | |
| ```shell | |
| PORT=8002 docker-compose up -d --build | |
| ``` | |
| #### What you have deployed | |
| Your API should now be running locally on port 8002. Access your automatically built documentation at [http://localhost:8002/redoc](http://localhost:8002/redoc) and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example with Python to send a request to the OCR route: | |
| ```python | |
| import requests | |
| params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"} | |
| with open('/path/to/your/doc.jpg', 'rb') as f: | |
|     files = [  # application/pdf, image/jpeg, image/png supported | |
|         ("files", ("doc.jpg", f.read(), "image/jpeg")), | |
|     ] | |
| print(requests.post("http://localhost:8002/ocr", params=params, files=files).json()) | |
| ``` | |
| ### Example notebooks | |
| Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview. | |
| ## Citation | |
| If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference: | |
| ```bibtex | |
| @misc{doctr2021, | |
| title={docTR: Document Text Recognition}, | |
| author={Mindee}, | |
| year={2021}, | |
| publisher = {GitHub}, | |
| howpublished = {\url{https://github.com/mindee/doctr}} | |
| } | |
| ``` | |
| ## Contributing | |
| If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way? | |
| You're in luck, we compiled a short guide (cf. [`CONTRIBUTING`](https://mindee.github.io/doctr/contributing/contributing.html)) for you to easily do so! | |
| ## License | |
| Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/mindee/doctr?tab=Apache-2.0-1-ov-file#readme) for more information. | |