legacies committed on
Commit dd7d028 · 1 Parent(s): 0e17e4e

initial files

Files changed (2)
  1. README.md +10 -384
  2. README1.md +384 -0
README.md CHANGED
@@ -1,384 +1,10 @@
- <p align="center">
- <img src="https://github.com/mindee/doctr/raw/main/docs/images/Logo_doctr.gif" width="40%">
- </p>
-
- [![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![Docker Images](https://img.shields.io/badge/Docker-4287f5?style=flat&logo=docker&logoColor=white)](https://github.com/mindee/doctr/pkgs/container/doctr) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.8.1-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb)
-
-
- **Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch**
-
- What you can expect from this repository:
-
- - efficient ways to parse textual information (localize and identify each word) from your documents
- - guidance on how to integrate this in your current architecture
-
- ![OCR_example](https://github.com/mindee/doctr/raw/main/docs/images/ocr.png)
-
- ## Quick Tour
-
- ### Getting your pretrained model
-
- End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word).
- As such, you can select the architecture used for [text detection](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-detection), and the one for [text recognition](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-recognition) from the list of available implementations.
-
- ```python
- from doctr.models import ocr_predictor
-
- model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
- ```
-
- ### Reading files
-
- Documents can be interpreted from PDF or images:
-
- ```python
- from doctr.io import DocumentFile
- # PDF
- pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
- # Image
- single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
- # Webpage
- webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
- # Multiple page images
- multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
- ```
-
- ### Putting it together
-
- Let's use the default pretrained model for an example:
-
- ```python
- from doctr.io import DocumentFile
- from doctr.models import ocr_predictor
-
- model = ocr_predictor(pretrained=True)
- # PDF
- doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
- # Analyze
- result = model(doc)
- ```
-
- ### Dealing with rotated documents
-
- Should you use docTR on documents that include rotated pages, or pages with multiple box orientations,
- you have multiple options to handle it:
-
- - If you only use straight document pages with straight words (horizontal, same reading direction),
- consider passing `assume_straight_pages=True` to the ocr_predictor. It will directly fit straight boxes
- on your page and return straight boxes, which makes it the fastest option.
-
- - If you want the predictor to output straight boxes no matter the orientation of your pages (the final localizations
- will be converted to straight boxes), pass `export_as_straight_boxes=True` to the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).
-
- If both options are set to False, the predictor will always fit and return rotated boxes.
-
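The flag combinations above can be summarized in a small decision helper. This is illustrative only: the flag names come from the text above, but the function itself is not part of docTR.

```python
# Illustrative helper, not part of docTR: maps the two predictor flags
# described above to the kind of boxes the predictor returns.
def box_output(assume_straight_pages: bool, export_as_straight_boxes: bool) -> str:
    if assume_straight_pages:
        return "straight"  # straight boxes fitted directly (fastest option)
    if export_as_straight_boxes:
        return "straight"  # rotated fit, converted to straight boxes on export
    return "rotated"       # rotated boxes, potentially with an angle of 0°

print(box_output(False, False))  # -> rotated
```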
- To interpret your model's predictions, you can visualize them interactively as follows:
-
- ```python
- result.show()
- ```
-
- ![Visualization sample](https://github.com/mindee/doctr/raw/main/docs/images/doctr_example_script.gif)
-
- Or even rebuild the original document from its predictions:
-
- ```python
- import matplotlib.pyplot as plt
-
- synthetic_pages = result.synthesize()
- plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
- ```
-
- ![Synthesis sample](https://github.com/mindee/doctr/raw/main/docs/images/synthesized_sample.png)
-
- The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
- To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure).
-
- You can also export the results as a nested dict, better suited for JSON serialization:
-
- ```python
- json_output = result.export()
- ```
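As a sketch of what you can do with such an export, here is a walk over its nested structure; the keys (pages → blocks → lines → words, each word carrying a `value`) mirror the documented document model, while the sample data is invented.

```python
# Invented sample mimicking the shape of an exported document.
sample_export = {
    "pages": [
        {"blocks": [
            {"lines": [
                {"words": [{"value": "Hello"}, {"value": "world"}]}
            ]}
        ]}
    ]
}

def collect_words(export: dict) -> list[str]:
    # Flatten the nested pages -> blocks -> lines -> words hierarchy.
    return [
        word["value"]
        for page in export["pages"]
        for block in page["blocks"]
        for line in block["lines"]
        for word in line["words"]
    ]

print(collect_words(sample_export))  # -> ['Hello', 'world']
```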
-
- ### Use the KIE predictor
-
- The KIE predictor is more flexible than the OCR predictor, as its detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.
-
- The KIE predictor makes it possible to pair a multi-class detector with a recognition model, with the whole pipeline already set up for you.
-
- ```python
- from doctr.io import DocumentFile
- from doctr.models import kie_predictor
-
- # Model
- model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
- # PDF
- doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
- # Analyze
- result = model(doc)
-
- predictions = result.pages[0].predictions
- for class_name, list_predictions in predictions.items():
-     for prediction in list_predictions:
-         print(f"Prediction for {class_name}: {prediction}")
- ```
-
- The KIE predictor's results per page are a dictionary in which each key is a class name and its value is the list of predictions for that class.
-
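For instance, that per-class dictionary can be flattened into (class, prediction) pairs; the class names and values below are invented for illustration.

```python
# Invented sample matching the described shape: class name -> list of predictions
predictions = {"dates": ["2023-09-01"], "addresses": ["1 Main St", "2 Side Ave"]}

# Flatten into (class_name, prediction) pairs, preserving insertion order
pairs = [(cls, pred) for cls, preds in predictions.items() for pred in preds]
print(pairs)
# -> [('dates', '2023-09-01'), ('addresses', '1 Main St'), ('addresses', '2 Side Ave')]
```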
- ### If you are looking for support from the Mindee team
-
- [![Bad OCR test detection image asking the developer if they need help](https://github.com/mindee/doctr/raw/main/docs/images/doctr-need-help.png)](https://mindee.com/product/doctr)
-
- ## Installation
-
- ### Prerequisites
-
- Python 3.9 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install docTR.
-
- Since we use [WeasyPrint](https://weasyprint.org/), you will need extra dependencies if you are not running Linux.
-
- For macOS users, you can install them as follows:
-
- ```shell
- brew install cairo pango gdk-pixbuf libffi
- ```
-
- For Windows users, those dependencies are included in GTK. You can find the latest installer over [here](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
-
- ### Latest release
-
- You can then install the latest release of the package from [PyPI](https://pypi.org/project/python-doctr/) as follows:
-
- ```shell
- pip install python-doctr
- ```
-
- > :warning: Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.
-
- We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:
-
- ```shell
- # for TensorFlow
- pip install "python-doctr[tf]"
- # for PyTorch
- pip install "python-doctr[torch]"
- ```
-
- For MacBooks with an M1 chip, you will need some additional packages or specific versions:
-
- - TensorFlow 2: [metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
- - PyTorch: [version >= 1.12.0](https://pytorch.org/get-started/locally/#start-locally)
-
- ### Developer mode
-
- Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
- First clone the project repository:
-
- ```shell
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
- ```
-
- Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:
-
- ```shell
- # for TensorFlow
- pip install -e "doctr/.[tf]"
- # for PyTorch
- pip install -e "doctr/.[torch]"
- ```
-
- ## Model architectures
-
- Credit where it's due: this repository implements, among others, architectures from published research papers.
-
- ### Text Detection
-
- - DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- - LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
- - FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)
-
- ### Text Recognition
-
- - CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
- - SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf)
- - MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf)
- - ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf)
- - PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966)
-
- ## More goodies
-
- ### Documentation
-
- The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications.
-
- ### Demo app
-
- A minimal demo app is provided for you to play with our end-to-end OCR models!
-
- ![Demo app](https://github.com/mindee/doctr/raw/main/docs/images/demo_update.png)
-
- #### Live demo
-
- Courtesy of :hugs: [Hugging Face](https://huggingface.co/) :hugs:, docTR now has a fully deployed version available on [Spaces](https://huggingface.co/spaces)!
- Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr)
-
- #### Running it locally
-
- If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required.
-
- ##### TensorFlow version
-
- ```shell
- pip install -r demo/tf-requirements.txt
- ```
-
- Then run your app in your default browser with:
-
- ```shell
- USE_TF=1 streamlit run demo/app.py
- ```
-
- ##### PyTorch version
-
- ```shell
- pip install -r demo/pt-requirements.txt
- ```
-
- Then run your app in your default browser with:
-
- ```shell
- USE_TORCH=1 streamlit run demo/app.py
- ```
-
- #### TensorFlow.js
-
- Would you prefer to run everything in your web browser instead of having your demo run Python?
- Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to get started!
-
- ![TFJS demo](https://github.com/mindee/doctr/raw/main/docs/images/demo_illustration_mini.png)
-
- ### Docker container
-
- [We offer Docker container support for easy testing and deployment](https://github.com/mindee/doctr/pkgs/container/doctr).
-
- #### Using GPU with docTR Docker Images
-
- The docTR Docker images are GPU-ready and based on CUDA `11.8`.
- However, to use GPU support with these Docker images, please ensure that Docker is configured to use your GPU.
-
- To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
-
- Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support:
-
- ```shell
- docker run -it --gpus all ghcr.io/mindee/doctr:tf-py3.8.18-gpu-2023-09 bash
- ```
-
- #### Available Tags
-
- The Docker images for docTR follow a specific tag nomenclature: `<framework>-py<python_version>-<system>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure:
-
- - `<framework>`: `tf` (TensorFlow) or `torch` (PyTorch).
- - `<python_version>`: `3.8.18`, `3.9.18`, or `3.10.13`.
- - `<system>`: `cpu` or `gpu`.
- - `<doctr_version>`: a release tag >= `v0.7.1`.
- - `<YYYY-MM>`: the month of a monthly build, e.g. `2023-09`.
-
- Here are examples of different image tags:
-
- | Tag | Description |
- |------------------------------|--------------------------------------------------------------------------------|
- | `tf-py3.8.18-cpu-v0.7.1`     | TensorFlow build with Python `3.8.18`, CPU only, docTR `v0.7.1`.               |
- | `torch-py3.9.18-gpu-2023-09` | PyTorch build with Python `3.9.18`, GPU support, and a monthly build from `2023-09`. |
-
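To see the nomenclature in action, a full image reference can be assembled from its parts (the values below are example choices from the lists above):

```shell
# Assemble a docTR image reference from the tag parts (example values)
framework="torch"; python_version="3.9.18"; system="gpu"; build="2023-09"
echo "ghcr.io/mindee/doctr:${framework}-py${python_version}-${system}-${build}"
# prints ghcr.io/mindee/doctr:torch-py3.9.18-gpu-2023-09
```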
- #### Building Docker Images Locally
-
- You can also build docTR Docker images locally on your computer.
-
- ```shell
- docker build -t doctr .
- ```
-
- You can specify custom Python versions and docTR versions using build arguments. For example, to build a docTR image with TensorFlow, Python version `3.9.10`, and docTR version `v0.7.0`, run the following command:
-
- ```shell
- docker build -t doctr --build-arg FRAMEWORK=tf --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
- ```
-
- ### Example script
-
- An example script is provided for a simple document analysis of a PDF or image file:
-
- ```shell
- python scripts/analyze.py path/to/your/doc.pdf
- ```
-
- All script arguments can be checked using `python scripts/analyze.py --help`.
-
- ### Minimal API integration
-
- Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework.
-
- #### Deploy your API locally
-
- Specific dependencies are required to run the API template, which you can install as follows:
-
- ```shell
- cd api/
- pip install poetry
- make lock
- pip install -r requirements.txt
- ```
-
- You can now run your API locally:
-
- ```shell
- uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
- ```
-
- Alternatively, if you prefer, you can run the same server in a Docker container:
-
- ```shell
- PORT=8002 docker-compose up -d --build
- ```
-
- #### What you have deployed
-
- Your API should now be running locally on port 8002. Access your automatically-built documentation at [http://localhost:8002/redoc](http://localhost:8002/redoc) and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example with Python to send a request to the OCR route:
-
- ```python
- import requests
-
- with open('/path/to/your/doc.jpg', 'rb') as f:
-     data = f.read()
- response = requests.post("http://localhost:8002/ocr", files={'file': data}).json()
- ```
-
- ### Example notebooks
-
- Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview.
-
- ## Citation
-
- If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:
-
- ```bibtex
- @misc{doctr2021,
-     title={docTR: Document Text Recognition},
-     author={Mindee},
-     year={2021},
-     publisher={GitHub},
-     howpublished={\url{https://github.com/mindee/doctr}}
- }
- ```
-
- ## Contributing
-
- If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?
-
- You're in luck: we compiled a short guide (cf. [`CONTRIBUTING`](https://mindee.github.io/doctr/contributing/contributing.html)) for you to easily do so!
-
- ## License
-
- Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/mindee/doctr?tab=Apache-2.0-1-ov-file#readme) for more information.
 
+ ---
+ title: doctr
+ emoji: {{emoji}}
+ colorFrom: {{colorFrom}}
+ colorTo: {{colorTo}}
+ sdk: {{sdk}}
+ sdk_version: "{{sdkVersion}}"
+ app_file: app.py
+ pinned: false
+ ---
README1.md ADDED
@@ -0,0 +1,384 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ <p align="center">
2
+ <img src="https://github.com/mindee/doctr/raw/main/docs/images/Logo_doctr.gif" width="40%">
3
+ </p>
4
+
5
+ [![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![Docker Images](https://img.shields.io/badge/Docker-4287f5?style=flat&logo=docker&logoColor=white)](https://github.com/mindee/doctr/pkgs/container/doctr) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.8.1-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb)
6
+
7
+
8
+ **Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch**
9
+
10
+ What you can expect from this repository:
11
+
12
+ - efficient ways to parse textual information (localize and identify each word) from your documents
13
+ - guidance on how to integrate this in your current architecture
14
+
15
+ ![OCR_example](https://github.com/mindee/doctr/raw/main/docs/images/ocr.png)
16
+
17
+ ## Quick Tour
18
+
19
+ ### Getting your pretrained model
20
+
21
+ End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word).
22
+ As such, you can select the architecture used for [text detection](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-detection), and the one for [text recognition](https://mindee.github.io/doctr/latest//modules/models.html#doctr-models-recognition) from the list of available implementations.
23
+
24
+ ```python
25
+ from doctr.models import ocr_predictor
26
+
27
+ model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
28
+ ```
29
+
30
+ ### Reading files
31
+
32
+ Documents can be interpreted from PDF or images:
33
+
34
+ ```python
35
+ from doctr.io import DocumentFile
36
+ # PDF
37
+ pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
38
+ # Image
39
+ single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
40
+ # Webpage
41
+ webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
42
+ # Multiple page images
43
+ multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
44
+ ```
45
+
46
+ ### Putting it together
47
+
48
+ Let's use the default pretrained model for an example:
49
+
50
+ ```python
51
+ from doctr.io import DocumentFile
52
+ from doctr.models import ocr_predictor
53
+
54
+ model = ocr_predictor(pretrained=True)
55
+ # PDF
56
+ doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
57
+ # Analyze
58
+ result = model(doc)
59
+ ```
60
+
61
+ ### Dealing with rotated documents
62
+
63
+ Should you use docTR on documents that include rotated pages, or pages with multiple box orientations,
64
+ you have multiple options to handle it:
65
+
66
+ - If you only use straight document pages with straight words (horizontal, same reading direction),
67
+ consider passing `assume_straight_boxes=True` to the ocr_predictor. It will directly fit straight boxes
68
+ on your page and return straight boxes, which makes it the fastest option.
69
+
70
+ - If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations
71
+ will be converted to straight boxes), you need to pass `export_as_straight_boxes=True` in the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).
72
+
73
+ If both options are set to False, the predictor will always fit and return rotated boxes.
74
+
75
+ To interpret your model's predictions, you can visualize them interactively as follows:
76
+
77
+ ```python
78
+ result.show()
79
+ ```
80
+
81
+ ![Visualization sample](https://github.com/mindee/doctr/raw/main/docs/images/doctr_example_script.gif)
82
+
83
+ Or even rebuild the original document from its predictions:
84
+
85
+ ```python
86
+ import matplotlib.pyplot as plt
87
+
88
+ synthetic_pages = result.synthesize()
89
+ plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
90
+ ```
91
+
92
+ ![Synthesis sample](https://github.com/mindee/doctr/raw/main/docs/images/synthesized_sample.png)
93
+
94
+ The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
95
+ To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure):
96
+
97
+ You can also export them as a nested dict, more appropriate for JSON format:
98
+
99
+ ```python
100
+ json_output = result.export()
101
+ ```
102
+
103
+ ### Use the KIE predictor
104
+
105
+ The KIE predictor is a more flexible predictor compared to OCR as your detection model can detect multiple classes in a document. For example, you can have a detection model to detect just dates and addresses in a document.
106
+
107
+ The KIE predictor makes it possible to use detector with multiple classes with a recognition model and to have the whole pipeline already setup for you.
108
+
109
+ ```python
110
+ from doctr.io import DocumentFile
111
+ from doctr.models import kie_predictor
112
+
113
+ # Model
114
+ model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
115
+ # PDF
116
+ doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
117
+ # Analyze
118
+ result = model(doc)
119
+
120
+ predictions = result.pages[0].predictions
121
+ for class_name in predictions.keys():
122
+ list_predictions = predictions[class_name]
123
+ for prediction in list_predictions:
124
+ print(f"Prediction for {class_name}: {prediction}")
125
+ ```
126
+
127
+ The KIE predictor results per page are in a dictionary format with each key representing a class name and it's value are the predictions for that class.
128
+
129
+ ### If you are looking for support from the Mindee team
130
+
131
+ [![Bad OCR test detection image asking the developer if they need help](https://github.com/mindee/doctr/raw/main/docs/images/doctr-need-help.png)](https://mindee.com/product/doctr)
132
+
133
+ ## Installation
134
+
135
+ ### Prerequisites
136
+
137
+ Python 3.9 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install docTR.
138
+
139
+ Since we use [weasyprint](https://weasyprint.org/), you will need extra dependencies if you are not running Linux.
140
+
141
+ For MacOS users, you can install them as follows:
142
+
143
+ ```shell
144
+ brew install cairo pango gdk-pixbuf libffi
145
+ ```
146
+
147
+ For Windows users, those dependencies are included in GTK. You can find the latest installer over [here](https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases).
148
+
149
+ ### Latest release
150
+
151
+ You can then install the latest release of the package using [pypi](https://pypi.org/project/python-doctr/) as follows:
152
+
153
+ ```shell
154
+ pip install python-doctr
155
+ ```
156
+
157
+ > :warning: Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.
158
+
159
+ We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:
160
+
161
+ ```shell
162
+ # for TensorFlow
163
+ pip install "python-doctr[tf]"
164
+ # for PyTorch
165
+ pip install "python-doctr[torch]"
166
+ ```
167
+
168
+ For MacBooks with M1 chip, you will need some additional packages or specific versions:
169
+
170
+ - TensorFlow 2: [metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
171
+ - PyTorch: [version >= 1.12.0](https://pytorch.org/get-started/locally/#start-locally)
172
+
173
+ ### Developer mode
174
+
175
+ Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
176
+ First clone the project repository:
177
+
178
+ ```shell
179
+ git clone https://github.com/mindee/doctr.git
180
+ pip install -e doctr/.
181
+ ```
182
+
183
+ Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:
184
+
185
+ ```shell
186
+ # for TensorFlow
187
+ pip install -e doctr/.[tf]
188
+ # for PyTorch
189
+ pip install -e doctr/.[torch]
190
+ ```
191
+
192
+ ## Models architectures
193
+
194
+ Credits where it's due: this repository is implementing, among others, architectures from published research papers.
195
+
196
+ ### Text Detection
197
+
198
+ - DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf).
199
+ - LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
200
+ - FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)
201
+
202
+ ### Text Recognition
203
+
204
+ - CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf).
205
+ - SAR: [Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf).
206
+ - MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf).
207
+ - ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf).
208
+ - PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966).
209
+
210
+ ## More goodies
211
+
212
+ ### Documentation
213
+
214
+ The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications.
215
+
216
+ ### Demo app
217
+
218
+ A minimal demo app is provided for you to play with our end-to-end OCR models!
219
+
220
+ ![Demo app](https://github.com/mindee/doctr/raw/main/docs/images/demo_update.png)
221
+
222
+ #### Live demo
223
+
224
+ Courtesy of :hugs: [Hugging Face](https://huggingface.co/) :hugs:, docTR has now a fully deployed version available on [Spaces](https://huggingface.co/spaces)!
225
+ Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr)
226
+
227
+ #### Running it locally
228
+
229
+ If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required.
230
+
231
+ ##### Tensorflow version
232
+
233
+ ```shell
234
+ pip install -r demo/tf-requirements.txt
235
+ ```
236
+
237
+ Then run your app in your default browser with:
238
+
239
+ ```shell
240
+ USE_TF=1 streamlit run demo/app.py
241
+ ```
242
+
243
+ ##### PyTorch version
244
+
245
+ ```shell
246
+ pip install -r demo/pt-requirements.txt
247
+ ```
248
+
249
+ Then run your app in your default browser with:
250
+
251
+ ```shell
252
+ USE_TORCH=1 streamlit run demo/app.py
253
+ ```

#### TensorFlow.js

Would you rather run everything in your web browser instead of having your demo actually run Python?
Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to get started!

![TFJS demo](https://github.com/mindee/doctr/raw/main/docs/images/demo_illustration_mini.png)

### Docker container

[We offer Docker container support for easy testing and deployment](https://github.com/mindee/doctr/pkgs/container/doctr).

#### Using GPU with docTR Docker Images

The docTR Docker images are GPU-ready and based on CUDA `11.8`.
However, to use GPU support with these Docker images, please ensure that Docker is configured to use your GPU.

To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support:

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:tf-py3.8.18-gpu-2023-09 bash
```

#### Available Tags

The Docker images for docTR follow a specific tag nomenclature: `<framework>-py<python_version>-<system>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure:

- `<framework>`: `tf` (TensorFlow) or `torch` (PyTorch).
- `<python_version>`: `3.8.18`, `3.9.18`, or `3.10.13`.
- `<system>`: `cpu` or `gpu`.
- `<doctr_version>`: a release tag >= `v0.7.1`.
- `<YYYY-MM>`: a monthly build, e.g. `2023-09`.

Here are examples of different image tags:

| Tag | Description |
|------------------------------|--------------------------------------------------------------------------------|
| `tf-py3.8.18-cpu-v0.7.1` | TensorFlow image with Python `3.8.18`, CPU-only, docTR `v0.7.1`. |
| `torch-py3.9.18-gpu-2023-09` | PyTorch image with Python `3.9.18`, GPU support, monthly build from `2023-09`. |
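
Since the nomenclature is regular, a tag can be decomposed mechanically. A minimal sketch of a parser for this scheme (illustrative only, not shipped with docTR):

```python
import re
from typing import Dict, Optional

# One named group per component of the documented tag nomenclature
TAG_PATTERN = re.compile(
    r"^(?P<framework>tf|torch)"
    r"-py(?P<python_version>\d+\.\d+\.\d+)"
    r"-(?P<system>cpu|gpu)"
    r"-(?P<suffix>v\d+\.\d+\.\d+|\d{4}-\d{2})$"  # release tag or monthly build
)

def parse_doctr_tag(tag: str) -> Optional[Dict[str, str]]:
    """Split a docTR image tag into its components, or return None
    if the tag does not follow the documented nomenclature."""
    match = TAG_PATTERN.match(tag)
    return match.groupdict() if match else None

print(parse_doctr_tag("torch-py3.9.18-gpu-2023-09"))
# {'framework': 'torch', 'python_version': '3.9.18', 'system': 'gpu', 'suffix': '2023-09'}
```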

#### Building Docker Images Locally

You can also build docTR Docker images locally on your computer.

```shell
docker build -t doctr .
```

You can specify custom Python versions and docTR versions using build arguments. For example, to build a docTR image with TensorFlow, Python version `3.9.10`, and docTR version `v0.7.0`, run the following command:

```shell
docker build -t doctr --build-arg FRAMEWORK=tf --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```

### Example script

An example script is provided for a simple document analysis of a PDF or image file:

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

All script arguments can be checked using `python scripts/analyze.py --help`.
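
If you want to wrap a similar analysis in your own tooling, the shape of such a command-line entry point is straightforward. A hypothetical `argparse` sketch (the flag names below are illustrative, not the script's actual options; run `--help` for those):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a CLI in the spirit of scripts/analyze.py.
    Flag names are hypothetical, not the real script's options."""
    parser = argparse.ArgumentParser(description="Run OCR on a PDF or image file")
    parser.add_argument("path", help="path to the PDF or image file to analyze")
    parser.add_argument("--detection", default="db_resnet50",
                        help="hypothetical: detection architecture to use")
    parser.add_argument("--recognition", default="crnn_vgg16_bn",
                        help="hypothetical: recognition architecture to use")
    return parser

args = build_parser().parse_args(["path/to/your/doc.pdf"])
print(args.path)  # path/to/your/doc.pdf
```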

### Minimal API integration

Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework.

#### Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

You can now run your API locally:

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

Alternatively, if you prefer, you can run the same server in a Docker container:

```shell
PORT=8002 docker-compose up -d --build
```

#### What you have deployed

Your API should now be running locally on port 8002. Access your automatically built documentation at [http://localhost:8002/redoc](http://localhost:8002/redoc) and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example with Python to send a request to the OCR route:

```python
import requests

with open('/path/to/your/doc.jpg', 'rb') as f:
    data = f.read()
response = requests.post("http://localhost:8002/ocr", files={'file': data}).json()
```
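
The JSON returned by the OCR route mirrors docTR's hierarchical export: pages containing blocks, which contain lines, which contain words. Assuming that shape, here is a sketch of how you might flatten a response into plain text (field names follow docTR's export format; verify them against your deployed version):

```python
def extract_text(ocr_response: dict) -> str:
    """Concatenate word values from a docTR-style export:
    pages -> blocks -> lines -> words, each word carrying a 'value'."""
    text_lines = []
    for page in ocr_response.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                text_lines.append(" ".join(words))
    return "\n".join(text_lines)

# Example with a minimal, hand-built response of the assumed shape:
sample = {"pages": [{"blocks": [{"lines": [
    {"words": [{"value": "Hello"}, {"value": "world"}]},
]}]}]}
print(extract_text(sample))  # Hello world
```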

### Example notebooks

Looking for more illustrations of docTR features? You might want to check out the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview.

## Citation

If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:

```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}
```

## Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck: we compiled a short guide (cf. [`CONTRIBUTING`](https://mindee.github.io/doctr/contributing/contributing.html)) for you to easily do so!

## License

Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/mindee/doctr?tab=Apache-2.0-1-ov-file#readme) for more information.