nexaml committed
Commit eb304df · verified · 1 Parent(s): 0d248d6

Create README.md

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ pipeline_tag: image-to-text
+ tags:
+ - NPU
+ ---
+ # PaddleOCR v4 (PP-OCRv4) for Android
+
+ ## Quickstart
+
+ See the [quickstart documentation](https://docs.nexa.ai/nexa-sdk-android/quickstart) for setup and usage.
+
+ ## Model Description
+ **PP-OCRv4** is the fourth-generation end-to-end optical character recognition (OCR) system from the PaddlePaddle team.
+ It chains three lightweight stages, **text detection → angle classification → text recognition**, and adds improved training techniques and data augmentation, delivering higher accuracy and robustness while staying efficient enough for real-time use.
+
+ PP-OCRv4 supports multilingual OCR (Latin and non-Latin scripts), irregular layouts (rotated or curved text), and challenging inputs such as the noisy or low-resolution images common in mobile and document-scan scenarios.
+
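The three-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration only: `run_pipeline`, `StageResult`, `crop_box`, `rotate_180`, and the `fake_*` stand-ins are hypothetical names, not part of PaddleOCR or the Nexa SDK, and the sketch assumes the angle classifier only distinguishes 0 from 180 degrees.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Image = List[List[int]]  # placeholder image: rows of grayscale pixel values


@dataclass
class StageResult:
    box: List[Point]  # polygon returned by the detector
    angle: int        # 0 or 180 (assumed classifier output); 0 if no classifier
    text: str
    score: float


def crop_box(image: Image, box: List[Point]) -> Image:
    """Cut the axis-aligned rectangle that encloses `box` out of `image`."""
    xs = [x for x, _ in box]
    ys = [y for _, y in box]
    return [row[min(xs):max(xs)] for row in image[min(ys):max(ys)]]


def rotate_180(crop: Image) -> Image:
    """Turn an upside-down crop the right way up."""
    return [row[::-1] for row in crop[::-1]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[List[Point]]],
    recognize: Callable[[Image], Tuple[str, float]],
    classify_angle: Optional[Callable[[Image], int]] = None,
) -> List[StageResult]:
    """Detection -> optional angle classification -> recognition."""
    results = []
    for box in detect(image):
        crop = crop_box(image, box)
        angle = 0
        if classify_angle is not None:  # the classifier stage is optional
            angle = classify_angle(crop)
            if angle == 180:
                crop = rotate_180(crop)
        text, score = recognize(crop)
        results.append(StageResult(box, angle, text, score))
    return results


# Toy stand-ins so the sketch runs without any model files.
def fake_detect(image: Image) -> List[List[Point]]:
    return [[(1, 1), (5, 1), (5, 3), (1, 3)]]


def fake_classify(crop: Image) -> int:
    return 180


def fake_recognize(crop: Image) -> Tuple[str, float]:
    return ("HELLO", 0.98)


image = [[10 * r + c for c in range(6)] for r in range(4)]
results = run_pipeline(image, fake_detect, fake_recognize, fake_classify)
print(results)
```

Passing the stages in as callables keeps the control flow visible and mirrors the idea that the detector, classifier, and recognizer are separate, swappable components; in a real app each callable would wrap a loaded model.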
+ ## Features
+ - **End-to-end OCR**: text detection, optional angle classification, and text recognition in one pipeline.
+ - **Multilingual support**: pretrained models for English, Chinese, and dozens of other languages; easy fine-tuning for domain text.
+ - **Robust in real-world conditions**: handles rotation, perspective distortion, blur, low light, and complex backgrounds.
+ - **Lightweight & fast**: practical for both mobile apps and large-scale server deployments.
+ - **Flexible I/O**: works with photos, scans, screenshots, receipts, invoices, ID cards, dashboards, and UI text.
+ - **Extensible**: swap components (detector/recognizer), add language packs, or fine-tune on domain datasets.
+
+ ## Use Cases
+ - Document digitization (invoices, receipts, forms, contracts)
+ - Robotic process automation (RPA) and back-office automation (screen-capture and OCR flows)
+ - Mobile scanning apps and camera-based translation or read-aloud
+ - Industrial and retail analytics (labels, price tags, shelf tags)
+ - Accessibility (screen readers and read-aloud applications)
+
+ ## Inputs and Outputs
+ **Input**: Image (photo, scan, or screenshot).
+ **Output**: A list of detected text regions, each with:
+ - bounding box (rectangular or polygonal)
+ - recognized text string
+ - optional confidence score and orientation
+
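One way to picture the output described above is as a list of small records. The sketch below is an assumption for illustration: `TextRegion`, `bounding_rect`, and `filter_and_sort` are hypothetical names rather than the SDK's actual types, and the sample regions are made-up values, not real model output.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextRegion:
    polygon: List[Tuple[float, float]]  # corner points; 4 for a rectangle
    text: str                           # recognized string
    confidence: Optional[float] = None  # optional, assumed to lie in 0.0..1.0
    orientation: Optional[int] = None   # optional, degrees

    def bounding_rect(self) -> Tuple[float, float, float, float]:
        """Axis-aligned (left, top, right, bottom) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (min(xs), min(ys), max(xs), max(ys))


def filter_and_sort(regions: List[TextRegion],
                    min_confidence: float = 0.5) -> List[TextRegion]:
    """Drop low-confidence regions, then order top-to-bottom, left-to-right.

    Regions without a confidence score are kept, since the score is optional.
    """
    kept = [r for r in regions
            if r.confidence is None or r.confidence >= min_confidence]
    return sorted(kept, key=lambda r: (r.bounding_rect()[1], r.bounding_rect()[0]))


# Illustrative values only.
regions = [
    TextRegion([(10, 50), (90, 50), (90, 70), (10, 70)], "Total: 12.50", 0.97, 0),
    TextRegion([(10, 10), (120, 12), (119, 30), (9, 28)], "RECEIPT", 0.99, 0),
    TextRegion([(200, 52), (230, 52), (230, 68), (200, 68)], "~", 0.21),
]

for region in filter_and_sort(regions):
    print(region.bounding_rect(), region.text)
```

Storing the polygon and deriving the rectangle on demand covers both box shapes mentioned in the list, since a rectangle is just a four-point polygon.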
+
+ ## License
+ - Licensed under [Apache-2.0](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/LICENSE)
+
+ ## References
+ - GitHub repo: [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+ - Model zoo & documentation: [Models list](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/models_list_en.md)