thisisiron committed on
Commit ac79670 · verified · 1 Parent(s): 8dc8f4a

Update README.md

  1. README.md +132 -69
README.md CHANGED
@@ -1,69 +1,132 @@
- ---
- license: apache-2.0
- tags:
- - ocr
- - text-detection
- - dbnet
- - pytorch
- library_name: ocrfactory
- pipeline_tag: object-detection
- ---
-
- # DBNet++ with RepViT Backbone
-
- A lightweight text detection model combining DBNet++ with RepViT backbone, optimized for efficient inference.
-
- ## Model Description
-
- - **Architecture**: DBNet++ (Differentiable Binarization)
- - **Backbone**: RepViT (lightweight ViT-inspired CNN)
- - **Neck**: RSEFPN (Residual Squeeze-and-Excitation FPN)
- - **Head**: DBNetPPHead
-
- ## Model Details
-
- | Component | Configuration |
- |-----------|--------------|
- | Backbone | RepViT |
- | Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
- | Head | DBNetPPHead (inner: 24, k: 50) |
- | Parameters | ~3M |
- | Input Size | 640x640 (flexible) |
-
- ## Usage
-
- ```python
- import torch
- from ocrfactory.models.detect import DBNetPP
-
- # Build model
- model = DBNetPP(
-     backbone={"name": "RepViT"},
-     neck={"name": "RSEFPN", "in_channels": [48, 96, 192, 384], "out_channels": 96, "shortcut": True},
-     head={"name": "DBNetPPHead", "in_channels": 96, "inner_channels": 24, "k": 50, "use_asf": False}
- )
-
- # Load weights
- state_dict = torch.load("dbnetpp_repvit.pth", map_location="cpu")
- model.load_state_dict(state_dict, strict=True)
- model.eval()
-
- # Inference
- x = torch.randn(1, 3, 640, 640)
- with torch.no_grad():
-     output = model(x)
- shrink_map = output["shrink_map"] # (1, 1, 640, 640)
- ```
-
- ## Training
-
- This model was converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights trained on Chinese text detection datasets.
-
- ## Original Source
-
- - OpenOCR: https://github.com/Topdu/OpenOCR
- - RepViT: https://github.com/THU-MIG/RepViT
-
- ## License
-
- Apache 2.0
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ tags:
+ - text-detection
+ - ocr
+ - dbnet
+ - repvit
+ - pytorch
+ datasets:
+ - chinese-text-detection
+ pipeline_tag: image-segmentation
+ ---
+
+ # DBNet++ RepViT (Chinese)
+
+ Lightweight text detection model combining DBNet++ with RepViT backbone, optimized for efficient inference. Pretrained on **Chinese text detection datasets**.
+
+ ## Model Details
+
+ | Component | Configuration |
+ |-----------|--------------|
+ | Architecture | DBNet++ (Differentiable Binarization) |
+ | Backbone | RepViT (lightweight ViT-inspired CNN) |
+ | Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
+ | Head | DBNetPPHead (inner: 24, k: 50) |
+ | Parameters | ~3M |
+ | Input Size | 640x640 (flexible) |
+
+ ## Training Data
+
+ This model was converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights, trained on **Chinese text detection datasets**.
+
+ **Recommended datasets for fine-tuning:**
+ - MSRA-TD500 (Chinese + English)
+ - ICDAR2017 RCTW (Chinese)
+ - CTW1500
+
+ **Note:** For English-only text detection, fine-tuning on English datasets (ICDAR2015, Total-Text) is recommended.
+
+ ## Usage
+
+ ### With Hugging Face
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ # Download model
+ model_path = hf_hub_download(
+     repo_id="thisisiron/dbnetpp_repvit_ch",
+     filename="dbnetpp_repvit_ch.pth"
+ )
+
+ # Load weights
+ state_dict = torch.load(model_path, map_location="cpu")
+ ```
+
+ ### With OCR-Factory
+
+ ```python
+ import torch
+ from ocrfactory.models.detect import DBNetPP
+
+ # Build model
+ model = DBNetPP(
+     backbone={"name": "RepViT"},
+     neck={
+         "name": "RSEFPN",
+         "in_channels": [48, 96, 192, 384],
+         "out_channels": 96,
+         "shortcut": True
+     },
+     head={
+         "name": "DBNetPPHead",
+         "in_channels": 96,
+         "inner_channels": 24,
+         "k": 50,
+         "use_asf": False
+     }
+ )
+
+ # Load weights
+ state_dict = torch.load("dbnetpp_repvit_ch.pth", map_location="cpu")
+ model.load_state_dict(state_dict, strict=True)
+ model.eval()
+
+ # Inference
+ x = torch.randn(1, 3, 640, 640)
+ with torch.no_grad():
+     output = model(x)
+ shrink_map = output["shrink_map"] # (1, 1, 640, 640)
+ ```
+
+ ### Training Config (YAML)
+
+ ```yaml
+ architecture:
+   backbone:
+     name: RepViT
+   neck:
+     name: RSEFPN
+     in_channels: [48, 96, 192, 384]
+     out_channels: 96
+     shortcut: true
+   head:
+     name: DBNetPPHead
+     in_channels: 96
+     inner_channels: 24
+     k: 50
+     use_asf: false
+ ```
+
+ ## Performance
+
+ | Dataset | Precision | Recall | H-mean |
+ |---------|-----------|--------|--------|
+ | MSRA-TD500 | - | - | - |
+
+ *Performance metrics will be updated after benchmarking.*
+
+ ## References
+
+ - **OpenOCR**: https://github.com/Topdu/OpenOCR
+ - **RepViT**: https://github.com/THU-MIG/RepViT
+ - **DBNet++**: [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
+
+ ## License
+
+ Apache 2.0
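
Note on `k: 50` in the head config: in the DBNet papers, `k` is the amplification factor of the differentiable binarization step, B = 1 / (1 + exp(-k (P - T))), where P is the probability (shrink) map and T is the learned threshold map. The sketch below is not part of this commit or of `ocrfactory`. The function names `approx_binarize` and `binarize_map` are hypothetical, and the fixed threshold of 0.3 is an illustrative assumption. It also assumes `shrink_map` holds probabilities in [0, 1], which the README does not state.

```python
import math


def approx_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Differentiable binarization for one pixel: 1 / (1 + exp(-k * (p - t))).

    p is the probability-map value and t the threshold-map value.
    A large k makes the sigmoid steep, so the output approaches a hard 0/1 step.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def binarize_map(prob_map, thresh=0.3):
    """Hard-threshold a 2D probability map given as nested lists.

    This mirrors the usual inference-time shortcut of thresholding the
    probability map directly. thresh=0.3 is an assumed example value.
    """
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


if __name__ == "__main__":
    # Near the threshold the output changes quickly because k is large.
    for p in (0.2, 0.3, 0.4):
        print(p, round(approx_binarize(p, t=0.3), 4))

    # Tiny stand-in for a shrink map; real maps would be 640x640 per the README.
    print(binarize_map([[0.1, 0.9], [0.4, 0.2]]))
```

With `k = 50`, a pixel only 0.1 above or below the threshold is already pushed to within about 0.01 of 1 or 0, so training sees a nearly binary map while gradients still flow through the sigmoid.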