indicnodeai committed on
Commit
191d8c6
·
0 Parent(s):

Duplicate from indicnodeai/dbnetpp_repvit_ch

Files changed (3)
  1. .gitattributes +35 -0
  2. README.md +132 -0
  3. dbnetpp_repvit_ch.pth +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
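
To see which files these rules route through Git LFS, the patterns can be approximated with Python's `fnmatch`. This is a rough sketch and not Git's own matcher: gitattributes patterns follow gitignore-style rules, and `fnmatch` only approximates them for simple globs such as `*.pth`. The `LFS_PATTERNS` subset and the `is_lfs_tracked` helper are illustrative names, not part of the commit.

```python
from fnmatch import fnmatch

# Illustrative subset of the patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.onnx", "*.pt", "*.pth", "*.safetensors", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the bare filename matches any of the LFS glob patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# The three files touched by this commit.
for name in [".gitattributes", "README.md", "dbnetpp_repvit_ch.pth"]:
    kind = "LFS" if is_lfs_tracked(name) else "regular Git"
    print(f"{name}: {kind}")
```

Under these rules only the `.pth` checkpoint is stored as an LFS object, which is why its diff shows a three-line pointer rather than binary content.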
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ tags:
+ - text-detection
+ - ocr
+ - dbnet
+ - repvit
+ - pytorch
+ datasets:
+ - chinese-text-detection
+ pipeline_tag: image-segmentation
+ ---
+
+ # DBNet++ RepViT (Chinese)
+
+ A lightweight text detection model that combines DBNet++ with a RepViT backbone for efficient inference. Pretrained on **Chinese text detection datasets**.
+
+ ## Model Details
+
+ | Component | Configuration |
+ |-----------|--------------|
+ | Architecture | DBNet++ (Differentiable Binarization) |
+ | Backbone | RepViT (lightweight ViT-inspired CNN) |
+ | Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
+ | Head | DBNetPPHead (inner: 24, k: 50) |
+ | Parameters | ~3M |
+ | Input Size | 640x640 (flexible) |
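
The `k: 50` entry in the head is presumably the amplifying factor of differentiable binarization. In the DBNet papers the approximate binary map is computed from the probability map P and the threshold map T as B = 1 / (1 + exp(-k (P - T))). The snippet below is a minimal scalar sketch of that formula, not OCR-Factory code; `db_binarize` is a hypothetical helper name.

```python
import math


def db_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Approximate binarization B = 1 / (1 + exp(-k * (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# With k = 50 the output moves sharply from ~0 to ~1 around p == t.
for p in (0.2, 0.3, 0.4):
    print(f"P={p:.1f}, T=0.3 -> B={db_binarize(p, 0.3):.4f}")
```

A large `k` makes the step steep enough to behave like a hard threshold while staying differentiable, which is what lets the threshold map be learned during training.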
+
+ ## Training Data
+
+ The weights were converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights, which were trained on **Chinese text detection datasets**.
+
+ **Recommended datasets for fine-tuning:**
+ - MSRA-TD500 (Chinese + English)
+ - ICDAR2017 RCTW (Chinese)
+ - CTW1500
+
+ **Note:** For English-only text detection, fine-tuning on English datasets (ICDAR2015, Total-Text) is recommended.
+
+ ## Usage
+
+ ### With Hugging Face
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ # Download model
+ model_path = hf_hub_download(
+     repo_id="thisisiron/dbnetpp_repvit_ch",
+     filename="dbnetpp_repvit_ch.pth"
+ )
+
+ # Load weights
+ state_dict = torch.load(model_path, map_location="cpu")
+ ```
+
+ ### With OCR-Factory
+
+ ```python
+ import torch
+ from ocrfactory.models.detect import DBNetPP
+
+ # Build model
+ model = DBNetPP(
+     backbone={"name": "RepViT"},
+     neck={
+         "name": "RSEFPN",
+         "in_channels": [48, 96, 192, 384],
+         "out_channels": 96,
+         "shortcut": True
+     },
+     head={
+         "name": "DBNetPPHead",
+         "in_channels": 96,
+         "inner_channels": 24,
+         "k": 50,
+         "use_asf": False
+     }
+ )
+
+ # Load weights
+ state_dict = torch.load("dbnetpp_repvit_ch.pth", map_location="cpu")
+ model.load_state_dict(state_dict, strict=True)
+ model.eval()
+
+ # Inference
+ x = torch.randn(1, 3, 640, 640)
+ with torch.no_grad():
+     output = model(x)
+     shrink_map = output["shrink_map"]  # (1, 1, 640, 640)
+ ```
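
The README stops at the raw `shrink_map`. A simplified sketch of turning such a probability map into boxes follows: threshold the map, then take the bounding box of each 4-connected region. This is only an illustration under stated assumptions. Real DB post-processing extracts contours and expands the shrunk polygons, the `thresh=0.3` default is a commonly used value rather than one confirmed for this model, and `boxes_from_prob_map` is a hypothetical helper that works on plain nested lists (for example, `shrink_map[0, 0].tolist()`).

```python
from collections import deque


def boxes_from_prob_map(prob, thresh=0.3, min_area=1):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected region above thresh."""
    height = len(prob)
    width = len(prob[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob[y][x] <= thresh:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(y, x)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and prob[ny][nx] > thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop regions smaller than min_area pixels (likely noise).
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy 3x5 probability map with two separate high-probability regions.
toy_map = [
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.8, 0.9, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.8],
]
print(boxes_from_prob_map(toy_map))
```

Box coordinates are in pixels of the map itself, so they need rescaling if the input image was resized before inference.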
+
+ ### Training Config (YAML)
+
+ ```yaml
+ architecture:
+   backbone:
+     name: RepViT
+   neck:
+     name: RSEFPN
+     in_channels: [48, 96, 192, 384]
+     out_channels: 96
+     shortcut: true
+   head:
+     name: DBNetPPHead
+     in_channels: 96
+     inner_channels: 24
+     k: 50
+     use_asf: false
+ ```
+
+ ## Performance
+
+ | Dataset | Precision | Recall | H-mean |
+ |---------|-----------|--------|--------|
+ | MSRA-TD500 | - | - | - |
+
+ *Performance metrics will be updated after benchmarking.*
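
For reference, the H-mean column is the harmonic mean of precision and recall (the F1 score). A minimal sketch of the calculation follows; the input values are placeholders chosen for illustration and are not measurements of this model.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder inputs only, not benchmark results.
print(f"{h_mean(0.8, 0.6):.4f}")
```

Because it is a harmonic mean, the score is pulled toward the lower of the two inputs, so a detector cannot reach a high H-mean by trading away recall for precision or the reverse.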
+
+ ## References
+
+ - **OpenOCR**: https://github.com/Topdu/OpenOCR
+ - **RepViT**: https://github.com/THU-MIG/RepViT
+ - **DBNet++**: [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
+
+ ## License
+
+ Apache 2.0
dbnetpp_repvit_ch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01
+ size 12728527
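
The three lines above are a Git LFS pointer: the real checkpoint is stored separately and identified by its SHA-256 digest and byte size. A downloaded copy can be checked against the pointer with the standard library alone. This is a sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the download produced.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01
size 12728527
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into the hash algorithm, digest, and byte size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at path has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in 1 MiB chunks to keep memory use flat.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

After downloading, `matches_pointer("dbnetpp_repvit_ch.pth", pointer)` should return `True` for an intact file. As a rough cross-check, 12,728,527 bytes at 4 bytes per float32 value is about 3.2M values, in line with the ~3M parameters stated in the README, assuming float32 storage and ignoring serialization overhead and non-parameter buffers.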