thisisiron committed on
Commit
8dc8f4a
·
1 Parent(s): db7dbe2

Add DBNet++ RepViT pretrained weights

Files changed (2)
  1. README.md +66 -0
  2. dbnetpp_repvit.pth +3 -0
README.md CHANGED
@@ -1,3 +1,69 @@
  ---
  license: apache-2.0
+ tags:
+ - ocr
+ - text-detection
+ - dbnet
+ - pytorch
+ library_name: ocrfactory
+ pipeline_tag: object-detection
  ---
+
+ # DBNet++ with RepViT Backbone
+
+ A lightweight text detection model that pairs DBNet++ with a RepViT backbone for efficient inference.
+
+ ## Model Description
+
+ - **Architecture**: DBNet++ (Differentiable Binarization)
+ - **Backbone**: RepViT (lightweight ViT-inspired CNN)
+ - **Neck**: RSEFPN (Residual Squeeze-and-Excitation FPN)
+ - **Head**: DBNetPPHead
+
+ ## Model Details
+
+ | Component | Configuration |
+ |-----------|--------------|
+ | Backbone | RepViT |
+ | Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
+ | Head | DBNetPPHead (inner: 24, k: 50) |
+ | Parameters | ~3M |
+ | Input Size | 640x640 (flexible) |
+
+ ## Usage
+
+ ```python
+ import torch
+ from ocrfactory.models.detect import DBNetPP
+
+ # Build model
+ model = DBNetPP(
+     backbone={"name": "RepViT"},
+     neck={"name": "RSEFPN", "in_channels": [48, 96, 192, 384], "out_channels": 96, "shortcut": True},
+     head={"name": "DBNetPPHead", "in_channels": 96, "inner_channels": 24, "k": 50, "use_asf": False}
+ )
+
+ # Load weights
+ state_dict = torch.load("dbnetpp_repvit.pth", map_location="cpu")
+ model.load_state_dict(state_dict, strict=True)
+ model.eval()
+
+ # Inference
+ x = torch.randn(1, 3, 640, 640)
+ with torch.no_grad():
+     output = model(x)
+     shrink_map = output["shrink_map"]  # (1, 1, 640, 640)
+ ```
+
+ ## Training
+
+ This model was converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights trained on Chinese text detection datasets.
+
+ ## Original Source
+
+ - OpenOCR: https://github.com/Topdu/OpenOCR
+ - RepViT: https://github.com/THU-MIG/RepViT
+
+ ## License
+
+ Apache 2.0
dbnetpp_repvit.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01
+ size 12728527
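
The LFS pointer above records the SHA-256 digest and byte size of the real weights file, so a downloaded copy can be checked against it before loading. A minimal sketch, assuming the file has been saved locally as `dbnetpp_repvit.pth`; the helper names `sha256_of_file` and `matches_lfs_pointer` are hypothetical and not part of this commit or of any library:

```python
import hashlib
import os

# Values copied from the Git LFS pointer in this commit.
EXPECTED_SHA256 = "abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01"
EXPECTED_SIZE = 12728527  # bytes


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_pointer(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Check a local file against the size and oid recorded in an LFS pointer."""
    # Compare sizes first: it is cheap and catches truncated downloads
    # (or an un-smudged pointer file) without hashing anything.
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256
```

Calling `matches_lfs_pointer("dbnetpp_repvit.pth")` before `torch.load` would then flag a partial download, or a checkout where only the three-line pointer file was fetched instead of the weights.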