thisisiron committed
Commit · 8dc8f4a
1 Parent(s): db7dbe2
Add DBNet++ RepViT pretrained weights
- README.md +66 -0
- dbnetpp_repvit.pth +3 -0
README.md
CHANGED
@@ -1,3 +1,69 @@
---
license: apache-2.0
tags:
- ocr
- text-detection
- dbnet
- pytorch
library_name: ocrfactory
pipeline_tag: object-detection
---

# DBNet++ with RepViT Backbone

A lightweight text detection model that combines DBNet++ with a RepViT backbone, optimized for efficient inference.

## Model Description

- **Architecture**: DBNet++ (Differentiable Binarization)
- **Backbone**: RepViT (lightweight ViT-inspired CNN)
- **Neck**: RSEFPN (Residual Squeeze-and-Excitation FPN)
- **Head**: DBNetPPHead

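For readers unfamiliar with the term, differentiable binarization replaces the hard threshold on the probability map with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability, T the predicted threshold, and k an amplification factor (the head configuration in this README uses k = 50). The sketch below evaluates that formula for single pixel values in plain Python. The function name `approx_binarize` is illustrative only and is not part of ocrfactory.

```python
import math


def approx_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Approximate binarization: 1 / (1 + exp(-k * (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


# A pixel exactly at the threshold maps to 0.5; a margin of 0.1 on
# either side already pushes the output close to 0 or 1 when k = 50.
for p in (0.2, 0.3, 0.4):
    print(p, round(approx_binarize(p, t=0.3), 4))
```

The large k is what makes the output behave almost like a hard threshold while staying differentiable during training.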
## Model Details

| Component | Configuration |
|-----------|---------------|
| Backbone | RepViT |
| Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
| Head | DBNetPPHead (inner: 24, k: 50) |
| Parameters | ~3M |
| Input Size | 640x640 (flexible) |

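The table lists the input size as flexible. A common constraint for FPN-style detectors is that height and width be multiples of the backbone's total stride. The sketch below assumes a stride of 32, which this README does not state, and the helper names `round_up` and `target_shape` are hypothetical. It scales an image so its long side is about 640 and then rounds both sides up.

```python
def round_up(size: int, multiple: int = 32) -> int:
    """Round size up to the next multiple (assumed stride: 32)."""
    return (size + multiple - 1) // multiple * multiple


def target_shape(height: int, width: int, long_side: int = 640, multiple: int = 32):
    """Scale so the long side is about long_side, then round both sides up."""
    scale = long_side / max(height, width)
    return (
        round_up(int(round(height * scale)), multiple),
        round_up(int(round(width * scale)), multiple),
    )


print(target_shape(1080, 1920))
```

If the actual stride of this RepViT variant differs, pass the correct value as `multiple`.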
## Usage

```python
import torch
from ocrfactory.models.detect import DBNetPP

# Build model
model = DBNetPP(
    backbone={"name": "RepViT"},
    neck={"name": "RSEFPN", "in_channels": [48, 96, 192, 384], "out_channels": 96, "shortcut": True},
    head={"name": "DBNetPPHead", "in_channels": 96, "inner_channels": 24, "k": 50, "use_asf": False}
)

# Load weights
state_dict = torch.load("dbnetpp_repvit.pth", map_location="cpu")
model.load_state_dict(state_dict, strict=True)
model.eval()

# Inference
x = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    output = model(x)
shrink_map = output["shrink_map"]  # (1, 1, 640, 640)
```

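The usage snippet stops at `shrink_map`. Turning that map into text regions usually means thresholding it and grouping the surviving pixels. Below is a minimal, dependency-free sketch of that idea operating on a nested list (for example `shrink_map[0, 0].tolist()`). The function name `boxes_from_prob_map` and the threshold of 0.3 are assumptions for illustration, not values taken from this repository or from OpenOCR. Real DBNet post-processing additionally expands (unclips) each region and filters by score.

```python
from collections import deque


def boxes_from_prob_map(prob_map, threshold=0.3):
    """Return (x_min, y_min, x_max, y_max) per 4-connected region above threshold."""
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob_map[y][x] <= threshold:
                continue
            # Breadth-first flood fill over the thresholded region.
            queue = deque([(y, x)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and prob_map[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


toy_map = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6],
]
print(boxes_from_prob_map(toy_map))  # [(1, 0, 2, 1), (4, 2, 4, 2)]
```

For full-resolution maps, a vectorized routine such as contour extraction in an image library will be much faster than this pure-Python loop.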
## Training

The weights were converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights, which were trained on Chinese text detection datasets.

## Original Source

- OpenOCR: https://github.com/Topdu/OpenOCR
- RepViT: https://github.com/THU-MIG/RepViT

## License

Apache 2.0
dbnetpp_repvit.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01
size 12728527
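The Git LFS pointer above records the SHA-256 digest and byte size of the real weight file, so a downloaded copy can be checked against it. The following is a minimal sketch using only the standard library. The function name `file_matches_pointer` is hypothetical, and the path you pass in is whatever local path the file was downloaded to.

```python
import hashlib
import os

# Values copied from the LFS pointer in this commit.
EXPECTED_SHA256 = "abb34802356cc705bb22fe25c369071b3436de45f93c78adeedb9171fd998a01"
EXPECTED_SIZE = 12728527  # bytes


def file_matches_pointer(path, sha256=EXPECTED_SHA256, size=EXPECTED_SIZE):
    """Return True if the file at path has the expected size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in 1 MiB chunks to avoid loading the whole file at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == sha256
```

As a rough cross-check, 12,728,527 bytes at 4 bytes per value is about 3.18 million values, which is in line with the "~3M" parameter count in the README if the weights are stored as float32 (an assumption; the storage dtype is not stated).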