thisisiron
/

dbnetpp_repvit_ch

@@ -1,69 +1,132 @@
----
-license: apache-2.0
-tags:
-  - ocr
-  - text-detection
-  - dbnet
-  - pytorch
-library_name: ocrfactory
-pipeline_tag: object-detection
----
-# DBNet++ with RepViT Backbone
-A lightweight text detection model combining DBNet++ with RepViT backbone, optimized for efficient inference.
-## Model Description
-- **Architecture**: DBNet++ (Differentiable Binarization)
-- **Backbone**: RepViT (lightweight ViT-inspired CNN)
-- **Neck**: RSEFPN (Residual Squeeze-and-Excitation FPN)
-- **Head**: DBNetPPHead
-## Model Details
-| Component | Configuration |
-|-----------|--------------|
-| Backbone | RepViT |
-| Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
-| Head | DBNetPPHead (inner: 24, k: 50) |
-| Parameters | ~3M |
-| Input Size | 640x640 (flexible) |
-## Usage
-```python
-import torch
-from ocrfactory.models.detect import DBNetPP
-# Build model
-model = DBNetPP(
-    backbone={"name": "RepViT"},
-    neck={"name": "RSEFPN", "in_channels": [48, 96, 192, 384], "out_channels": 96, "shortcut": True},
-    head={"name": "DBNetPPHead", "in_channels": 96, "inner_channels": 24, "k": 50, "use_asf": False}
-)
-# Load weights
-state_dict = torch.load("dbnetpp_repvit.pth", map_location="cpu")
-model.load_state_dict(state_dict, strict=True)
-model.eval()
-# Inference
-x = torch.randn(1, 3, 640, 640)
-with torch.no_grad():
-    output = model(x)
-    shrink_map = output["shrink_map"]  # (1, 1, 640, 640)
-```
-## Training
-This model was converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights trained on Chinese text detection datasets.
-## Original Source
-- OpenOCR: https://github.com/Topdu/OpenOCR
-- RepViT: https://github.com/THU-MIG/RepViT
-## License
-Apache 2.0

+---
+license: apache-2.0
+language:
+- zh
+- en
+tags:
+- text-detection
+- ocr
+- dbnet
+- repvit
+- pytorch
+datasets:
+- chinese-text-detection
+pipeline_tag: image-segmentation
+---
+# DBNet++ RepViT (Chinese)
+A lightweight text detection model that combines DBNet++ with a RepViT backbone for efficient inference. Pretrained on **Chinese text detection datasets**.
+## Model Details
+| Component | Configuration |
+|-----------|--------------|
+| Architecture | DBNet++ (Differentiable Binarization) |
+| Backbone | RepViT (lightweight ViT-inspired CNN) |
+| Neck | RSEFPN (in: [48, 96, 192, 384], out: 96) |
+| Head | DBNetPPHead (inner: 24, k: 50) |
+| Parameters | ~3M |
+| Input Size | 640x640 (flexible) |
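The "flexible" input size in the table usually comes with a divisibility constraint in FPN-style detectors. The sketch below assumes, and this is not stated in the model card, that height and width should be multiples of 32 (a total stride typical of DBNet-style backbones) and that the longer side is capped at 640. The function name `fit_to_stride` and its defaults are hypothetical.

```python
def fit_to_stride(height, width, max_side=640, stride=32):
    """Pick a target (height, width) for resizing an image before detection.

    The image is scaled down so its longer side is at most max_side, then each
    side is rounded to the nearest multiple of stride (never below one stride).
    The cap holds exactly when max_side is itself a multiple of stride.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Only shrink: images already within max_side keep their scale.
    scale = min(1.0, max_side / max(height, width))
    new_h = max(stride, int(round(height * scale / stride)) * stride)
    new_w = max(stride, int(round(width * scale / stride)) * stride)
    return new_h, new_w
```

Rounding to the nearest multiple changes the aspect ratio slightly, so any detected coordinates would need to be mapped back with separate horizontal and vertical scale factors.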
+## Training Data
+This model was converted from [OpenOCR](https://github.com/Topdu/OpenOCR) pretrained weights, which were trained on **Chinese text detection datasets**.
+**Recommended datasets for fine-tuning:**
+- MSRA-TD500 (Chinese + English)
+- ICDAR2017 RCTW (Chinese)
+- CTW1500
+**Note:** For English-only text detection, fine-tuning on English datasets such as ICDAR2015 or Total-Text is recommended.
+## Usage
+### With Hugging Face
+```python
+from huggingface_hub import hf_hub_download
+import torch
+# Download model
+model_path = hf_hub_download(
+    repo_id="thisisiron/dbnetpp_repvit_ch",
+    filename="dbnetpp_repvit_ch.pth"
+)
+# Load weights
+state_dict = torch.load(model_path, map_location="cpu")
+```
+### With OCR-Factory
+```python
+import torch
+from ocrfactory.models.detect import DBNetPP
+# Build model
+model = DBNetPP(
+    backbone={"name": "RepViT"},
+    neck={
+        "name": "RSEFPN",
+        "in_channels": [48, 96, 192, 384],
+        "out_channels": 96,
+        "shortcut": True
+    },
+    head={
+        "name": "DBNetPPHead",
+        "in_channels": 96,
+        "inner_channels": 24,
+        "k": 50,
+        "use_asf": False
+    }
+)
+# Load weights (local file; alternatively pass the path returned by hf_hub_download)
+state_dict = torch.load("dbnetpp_repvit_ch.pth", map_location="cpu")
+model.load_state_dict(state_dict, strict=True)
+model.eval()
+# Inference
+x = torch.randn(1, 3, 640, 640)
+with torch.no_grad():
+    output = model(x)
+    shrink_map = output["shrink_map"]  # (1, 1, 640, 640)
+```
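The `shrink_map` above is a per-pixel probability map, not a list of boxes. DBNet-style pipelines typically binarize it with a fixed threshold, extract contours, and expand ("unclip") the resulting polygons. The sketch below is a deliberately simplified stand-in for that step, assuming a threshold of 0.3 (a commonly used value, not confirmed for this model): it thresholds a 2D map and returns one axis-aligned box per connected region, with no contour fitting or unclipping. The function name `boxes_from_prob_map` is hypothetical and is not part of OCR-Factory.

```python
from collections import deque


def boxes_from_prob_map(prob_map, threshold=0.3, min_area=1):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected region
    of pixels whose probability exceeds threshold."""
    rows = len(prob_map)
    cols = len(prob_map[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob_map[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one connected foreground region.
            queue = deque([(r, c)])
            seen[r][c] = True
            x0, y0, x1, y1, area = c, r, c, r, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and prob_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are usually noise.
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes
```

With the tensor from the snippet above, `shrink_map[0, 0].tolist()` yields the nested list this function expects. For real use, a vectorized contour routine would be far faster than this pure-Python loop.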
+### Training Config (YAML)
+```yaml
+architecture:
+  backbone:
+    name: RepViT
+  neck:
+    name: RSEFPN
+    in_channels: [48, 96, 192, 384]
+    out_channels: 96
+    shortcut: true
+  head:
+    name: DBNetPPHead
+    in_channels: 96
+    inner_channels: 24
+    k: 50
+    use_asf: false
+```
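In the DBNet papers, `k` is the amplifying factor of the differentiable binarization step, which replaces a hard threshold with the steep sigmoid `1 / (1 + exp(-k * (P - T)))`, where `P` is the probability map and `T` the learned threshold map. Assuming `k: 50` in the head config plays that role here, the sketch below shows how sharply the function switches around the threshold. The function name is illustrative only.

```python
import math


def approximate_binarization(p, t, k=50.0):
    """Differentiable binarization: a sigmoid of (p - t) scaled by k.

    p: probability of text at a pixel, in [0, 1]
    t: threshold at that pixel, in [0, 1]
    k: amplifying factor; larger values approach a hard step at p == t
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))
```

At `p == t` the output is exactly 0.5, and a margin of only 0.1 on either side already pushes it above 0.99 or below 0.01, which is why the result behaves like a binary map while remaining differentiable during training.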
+## Performance
+| Dataset | Precision | Recall | H-mean |
+|---------|-----------|--------|--------|
+| MSRA-TD500 | - | - | - |
+*Performance metrics will be updated after benchmarking.*
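For reference, the three columns in the table are related as follows: precision is the fraction of predicted boxes that match a ground-truth box, recall is the fraction of ground-truth boxes that are matched, and H-mean is their harmonic mean. The sketch below computes them from match counts. The function name is hypothetical, and how a "match" is decided (for example, an IoU threshold) depends on the benchmark protocol and is not covered here.

```python
def detection_metrics(num_matched, num_predictions, num_ground_truths):
    """Return (precision, recall, h_mean) from detection match counts."""
    precision = num_matched / num_predictions if num_predictions else 0.0
    recall = num_matched / num_ground_truths if num_ground_truths else 0.0
    total = precision + recall
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    h_mean = 2 * precision * recall / total if total else 0.0
    return precision, recall, h_mean
```

As an arithmetic illustration with arbitrary counts (not measurements of this model), 8 matches out of 10 predictions and 16 ground-truth boxes give a precision of 0.8, a recall of 0.5, and an H-mean of about 0.615.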
+## References
+- **OpenOCR**: https://github.com/Topdu/OpenOCR
+- **RepViT**: https://github.com/THU-MIG/RepViT
+- **DBNet++**: [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
+## License
+Apache 2.0