balaboom123 committed on
Commit
f937a4e
·
verified ·
1 Parent(s): 6611ec3

Upload README.md with huggingface_hub

  1. README.md +107 -0
README.md ADDED
@@ -0,0 +1,107 @@
---
license: mit
language:
- en
tags:
- scene-text-recognition
- ocr
- vision-transformer
- mae
- image-to-text
- pytorch
library_name: pytorch
---

# STR-Lite

STR-Lite is a lightweight scene text recognition model that combines **Masked Autoencoder (MAE) pretraining** with an **autoregressive decoder** for text generation. With only **~6M parameters**, it reaches **93.82%** weighted-average accuracy on six common STR benchmarks while staying small enough for efficient deployment.

- **GitHub:** [balaboom123/STR-Lite](https://github.com/balaboom123/STR-Lite)
- **Author:** Kuanwei Chen
- **License:** MIT

## Model Architecture

| Component | Details |
| --------- | ------- |
| Backbone | ViT-Tiny (embed=192, depth=12, heads=12) |
| Decoder | 1-layer autoregressive transformer (embed=192, heads=12) |
| Input size | 32 × 128 (H × W) |
| Patch size | 4 × 8 |
| Parameters | ~6M |
| Precision | bfloat16 |

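The input and patch sizes in the table determine how many tokens the encoder sees. As a small worked example (the helper `patch_grid` is hypothetical and not part of the STR-Lite codebase), the arithmetic can be sketched as:

```python
def patch_grid(height, width, patch_h, patch_w):
    """Return (rows, cols) of non-overlapping patches for an image."""
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be divisible by patch size")
    return height // patch_h, width // patch_w


# Values taken from the architecture table above.
rows, cols = patch_grid(height=32, width=128, patch_h=4, patch_w=8)
num_patches = rows * cols   # tokens per image, excluding any special tokens
head_dim = 192 // 12        # per-head width for embed=192, heads=12

print(rows, cols, num_patches, head_dim)  # 8 16 128 16
```

Each 32 × 128 image therefore becomes an 8 × 16 grid of 128 patches, and each of the 12 attention heads works on 16 channels.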
## Training

**Stage 1: MAE Pretraining**
- Dataset: U14M-Unlabeled
- Epochs: 40

**Stage 2: Fine-tuning**
- Dataset: U14M-L-Filtered
- Epochs: 20, Batch size: 256, Learning rate: 1e-3, Weight decay: 0.01

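To illustrate what Stage 1 does conceptually, here is a minimal sketch of MAE-style random patch masking. It assumes 128 patches per image (a 32 × 128 input with 4 × 8 patches) and a mask ratio of 0.75, which is the default in the original MAE paper; this README does not state the ratio STR-Lite uses, and `random_mask` is a hypothetical helper rather than the repository's implementation.

```python
import random


def random_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) sets, MAE-style."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])   # fed to the encoder
    masked = sorted(order[num_keep:])    # reconstructed by the MAE decoder
    return visible, masked


# Assumed values: 128 patches, mask ratio 0.75 (not confirmed by the README).
visible, masked = random_mask(num_patches=128, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 32 96
```

During pretraining only the visible patches pass through the encoder, and the model is trained to reconstruct the masked ones, so no labels are needed.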
## Checkpoints

| Model | Description | Epochs | Accuracy (common benchmarks, weighted avg.) | Download |
| ----- | ----------- | :----: | :-: | :------: |
| MAE ViT-Tiny | Pretrained encoder only | 40 | n/a | [pretrain/checkpoint-last.pth](https://huggingface.co/balaboom123/STRLite/resolve/main/pretrain/checkpoint-last.pth) |
| STR-Lite | Full fine-tuned model | 20 | 93.82% | [finetune/checkpoint-best.pth](https://huggingface.co/balaboom123/STRLite/resolve/main/finetune/checkpoint-best.pth) |

## Results

**Common STR Benchmarks** (accuracy, %)

| Subset | w/ pretrain | w/o pretrain |
| ------ | :---------: | :----------: |
| CUTE80 | 95.83 | 94.79 |
| IC13 | 96.85 | 96.50 |
| IC15 | 86.80 | 86.25 |
| IIIT5k | 96.97 | 96.47 |
| SVT | 95.36 | 94.90 |
| SVTP | 92.40 | 89.77 |
| **Weighted avg.** | **93.82** | **93.12** |

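The weighted averages for the common benchmarks are consistent with weighting each subset by its number of test images. The sketch below recomputes them under that assumption, using the subset sizes commonly reported for these benchmarks (CUTE80 288, IC13 857, IC15 1811, IIIT5k 3000, SVT 647, SVTP 645). These counts are an assumption and are not stated in this README.

```python
# Assumed test-set sizes (commonly used variants; not given in the README).
COUNTS = {"CUTE80": 288, "IC13": 857, "IC15": 1811,
          "IIIT5k": 3000, "SVT": 647, "SVTP": 645}

# Accuracies (%) copied from the common STR benchmark table.
WITH_PRETRAIN = {"CUTE80": 95.83, "IC13": 96.85, "IC15": 86.80,
                 "IIIT5k": 96.97, "SVT": 95.36, "SVTP": 92.40}
WITHOUT_PRETRAIN = {"CUTE80": 94.79, "IC13": 96.50, "IC15": 86.25,
                    "IIIT5k": 96.47, "SVT": 94.90, "SVTP": 89.77}


def weighted_avg(acc, counts):
    """Average accuracy weighted by the number of samples per subset."""
    total = sum(counts[name] for name in acc)
    return sum(acc[name] * counts[name] for name in acc) / total


print(round(weighted_avg(WITH_PRETRAIN, COUNTS), 2))     # 93.82
print(round(weighted_avg(WITHOUT_PRETRAIN, COUNTS), 2))  # 93.12
```

Under these assumed counts, IIIT5k and IC15 together contribute about two thirds of the weight, so the average is dominated by those two subsets.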
**U14M Benchmarks** (accuracy, %)

| Subset | w/ pretrain | w/o pretrain |
| --------------- | :---------: | :----------: |
| artistic | 67.78 | 62.11 |
| contextless | 78.95 | 77.43 |
| curve | 82.19 | 78.97 |
| general | 81.07 | 79.96 |
| multi oriented | 82.91 | 78.57 |
| multi words | 76.72 | 74.31 |
| salient | 78.17 | 75.33 |
| **Weighted avg.** | **81.03** | **79.88** |

## Usage

**Download and evaluate:**

```bash
git clone https://github.com/balaboom123/STR-Lite
cd STR-Lite

# Download the fine-tuned checkpoint (requires: pip install huggingface_hub)
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'finetune/checkpoint-best.pth'))")

# Evaluate
python eval.py \
    resume="$path" \
    test_data_path='[/path/to/lmdb_test]'
```

**Fine-tune from MAE pretrained weights:**

```bash
# Download the MAE-pretrained encoder (requires: pip install huggingface_hub)
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'pretrain/checkpoint-last.pth'))")

python main_finetune.py \
    train_data_path='[/path/to/lmdb_train]' \
    val_data_path='[/path/to/lmdb_val]' \
    pretrained_mae="$path"
```

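If you prefer to drive the same steps from Python rather than the shell, a minimal sketch could look like the following. The helpers `download_checkpoint` and `build_eval_command` are hypothetical wrappers written for this example. Only `hf_hub_download` and the `eval.py` arguments come from the commands above, and the `/tmp/checkpoint-best.pth` path is a placeholder.

```python
import shlex

REPO_ID = "balaboom123/STRLite"


def download_checkpoint(filename):
    """Fetch a checkpoint from the Hub and return its local cache path."""
    # Imported lazily so the command builder works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(REPO_ID, filename)


def build_eval_command(checkpoint_path, test_data_path):
    """Build the eval.py argument list used in the shell example."""
    return [
        "python", "eval.py",
        f"resume={checkpoint_path}",
        f"test_data_path=[{test_data_path}]",
    ]


# Placeholder paths; in practice pass download_checkpoint(...) as the first
# argument and run the list with subprocess.run(cmd, check=True) inside the
# cloned STR-Lite directory.
cmd = build_eval_command("/tmp/checkpoint-best.pth", "/path/to/lmdb_test")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting issues with the bracketed `test_data_path` value.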
See the [GitHub repo](https://github.com/balaboom123/STR-Lite) for full installation and dataset preparation instructions.