---
license: mit
language:
- en
tags:
- scene-text-recognition
- ocr
- vision-transformer
- mae
- image-to-text
- pytorch
library_name: pytorch
---
# STR-Lite
STR-Lite is an ultra-lightweight scene text recognition model that combines **Masked Autoencoder (MAE) pretraining** with an **autoregressive decoder** for text generation. With only about **6M parameters**, it reaches a weighted average accuracy of 93.82% on six common STR benchmarks, which keeps it practical for resource-constrained deployment.
- **GitHub:** [balaboom123/STR-Lite](https://github.com/balaboom123/STR-Lite)
- **Author:** Kuanwei Chen
- **License:** MIT
## Model Architecture
| Component | Details |
| --------- | ------- |
| Backbone | ViT-Tiny (embed=192, depth=12, heads=12) |
| Decoder | 1-layer autoregressive transformer (embed=192, heads=12) |
| Input size | 32 × 128 (H × W) |
| Patch size | 4 × 8 |
| Parameters | ~6M |
| Precision | bfloat16 |
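As a rough sanity check on the table above, the encoder sequence length and the ~6M parameter figure can be reproduced with simple arithmetic. This is only a sketch: it assumes standard transformer blocks with an MLP ratio of 4 and a decoder layer with one cross-attention, and it counts only attention and MLP weight matrices (biases, norms, and embeddings are ignored). None of these assumptions are stated in this README.

```python
# Values taken from the architecture table.
img_h, img_w = 32, 128
patch_h, patch_w = 4, 8
embed_dim, depth = 192, 12

# Patch grid and resulting encoder sequence length.
grid_h, grid_w = img_h // patch_h, img_w // patch_w
num_patches = grid_h * grid_w
print(f"patch grid: {grid_h} x {grid_w} -> {num_patches} tokens")

# Assumption: MLP ratio 4, so one encoder block holds 4*d^2 attention
# weights (Q, K, V, output projection) plus 8*d^2 MLP weights.
encoder_block = 12 * embed_dim ** 2
# Assumption: the decoder layer adds one cross-attention (another 4*d^2).
decoder_layer = 16 * embed_dim ** 2
approx_params = depth * encoder_block + decoder_layer
print(f"approx. weights: {approx_params / 1e6:.2f}M")
```

Under these assumptions the input becomes an 8 × 16 grid of 128 patch tokens, and the estimate comes to about 5.9M weights, in line with the ~6M listed in the table.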
## Training
**Stage 1 — MAE Pretraining**
- Dataset: U14M-Unlabeled
- Epochs: 40
**Stage 2 — Fine-tuning**
- Dataset: U14M-L-Filtered
- Epochs: 20, Batch size: 256, Learning rate: 1e-3, Weight decay: 0.01
## Checkpoints
| Model | Description | Epochs | Acc (common benchmarks, weighted avg.) | Download |
| ----- | ----------- | :----: | :-: | :------: |
| MAE ViT-Tiny | Pretrained encoder only | 40 | — | [pretrain/checkpoint-last.pth](https://huggingface.co/balaboom123/STRLite/resolve/main/pretrain/checkpoint-last.pth) |
| STRLite | Full fine-tuned model | 20 | 93.82% | [finetune/checkpoint-best.pth](https://huggingface.co/balaboom123/STRLite/resolve/main/finetune/checkpoint-best.pth) |
## Results
**Common STR Benchmarks** (accuracy, %)
| Subset | w/ pretrain | w/o pretrain |
| ------ | :---------: | :----------: |
| CUTE80 | 95.83 | 94.79 |
| IC13 | 96.85 | 96.50 |
| IC15 | 86.80 | 86.25 |
| IIIT5k | 96.97 | 96.47 |
| SVT | 95.36 | 94.90 |
| SVTP | 92.40 | 89.77 |
| **Weighted avg.** | **93.82** | **93.12** |
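The weighted averages in the table above can be checked by weighting each subset's accuracy by its number of test images. The sample counts below are an assumption (the commonly used test-set sizes for these benchmarks) and are not stated in this README; the accuracies are copied from the table.

```python
# (subset, assumed sample count, acc w/ pretrain, acc w/o pretrain)
# Sample counts are an assumption; accuracies come from the table above.
rows = [
    ("CUTE80", 288, 95.83, 94.79),
    ("IC13", 857, 96.85, 96.50),
    ("IC15", 1811, 86.80, 86.25),
    ("IIIT5k", 3000, 96.97, 96.47),
    ("SVT", 647, 95.36, 94.90),
    ("SVTP", 645, 92.40, 89.77),
]


def weighted_avg(column):
    """Average of one accuracy column, weighted by sample count."""
    total = sum(r[1] for r in rows)
    return sum(r[1] * r[column] for r in rows) / total


with_pretrain = round(weighted_avg(2), 2)
without_pretrain = round(weighted_avg(3), 2)
print(with_pretrain, without_pretrain)
```

With these assumed counts the script reproduces the reported 93.82 and 93.12.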
**U14M Benchmarks** (accuracy, %)
| Subset | w/ pretrain | w/o pretrain |
| --------------- | :---------: | :----------: |
| artistic | 67.78 | 62.11 |
| contextless | 78.95 | 77.43 |
| curve | 82.19 | 78.97 |
| general | 81.07 | 79.96 |
| multi oriented | 82.91 | 78.57 |
| multi words | 76.72 | 74.31 |
| salient | 78.17 | 75.33 |
| **Weighted avg.** | **81.03** | **79.88** |
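To see where MAE pretraining helps most, the per-subset gain can be computed directly from the U14M table above. A minimal sketch using only the numbers reported there:

```python
# (acc w/ pretrain, acc w/o pretrain), copied from the U14M table.
u14m = {
    "artistic": (67.78, 62.11),
    "contextless": (78.95, 77.43),
    "curve": (82.19, 78.97),
    "general": (81.07, 79.96),
    "multi oriented": (82.91, 78.57),
    "multi words": (76.72, 74.31),
    "salient": (78.17, 75.33),
}

# Gain in accuracy points from pretraining, per subset.
gains = {name: round(w - wo, 2) for name, (w, wo) in u14m.items()}

# Print subsets from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} +{gain:.2f}")

largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)
```

Pretraining helps most on the artistic and multi oriented subsets (+5.67 and +4.34 points) and least on general (+1.11 points).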
## Usage
**Download and evaluate:**
```bash
git clone https://github.com/balaboom123/STR-Lite
cd STR-Lite
# Download the fine-tuned checkpoint (requires the huggingface_hub package)
# and capture its local path in a shell variable
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'finetune/checkpoint-best.pth'))")
# Evaluate
python eval.py \
  resume="$path" \
  test_data_path='[/path/to/lmdb_test]'
```
**Fine-tune from MAE pretrained weights:**
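```bash
# Download the MAE-pretrained encoder (requires the huggingface_hub package)
# and capture its local path in a shell variable
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'pretrain/checkpoint-last.pth'))")
# Fine-tune starting from the pretrained weights
python main_finetune.py \
  train_data_path='[/path/to/lmdb_train]' \
  val_data_path='[/path/to/lmdb_val]' \
  pretrained_mae="$path"
```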
See the [GitHub repo](https://github.com/balaboom123/STR-Lite) for full installation and dataset preparation instructions.