# Model 90k (Small-90k)
This directory contains a **lightweight** version of the **ThaoNet** recognition model, trained on approximately **90,000** samples (Khmer script).
## Model Architecture (`model-small`)
This model uses the **ThaoNet-Small** architecture, optimized for speed and low memory usage.
| Component | Setting | Notes |
|-----------|---------|-------|
| **Backbone** | `lightweight` | Uses a 3-stage CNN (faster than ResNet). |
| **Head** | `transformer_ctc` | Shallow Transformer (2 layers, d=128). |
| **Input Size** | `32px` | Lower resolution for speed. |
| **Params** | **~1.6 Million** | Very small, suitable for mobile/CPU. |
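
Because the head is named `transformer_ctc`, its per-frame predictions would normally be turned into text by CTC decoding. The following is a minimal sketch of greedy CTC decoding, not the repository's own decoder: the function name `ctc_greedy_decode`, the blank index `0`, and the toy vocabulary are all assumptions made for illustration.

```python
from typing import Sequence


def ctc_greedy_decode(frame_ids: Sequence[int],
                      vocab: Sequence[str],
                      blank: int = 0) -> str:
    """Collapse repeated frame labels, then drop blanks (greedy CTC decoding).

    frame_ids: the argmax class index for each time step.
    vocab:     index -> character table (hypothetical layout, blank at index 0).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy vocabulary for illustration only; the real mapping lives in model_vocab.json.
toy_vocab = ["<blank>", "a", "b"]
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], toy_vocab)
```

The blank between the two runs of `1` is what allows the same character to appear twice in a row in the output.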
## File Structure
```
model90k/
├── model.safetensors # PyTorch weights (SafeTensors format)
├── model.onnx # Exported ONNX model
├── config.yml # Model configuration
├── khmer_dict.txt # Character vocabulary list
├── model_vocab.json # Full vocabulary mapping
└── README.md # This file
```
## Usage
### 1. Run Inference (ONNX)
```bash
python tools/export/predict.py \
--onnx model90k/model.onnx \
--vocab model90k/model_vocab.json \
--image path/to/image.png \
--height 32
```
*Note: Always pass `--height 32`; this model was trained on lower-resolution images.*
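
If you preprocess images yourself rather than through `predict.py`, the input needs to end up 32 pixels tall. The sketch below computes the target size under the assumption that the aspect ratio is preserved; how `predict.py` actually resizes (padding, fixed width, interpolation) is not documented here, and `resize_dims` is a hypothetical helper.

```python
def resize_dims(width: int, height: int, target_height: int = 32) -> tuple[int, int]:
    """Return (new_width, new_height) for scaling an image to target_height.

    Assumption: the aspect ratio is kept, so the width scales by the same factor.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    # Never let the width collapse to zero for very tall, narrow inputs.
    new_width = max(1, round(width * scale))
    return new_width, target_height


# A 640x64 text line would be scaled to half size.
new_size = resize_dims(640, 64)
```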
### 2. Load Weights (SafeTensors)
```python
from safetensors.torch import load_file
state_dict = load_file("model90k/model.safetensors")
# load into model...
```
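
One quick sanity check after loading is to count the elements in the state dict and compare the total with the ~1.6 million parameters listed above. This is a sketch: `count_parameters` is a hypothetical helper, and the total may come out slightly higher than the parameter count if the checkpoint also stores non-trainable buffers (for example normalization statistics).

```python
import math
from typing import Mapping, Sequence


def count_parameters(shapes: Mapping[str, Sequence[int]]) -> int:
    """Sum the number of elements over all tensors, given their shapes."""
    # math.prod of an empty shape is 1, which is correct for scalar tensors.
    return sum(math.prod(shape) for shape in shapes.values())


# With the real weights (requires safetensors and torch):
#   from safetensors.torch import load_file
#   state_dict = load_file("model90k/model.safetensors")
#   total = count_parameters({k: tuple(v.shape) for k, v in state_dict.items()})

# Stand-in shapes for illustration only; these are not the model's real layers.
example_total = count_parameters({"proj.weight": (128, 64), "proj.bias": (128,)})
```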
## Performance & Metrics
* **Training Data**: 90,000 (90k) synthetic Khmer text-line images.
* **CER (Character Error Rate)**: ~5-8% (estimated on diverse data).
* **WER (Word Error Rate)**: ~15-20%.
* **Accuracy**: Generalizes significantly better than `model9k`, since this model was trained on roughly 10x more data.
* **Speed**: Same as `model9k` (~2-3x faster than the base model).
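
For reference, CER and WER are conventionally the Levenshtein edit distance between prediction and ground truth, divided by the reference length. The sketch below shows that standard definition; it is not necessarily the evaluation script used for the figures above. Two assumptions are worth flagging: characters are counted as Unicode code points, and words are split on whitespace. Khmer is usually written without spaces between words, so WER depends on how the text is segmented.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points (assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate with whitespace tokenization (assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Both rates can exceed 1.0 when the prediction is much longer than the reference, because insertions count as errors.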