| # Model 90k (Small-90k) |
|
|
| This directory contains a **lightweight** version of the **ThaoNet** recognition model, trained on approximately **90,000** synthetic Khmer text line images. |
|
|
| ## Model Architecture (`model-small`) |
|
|
| This model uses the **ThaoNet-Small** architecture, optimized for speed and low memory usage. |
|
|
| | Component | Setting | Notes | |
| |-----------|---------|-------| |
| | **Backbone** | `lightweight` | Uses a 3-stage CNN (faster than ResNet). | |
| | **Head** | `transformer_ctc` | Shallow Transformer (2 layers, d=128). | |
| | **Input Size** | `32px` | Input height of 32 px; lower resolution for speed. | |
| | **Params** | **~1.6 Million** | Very small, suitable for mobile/CPU. | |
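
The `transformer_ctc` head name suggests that the model's per-frame outputs are decoded CTC-style. As a minimal sketch of greedy CTC decoding, assuming the blank symbol has index 0 and using a hypothetical `demo_vocab` mapping (the real layout of `model_vocab.json` is not described in this README):

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    # groupby merges runs of identical ids, e.g. [1, 1, 0, 1] -> [1, 0, 1]
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(id_to_char[idx] for idx in collapsed if idx != blank_id)


# Hypothetical two-character vocabulary, for illustration only.
demo_vocab = {1: "ក", 2: "ខ"}
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], demo_vocab)
```

The blank between the two runs of `1` is what lets the decoder emit the same character twice in a row; without it the repeats would collapse into one.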
|
|
| ## File Structure |
| ``` |
| model90k/ |
| ├── model.safetensors # PyTorch weights (SafeTensors format) |
| ├── model.onnx # Exported ONNX model |
| ├── config.yml # Model configuration |
| ├── khmer_dict.txt # Character vocabulary list |
| ├── model_vocab.json # Full vocabulary mapping |
| └── README.md # This file |
| ``` |
|
|
| ## Usage |
|
|
| ### 1. Run Inference (ONNX) |
|
|
| ```bash |
| python tools/export/predict.py \ |
| --onnx model90k/model.onnx \ |
| --vocab model90k/model_vocab.json \ |
| --image path/to/image.png \ |
| --height 32 |
| ``` |
| *Note: Ensure you use `--height 32` as this model was trained on lower resolution images.* |
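
For calling the ONNX model from Python rather than through the script, the sketch below shows one possible shape. It assumes that `--height 32` means the image is scaled to a height of 32 pixels with its aspect ratio preserved; the actual preprocessing in `predict.py` (channel order, normalization, padding) is not documented here and may differ. `resize_dims` and `run_onnx` are hypothetical helper names.

```python
def resize_dims(width, height, target_height=32):
    """Return (new_width, new_height) for scaling to target_height, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    new_width = max(1, round(width * scale))  # never collapse to zero width
    return new_width, target_height


def run_onnx(model_path, batch):
    """Run one forward pass; `batch` must already match the model's input layout."""
    import onnxruntime as ort  # lazy import so resize_dims works without onnxruntime

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the input name from the model
    return session.run(None, {input_name: batch})


dims = resize_dims(640, 64)  # a 640x64 text line image
```

Reading the input name from the session avoids hard-coding a tensor name that this README does not specify.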
|
|
| ### 2. Load Weights (SafeTensors) |
|
|
| ```python |
| from safetensors.torch import load_file |
| state_dict = load_file("model90k/model.safetensors") |
| # load into model... |
| ``` |
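
One quick sanity check after loading is to compare the checkpoint's size against the ~1.6 million parameters quoted above. The following is a minimal sketch; `count_parameters` and `checkpoint_param_count` are hypothetical helper names, and the total may come out slightly higher than the trainable parameter count because a state dict can also hold buffers such as normalization statistics.

```python
def count_parameters(state_dict):
    """Sum the element counts of all tensors in a state dict."""
    return sum(tensor.numel() for tensor in state_dict.values())


def checkpoint_param_count(path="model90k/model.safetensors"):
    """Load a SafeTensors checkpoint and return its total element count."""
    from safetensors.torch import load_file  # lazy import; requires safetensors and torch

    return count_parameters(load_file(path))
```

Once a model instance has been constructed (the constructor is not shown in this README), PyTorch's `model.load_state_dict(state_dict)` loads the weights into it.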
|
|
| ## Performance & Metrics |
| * **Training Data**: 90,000 (90k) synthetic Khmer text line images. |
| * **CER (Character Error Rate)**: ~5-8% (estimated on diverse data). |
| * **WER (Word Error Rate)**: ~15-20%. |
| * **Accuracy**: Generalizes significantly better than `model9k`, having been trained on roughly 10x more data. |
| * **Speed**: Same as `model9k` (~2-3x faster than the base model). |
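
CER and WER are conventionally defined as the edit distance between prediction and reference, divided by the reference length. The sketch below follows that convention and is not necessarily the evaluation script behind the figures above. Two assumptions to note: characters are counted as Unicode code points rather than grapheme clusters, and words are split on whitespace, which is a rough proxy for Khmer since the script is usually written without spaces between words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(1, len(ref))


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens (an assumption for Khmer)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(1, len(ref_words))
```

Both rates can exceed 1.0 when the prediction is much longer than the reference, because insertions count as errors.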
|
|