salarymakage

Train with 90k images

b7254f3 about 1 month ago


	# Model 90k (Small-90k)

	This directory contains a lightweight version of the ThaoNet recognition model, trained on approximately 90,000 Khmer-script samples.

	## Model Architecture (`model-small`)

	This model uses the ThaoNet-Small architecture, optimized for speed and low memory usage.

	| Component | Setting | Notes |
	|-----------|---------|-------|
	| Backbone | `lightweight` | 3-stage CNN (faster than ResNet). |
	| Head | `transformer_ctc` | Shallow Transformer (2 layers, d=128) with CTC output. |
	| Input Height | `32px` | Lower resolution for speed. |
	| Params | ~1.6 million | Very small, suitable for mobile/CPU. |
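To put the head's size in context, here is a rough sketch of how many parameters a 2-layer Transformer encoder with d=128 would hold. The layer count and width come from the table; the feed-forward width (`d_ff=512`) and the standard encoder layout are assumptions, since the actual values live in `config.yml` and are not stated here.

```python
def transformer_encoder_params(d_model: int, num_layers: int, d_ff: int) -> int:
    """Count weights and biases in a stack of standard Transformer encoder layers."""
    # Self-attention: Q, K, V and output projections, each d_model x d_model plus bias.
    attention = 4 * (d_model * d_model + d_model)
    # Feed-forward: d_model -> d_ff -> d_model, each linear layer with a bias.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return num_layers * (attention + feed_forward + layer_norms)


# d_model=128 and num_layers=2 come from the table; d_ff=512 is an ASSUMPTION.
head_params = transformer_encoder_params(d_model=128, num_layers=2, d_ff=512)
print(f"Transformer layers: {head_params:,} parameters")
print(f"Share of ~1.6M total: {head_params / 1_600_000:.0%}")
```

Under these assumptions the two Transformer layers account for roughly 0.4M of the ~1.6M parameters, which would leave the rest for the CNN backbone, embeddings, and output projection.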

	## File Structure
	```
	model90k/
	├── model.safetensors # PyTorch weights (SafeTensors format)
	├── model.onnx # Exported ONNX model
	├── config.yml # Model configuration
	├── khmer_dict.txt # Character vocabulary list
	├── model_vocab.json # Full vocabulary mapping
	└── README.md # This file
	```
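A quick way to confirm that a download is complete is to check for the files listed above. This is a minimal sketch; the helper name `missing_files` is made up for illustration, and only the file names come from the listing.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "model.safetensors",
    "model.onnx",
    "config.yml",
    "khmer_dict.txt",
    "model_vocab.json",
    "README.md",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("model90k")
    print("All files present." if not missing else f"Missing: {missing}")
```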

	## Usage

	### 1. Run Inference (ONNX)

	```bash
	python tools/export/predict.py \
	    --onnx model90k/model.onnx \
	    --vocab model90k/model_vocab.json \
	    --image path/to/image.png \
	    --height 32
	```
	Note: Always pass `--height 32`. This model was trained on lower-resolution images (32 px high).
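If you call the ONNX model from your own code instead of `predict.py`, two steps usually surround the forward pass: scaling the line image to height 32 and decoding the CTC output. The sketch below illustrates both in plain Python. It assumes the usual CTC conventions (aspect-preserving resize, greedy decoding, blank at index 0); the source does not state what `predict.py` actually does, and the toy vocabulary ids are hypothetical, so check `model_vocab.json` for the real mapping.

```python
def resized_width(orig_width: int, orig_height: int, target_height: int = 32) -> int:
    """Width that preserves the aspect ratio when scaling to target_height."""
    return max(1, round(orig_width * target_height / orig_height))


def ctc_greedy_decode(indices: list[int], id_to_char: dict[int, str], blank_id: int = 0) -> str:
    """Collapse repeated indices, then drop blanks (standard CTC greedy decoding)."""
    chars = []
    previous = None
    for idx in indices:
        # Emit a character only when the index changes and is not the blank.
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)


# A 640x64 line image would be scaled to 320x32.
print(resized_width(640, 64))

# Toy vocabulary with HYPOTHETICAL ids; 0 is assumed to be the CTC blank.
toy_vocab = {1: "ក", 2: "ា"}
# Per-timestep argmax indices, as a CTC head would typically produce.
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 0, 1], toy_vocab))
```

The blank between two identical indices is what lets CTC emit a doubled character, which is why repeats are collapsed before blanks are removed.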

	### 2. Load Weights (SafeTensors)

	```python
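	from safetensors.torch import load_file

	# Load the weights as a dict mapping parameter names to tensors.
	state_dict = load_file("model90k/model.safetensors")

	# Build the ThaoNet-Small model from config.yml with the project's own
	# model code (not shown here), then load the weights into it:
	# model.load_state_dict(state_dict)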
	```
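Before wiring the weights into a model, it can help to inspect the loaded state dict and compare its size with the ~1.6 million parameters quoted above. This is a small sketch; `count_parameters` and `main` are illustrative helpers, not part of the project.

```python
import os


def count_parameters(state_dict) -> int:
    """Sum the element counts of all tensors in a state dict."""
    return sum(tensor.numel() for tensor in state_dict.values())


def main(weights_path: str) -> None:
    # Imported here so count_parameters can be reused without safetensors installed.
    from safetensors.torch import load_file

    state_dict = load_file(weights_path)
    # Peek at the first few entries to see the naming scheme and shapes.
    for name, tensor in list(state_dict.items())[:5]:
        print(name, tuple(tensor.shape))
    print(f"Total elements: {count_parameters(state_dict):,}")


if __name__ == "__main__":
    path = "model90k/model.safetensors"
    if os.path.exists(path):
        main(path)
    else:
        print(f"{path} not found; adjust the path to your checkout.")
```

The total can come out slightly above the trainable parameter count, because a state dict may also contain non-trainable buffers such as normalization statistics.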

	## Performance & Metrics
	* Training Data: 90,000 synthetic Khmer text-line images.
	* CER (Character Error Rate): ~5-8% (estimated on diverse data).
	* WER (Word Error Rate): ~15-20%.
	* Accuracy: Generalizes significantly better than `model9k`, because it was trained on 10x more data.
	* Speed: Same as `model9k` (~2-3x faster than the base model).
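For reference, CER and WER are conventionally computed as edit distance divided by reference length. The sketch below follows that convention; it is not the project's evaluation script. It counts characters as Unicode code points and splits words on whitespace, both of which are assumptions. Khmer is normally written without spaces between words, so a real WER figure depends on the word segmentation used, which is not specified here.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; reference must be non-empty."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens; reference must be non-empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitten", "sitting"))           # 3 edits over 6 reference characters
print(wer("the cat sat", "the cat sit"))  # 1 wrong word out of 3
```

Because both rates are normalized by the reference length, they can exceed 100% when the hypothesis contains many insertions.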