Chinese Handwriting Recognition: HSK1 v3 (ResNet CNN)
A ResNet-style CNN trained on HWDB1.0 to recognise 178 Chinese characters plus an Unknown class.
What's new in v3
| Feature | v2 | v3 |
| --- | --- | --- |
| Architecture | Plain CNN | ResNet-style residual blocks |
| Loss | Cross-entropy | Cross-entropy + label smoothing (0.1) |
| Optimizer | Adam | AdamW (weight decay 1e-4) |
| Augmentation | Rotation, shift, zoom | + Shear, stronger zoom |
| Temperature scaling | Yes (buggy) | Removed (not needed) |
| Config | Scattered | Central CFG dict |
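To make the label-smoothing entry concrete, here is a minimal sketch of what a smoothed target looks like for 179 classes. It assumes the common uniform form of label smoothing, `(1 - eps) * one_hot + eps / num_classes`, which is what Keras applies when `label_smoothing` is set on its categorical cross-entropy loss. Whether the v3 training script uses exactly this form is an assumption, and the helper names `smoothed_target` and `cross_entropy` are illustrative only.

```python
import math

NUM_CLASSES = 179   # 178 characters + Unknown (from the model details)
SMOOTHING = 0.1     # label smoothing value listed for v3

def smoothed_target(true_idx, num_classes=NUM_CLASSES, eps=SMOOTHING):
    """Uniform label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    off_value = eps / num_classes
    target = [off_value] * num_classes
    target[true_idx] = 1.0 - eps + off_value
    return target

def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

# Hypothetical example: the true class is index 5.
target = smoothed_target(true_idx=5)

# A model that predicts a uniform distribution scores log(179) under this loss.
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
loss_uniform = cross_entropy(target, uniform)
```

With smoothing, the true class keeps most of the probability mass and the remainder is spread evenly over all classes, so the loss no longer rewards pushing a single output towards exactly 1.0.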
Model details
| Item | Value |
| --- | --- |
| Input | 40×40 grayscale image |
| Classes | 179 (178 Chinese characters + Unknown) |
| Framework | Keras / TensorFlow |
| Confidence threshold | 0.3 |
| OOD training data | EMNIST Balanced (8% of training set) |
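The 8% out-of-distribution share can be turned into a sample count with a little arithmetic. The sketch below assumes that "8% of training set" means 8% of the combined set (Chinese characters plus EMNIST samples), which the table does not state explicitly. The function name `ood_count` and the sample count of 92,000 are hypothetical and chosen only to keep the numbers round.

```python
OOD_FRACTION = 0.08  # share of the combined training set taken by OOD samples

def ood_count(n_in_distribution, fraction=OOD_FRACTION):
    """Number of OOD samples needed so they make up `fraction` of the combined set.

    Solves n_ood / (n_in + n_ood) = fraction for n_ood.
    """
    return round(fraction * n_in_distribution / (1.0 - fraction))

# Hypothetical figure, not the real HWDB1.0 subset size.
n_chinese = 92_000
n_ood = ood_count(n_chinese)
actual_share = n_ood / (n_chinese + n_ood)
```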
Quick start
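import json

import numpy as np
from tensorflow import keras

# Load the trained model, the index-to-character map, and the saved config.
model = keras.models.load_model('chinese_hsk1_model_v3.keras')
with open('label_map_v3.json', encoding='utf-8') as f:
    label_map = json.load(f)  # keys are class indices stored as strings
with open('config_v3.json', encoding='utf-8') as f:
    cfg = json.load(f)
THRESHOLD = cfg['threshold']

def predict(img_gray):
    """Classify one grayscale image of shape (img_size, img_size) with values in 0-255."""
    x = np.asarray(img_gray).astype('float32') / 255.0
    x = x.reshape(1, cfg['img_size'], cfg['img_size'], 1)
    probs = model.predict(x)[0]
    conf = float(probs.max())
    idx = int(probs.argmax())
    char = label_map[str(idx)]
    # Reject both the explicit Unknown class and low-confidence predictions.
    if char == 'Unknown' or conf < THRESHOLD:
        return 'Unknown', conf
    return char, conf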
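The rejection rule in `predict` can be tried out without the model weights by applying it directly to a probability vector. The sketch below is an illustration only: `decide` is a hypothetical helper that mirrors the rule above, and `toy_map` is a made-up four-class label map, not the real 179-entry `label_map_v3.json`.

```python
def decide(probs, label_map, threshold=0.3):
    """Apply the same rejection rule as predict() to a probability vector."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    conf = float(probs[idx])
    char = label_map[str(idx)]
    if char == 'Unknown' or conf < threshold:
        return 'Unknown', conf
    return char, conf

# Toy label map for illustration; keys are class indices stored as strings.
toy_map = {'0': '一', '1': '二', '2': '三', '3': 'Unknown'}

confident = decide([0.90, 0.05, 0.03, 0.02], toy_map)    # clear winner above threshold
ambiguous = decide([0.28, 0.26, 0.24, 0.22], toy_map)    # best score is below 0.3
out_of_set = decide([0.10, 0.10, 0.10, 0.70], toy_map)   # Unknown class wins outright
```

Both rejection paths return 'Unknown' together with the top score, so a caller can still log how confident the model was when it declined to answer.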