---
license: apache-2.0
datasets:
- Yaredoffice/geez-characters
language:
- am
- ti
pipeline_tag: image-classification
library_name: keras
tags:
- geez
- characters
- ocr
- geez ocr
- amharic
- tigrinya
- onnx
---
# Geez Character OCR (Geez-Net)
<!-- Provide a quick summary of what the model is/does. -->
This model is a high-performance Optical Character Recognition (OCR) system specifically designed for the **Geez script** (Amharic, Tigrinya). It utilizes a Convolutional Neural Network (CNN) architecture to classify individual handwritten Geez characters from images with high accuracy.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This model addresses the challenge of digital recognition for the Geez script by utilizing a deep CNN architecture. It is trained to accept a single character image and output one of 287 possible character classes. It has been optimized for web deployment using ONNX runtime.
- **Developed by:** Yared Kassa
- **Shared by:** Yared Kassa
- **Model type:** Convolutional Neural Network (CNN) for Image Classification
- **Language(s):** Amharic, Tigrinya (Geez Script)
- **License:** apache-2.0
- **Finetuned from model:** Trained from scratch
### Model Sources
<!-- Provide basic links for the model. -->
- **Repository:** [Yaredoffice/geez-characters](https://huggingface.co/Yaredoffice/geez-characters)
## Uses
<!-- Address questions around how the model is intended to be used, including foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for model use without fine-tuning or plugging into a larger ecosystem/app. -->
The model is intended for direct use in digitizing handwritten Geez documents, educational language learning tools, and automated data entry systems. Users input a cropped image of a handwritten character, and the model returns the predicted character class and confidence score.
### Downstream Use
<!-- This section is for model use when fine-tuned for a task, or when plugged into a larger ecosystem/app. -->
N/A (This is a standalone classification model).
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model is **not** designed for:
- Full document OCR (it does not perform word segmentation or layout analysis).
- Recognition of non-Geez scripts (Latin, Arabic, etc.).
- Recognition of cursive or heavily stylized fonts not present in the training data.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
### Limitations
1. **Single Character Input:** The model requires pre-segmented single-character images. It cannot process whole words or sentences directly.
2. **Input Quality:** Performance may degrade on low-resolution or highly noisy images without pre-processing.
3. **Data Bias:** While trained on ~400k augmented images, the model may be biased towards the specific handwriting styles present in the original 13k source dataset.
### Recommendations
<!-- This section is meant to convey recommendations with respect to bias, risk, and technical limitations. -->
Users should implement a pre-processing pipeline to segment words into individual characters before feeding them into this model. Images should be normalized to 128x128 pixels and converted to grayscale.
## Evaluation
### Testing Data, Factors & Metrics
#### Metrics
- **Accuracy**: The primary metric used for evaluation.
#### Inference Performance
- **Single Image Inference**: 81% baseline accuracy.
- **Test-Time Augmentation (TTA)**:
- Configuration: 10 augmentations with majority voting.
- Result: Achieves approximately **90% classification accuracy**.
- Impact: Significantly reduces error rates caused by handwriting variability.
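The TTA scheme above can be sketched as follows. This is a minimal illustration of "10 augmentations with majority voting": `run_model` stands for any callable mapping a `(1, 1, 128, 128)` array to class scores, and the small random translation in `augment` is a stand-in for the actual augmentations, which are not documented here.

```python
import numpy as np

def augment(image, rng):
    # Illustrative augmentation: a small random translation of the character
    dx, dy = rng.integers(-3, 4, size=2)
    return np.roll(np.roll(image, dx, axis=-1), dy, axis=-2)

def tta_predict(run_model, image, n_augments=10, rng=None):
    """Predict a class by majority vote over augmented copies of `image`."""
    rng = np.random.default_rng(rng)
    votes = []
    for _ in range(n_augments):
        scores = run_model(augment(image, rng))
        votes.append(int(np.argmax(scores)))
    # The most frequently predicted class across augmentations wins
    counts = np.bincount(votes)
    return int(np.argmax(counts))
```

Because each augmented copy requires a separate forward pass, TTA trades roughly 10x inference cost for the reported accuracy gain.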
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# 1. Load the ONNX model
session = ort.InferenceSession("cnn_output.onnx")

# 2. Preprocess the input image
def preprocess_image(image_path):
    # Load the image and convert to grayscale
    img = Image.open(image_path).convert('L')
    # Resize to the 128x128 input the model expects
    img = img.resize((128, 128), Image.Resampling.LANCZOS)
    # Convert to a numpy array and normalize pixel values to [0, 1]
    img_array = np.array(img).astype('float32') / 255.0
    # Add batch and channel dimensions -> (1, 1, 128, 128)
    return np.expand_dims(np.expand_dims(img_array, axis=0), axis=0)

input_data = preprocess_image("path/to/geez_char.jpg")

# 3. Run inference
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
predictions = session.run([output_name], {input_name: input_data})[0]

# 4. Get the predicted class and a confidence score
# (softmax assumes the model outputs raw logits; skip it if the
#  outputs are already probabilities)
logits = predictions[0]
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class_index = int(np.argmax(probs))
print(f"Predicted Class ID: {predicted_class_index} "
      f"(confidence: {probs[predicted_class_index]:.2%})")
```