---
license: apache-2.0
datasets:
- Yaredoffice/geez-characters
language:
- am
- ti
pipeline_tag: image-classification
library_name: keras
tags:
- geez
- characters
- ocr
- geez ocr
- amharic
- tigrinya
- onnx
---
# Geez Character OCR (Geez-Net)
<!-- Provide a quick summary of what the model is/does. -->
This model is a high-performance Optical Character Recognition (OCR) system specifically designed for the **Geez script** (Amharic, Tigrinya). It utilizes a Convolutional Neural Network (CNN) architecture to classify individual handwritten Geez characters from images with high accuracy.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This model addresses the challenge of digital recognition for the Geez script by utilizing a deep CNN architecture. It is trained to accept a single character image and output one of 287 possible character classes. It has been optimized for web deployment using ONNX runtime.
- **Developed by:** Yared Kassa
- **Shared by:** Yared Kassa
- **Model type:** Convolutional Neural Network (CNN) for Image Classification
- **Language(s):** Amharic, Tigrinya (Geez Script)
- **License:** apache-2.0
- **Finetuned from model:** Trained from scratch
### Model Sources
<!-- Provide basic links for the model. -->
- **Repository:** [Yaredoffice/geez-characters](https://huggingface.co/Yaredoffice/geez-characters)
## Uses
<!-- Address questions around how the model is intended to be used, including foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for model use without fine-tuning or plugging into a larger ecosystem/app. -->
The model is intended for direct use in digitizing handwritten Geez documents, educational language learning tools, and automated data entry systems. Users input a cropped image of a handwritten character, and the model returns the predicted character class and confidence score.
### Downstream Use
<!-- This section is for model use when fine-tuned for a task, or when plugged into a larger ecosystem/app. -->
N/A (This is a standalone classification model).
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
The model is **not** designed for:
- Full document OCR (it does not perform word segmentation or layout analysis).
- Recognition of non-Geez scripts (Latin, Arabic, etc.).
- Recognition of cursive or heavily stylized fonts not present in the training data.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
### Limitations
1. **Single Character Input:** The model requires pre-segmented single-character images. It cannot process whole words or sentences directly.
2. **Input Quality:** Performance may degrade on low-resolution or highly noisy images without pre-processing.
3. **Data Bias:** While trained on ~400k augmented images, the model may be biased towards the specific handwriting styles present in the original 13k source dataset.
### Recommendations
<!-- This section is meant to convey recommendations with respect to bias, risk, and technical limitations. -->
Users should implement a pre-processing pipeline to segment words into individual characters before feeding them into this model. Images should be normalized to 128x128 pixels and converted to grayscale.
## Evaluation
### Testing Data, Factors & Metrics
#### Metrics
- **Accuracy**: The primary metric used for evaluation.
#### Inference Performance
- **Single Image Inference**: 81% baseline accuracy.
- **Test-Time Augmentation (TTA)**:
- Configuration: 10 augmentations with majority voting.
- Result: Achieves approximately **90% classification accuracy**.
- Impact: Significantly reduces error rates caused by handwriting variability.
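The TTA scheme above can be sketched as follows. This is a minimal illustration of "10 augmentations with majority voting": `run_model` stands for any callable mapping a `(1, 1, 128, 128)` array to class scores, and the small random translation in `augment` is a stand-in for the actual augmentations, which are not documented here.

```python
import numpy as np

def augment(image, rng):
    # Illustrative augmentation: a small random translation of the character
    dx, dy = rng.integers(-3, 4, size=2)
    return np.roll(np.roll(image, dx, axis=-1), dy, axis=-2)

def tta_predict(run_model, image, n_augments=10, rng=None):
    """Predict a class by majority vote over augmented copies of `image`."""
    rng = np.random.default_rng(rng)
    votes = []
    for _ in range(n_augments):
        scores = run_model(augment(image, rng))
        votes.append(int(np.argmax(scores)))
    # The most frequently predicted class across augmentations wins
    counts = np.bincount(votes)
    return int(np.argmax(counts))
```

Because each augmented copy requires a separate forward pass, TTA trades roughly 10x inference cost for the reported accuracy gain.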
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import onnxruntime as ort
import numpy as np
from PIL import Image

# 1. Load the ONNX model
session = ort.InferenceSession("cnn_output.onnx")

# 2. Preprocess the input image
def preprocess_image(image_path):
    # Load the image and convert to grayscale
    img = Image.open(image_path).convert('L')
    # Resize to the 128x128 input the model expects
    img = img.resize((128, 128), Image.Resampling.LANCZOS)
    # Convert to a numpy array and normalize pixel values to [0, 1]
    img_array = np.array(img).astype('float32') / 255.0
    # Add batch and channel dimensions -> (1, 1, 128, 128)
    return np.expand_dims(np.expand_dims(img_array, axis=0), axis=0)

input_data = preprocess_image("path/to/geez_char.jpg")

# 3. Run inference
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
predictions = session.run([output_name], {input_name: input_data})[0]

# 4. Get the predicted class and a confidence score
# (softmax assumes the model outputs raw logits; skip it if the
#  outputs are already probabilities)
logits = predictions[0]
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class_index = int(np.argmax(probs))
print(f"Predicted Class ID: {predicted_class_index} "
      f"(confidence: {probs[predicted_class_index]:.2%})")
```