Amharic/Fidel OCR Student

A compact, PP-OCRv6-inspired, line-level OCR recognizer for Amharic/Fidel text.

This repository contains the deployable Stage 1 student checkpoint from a broader teacher–student Amharic OCR project. The model is designed for efficient recognition of cropped handwritten, typed, and synthetic Amharic text-line images.

Model summary

Property	Value
Task	Amharic text-line recognition
Language	Amharic
Script	Ethiopic/Fidel
Framework	PyTorch
Inference parameters	24.48M
Training parameters	Approximately 31.25M
Input	One cropped grayscale text-line image
Input height	48 pixels
Output vocabulary	354 Fidel symbols + CTC blank
Main inference head	CTC
Decoding	Greedy CTC
Training-only auxiliary head	NRTR
Current stage	Stage 1 student baseline

Architecture

The inference pipeline is:

Cropped Amharic line image
        ↓
LCNetV4-style CNN backbone
        ↓
Height compression
        ↓
LightSVTR sequence encoder
        ↓
CTC classification head
        ↓
Recognized Fidel text

The model contains:

an LCNetV4-style lightweight CNN backbone;
MetaFormer-style visual feature-mixing blocks;
height compression that converts the feature map into a horizontal sequence;
a LightSVTR neck with local convolutional and global self-attention context;
a CTC recognition head for single-pass inference;
an auxiliary NRTR decoder used only during training.

The NRTR branch is not included in the inference checkpoint.
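Greedy CTC decoding takes the most likely label at each horizontal position, merges consecutive repeats, and then removes blanks. The sketch below illustrates that rule on a toy example. The blank index of 0, the three-symbol vocabulary, and the function name `greedy_ctc_decode` are assumptions made for illustration; the real label mapping is defined by `vocab.json` and `dataset.py`.

```python
from itertools import groupby

BLANK = 0  # assumed blank index; check vocab.json / dataset.py for the real one


def greedy_ctc_decode(frame_ids: list[int], id_to_char: dict[int, str], blank: int = BLANK) -> str:
    """Collapse consecutive repeated labels, then drop blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank)


# Hypothetical three-symbol vocabulary (ids are illustrative only).
toy_vocab = {1: "ሰ", 2: "ላ", 3: "ም"}

# One label per horizontal position, e.g. the argmax over the CTC logits.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(greedy_ctc_decode(frames, toy_vocab))
```

A blank between two identical labels keeps them as separate characters, so `[1, 0, 1]` decodes to two symbols while `[1, 1]` decodes to one.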

Stage 1 results

Official Fidel test set

Evaluation	Samples	Macro CER	Micro CER	Macro WER	Micro WER	Exact line accuracy
Official all test	18,011	8.24%	6.92%	19.52%	16.34%	36.99%
Official handwritten	6,035	17.41%	17.17%	45.11%	44.87%	0.91%
Official typed	5,924	4.05%	3.99%	6.47%	6.36%	68.38%
Official synthetic	6,052	3.18%	2.88%	6.78%	6.00%	42.23%
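As a consistency check, the "all test" row can be partly recomputed from the three subset rows. The sketch below assumes that macro rates and exact line accuracy are per-line averages, so they combine as sample-weighted means; micro rates are pooled over characters or words and cannot be recombined from sample counts alone. The numbers are copied from the table, and the helper name `weighted_mean` is hypothetical.

```python
# (samples, macro CER %, macro WER %, exact line accuracy %) from the table above
subsets = {
    "handwritten": (6035, 17.41, 45.11, 0.91),
    "typed": (5924, 4.05, 6.47, 68.38),
    "synthetic": (6052, 3.18, 6.78, 42.23),
}

total_samples = sum(row[0] for row in subsets.values())


def weighted_mean(column: int) -> float:
    """Sample-weighted mean of one metric column across the three subsets."""
    return sum(row[0] * row[column] for row in subsets.values()) / total_samples


print("samples:", total_samples)
print("macro CER:", round(weighted_mean(1), 2))
print("macro WER:", round(weighted_mean(2), 2))
print("exact line accuracy:", round(weighted_mean(3), 2))
```

Under that assumption the subsets sum to 18,011 samples and reproduce the reported 19.52% macro WER and 36.99% exact line accuracy. The recomputed macro CER is about 8.23%, which agrees with the reported 8.24% to within the rounding of the subset values.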

Writer-disjoint handwritten validation

Metric	Result
Validation writers	41
Training writers	370
Writer overlap	0
Validation samples	3,584
Macro CER	14.71%
Micro CER	14.12%
Macro WER	39.59%
Micro WER	38.24%

The official test split supports comparison with published Fidel results. The writer-disjoint validation split measures generalization to unseen handwriting styles.
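The difference between the macro and micro columns can be illustrated with a small example. The sketch below assumes the common convention, not confirmed against the project's evaluation scripts: macro CER averages the per-line error rates, while micro CER divides the total edit distance by the total number of reference characters. The function names and the two sample lines are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def macro_micro_cer(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (macro CER, micro CER) for (reference, prediction) pairs.

    Assumes every reference is non-empty.
    """
    distances = [levenshtein(ref, hyp) for ref, hyp in pairs]
    macro = sum(d / len(ref) for d, (ref, _) in zip(distances, pairs)) / len(pairs)
    micro = sum(distances) / sum(len(ref) for ref, _ in pairs)
    return macro, micro


# One short line with a single confusion and one longer, correct line.
pairs = [("ሰላም", "ስላም"), ("አዲስ አበባ", "አዲስ አበባ")]
macro, micro = macro_micro_cer(pairs)
print(f"macro CER = {macro:.4f}, micro CER = {micro:.4f}")
```

Here the short line has one error in three characters and the long line has none in seven, so the macro rate (1/6) is higher than the micro rate (1/10): errors on short lines weigh more heavily in the macro average.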

Handwritten error analysis

On the official handwritten test set, the model produced 58,222 character errors:

Error category	Count	Percentage
Substitutions	38,543	66.20%
Deletions	16,155	27.75%
Insertions	3,524	6.05%

Of these errors:

Category	Count	Percentage
Space-related errors	8,486	14.58%
Non-space character errors	49,736	85.42%
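A breakdown into substitutions, deletions, and insertions is typically obtained from a minimal edit alignment between each reference line and its prediction. The sketch below shows one way to count the three categories and then recomputes the reported shares from the counts above. It is an illustration under stated assumptions, not the project's analysis script: `edit_ops` is a hypothetical helper, and ties between equally cheap alignments are broken arbitrarily.

```python
def edit_ops(ref: str, hyp: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one minimal alignment."""
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] against hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference character
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every predicted character
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no cost
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitution, deletion, insertion, key=lambda t: t[0])
    return dp[-1][-1][1:]


print(edit_ops("ሰላም", "ስላም"))  # one confusion between ሰ and ስ

# Shares recomputed from the reported handwritten error counts.
counts = {"substitutions": 38543, "deletions": 16155, "insertions": 3524}
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {100 * count / total:.2f}%")
```

The three counts sum to the reported 58,222 errors, and the recomputed shares match the table.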

Frequent handwritten confusions include:

ሰ ↔ ስ
ላ ↔ ለ
አ ↔ እ
ደ ↔ ዳ
ባ ↔ በ
ሞ ↔ ም

The main Stage 1 limitation is fine-grained handwritten Fidel character discrimination rather than spacing alone.

Expected input

The model expects one cropped Amharic text-line image.

It does not detect text regions or process full document pages directly.

Supported input formats include image types readable by Pillow, such as:

PNG
JPEG
BMP
WEBP

Preprocessing

The included inference script applies the same core preprocessing steps that were used during training:

correct EXIF orientation;
convert the image to grayscale;
remove excessive horizontal white margins;
resize the image to a height of 48 pixels;
preserve the original aspect ratio;
normalize pixel values to [-1, 1];
preserve the true resized width for CTC decoding.
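The arithmetic behind three of these steps can be sketched in plain Python. This is an illustration only; `inference.py` remains the authoritative implementation. The white threshold of 250, the choice to trim margins completely, the rounding rule, and all function names are assumptions, and the image is represented as a list of pixel rows rather than a Pillow image.

```python
TARGET_HEIGHT = 48     # input height stated in the model summary
WHITE_THRESHOLD = 250  # assumed cut-off above which a pixel counts as background


def trim_white_columns(rows: list[list[int]], threshold: int = WHITE_THRESHOLD) -> list[list[int]]:
    """Drop leading and trailing columns that contain only background pixels."""
    width = len(rows[0])
    ink_columns = [x for x in range(width) if any(row[x] < threshold for row in rows)]
    if not ink_columns:  # blank image: leave it unchanged
        return rows
    left, right = ink_columns[0], ink_columns[-1] + 1
    return [row[left:right] for row in rows]


def resized_width(width: int, height: int, target_height: int = TARGET_HEIGHT) -> int:
    """Width after scaling to the target height with the aspect ratio preserved."""
    return max(1, round(width * target_height / height))


def normalize(pixel: int) -> float:
    """Map an 8-bit grayscale value in 0..255 to the range [-1, 1]."""
    return pixel / 127.5 - 1.0


image = [
    [255, 255, 0, 255],
    [255, 10, 255, 255],
]
print(trim_white_columns(image))
print(resized_width(600, 96))
print(normalize(0), normalize(255))
```

For a 600 x 96 crop, the resized width is 300 pixels, and that value would be the true width carried forward to decoding.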

Installation

Install the required packages:

pip install torch numpy pillow huggingface_hub

Alternatively, after downloading the repository:

pip install -r requirements.txt

Authentication for the private repository

This repository is currently private.

Authenticate before downloading:

hf auth login

The authenticated account must have access to:

Beeface/amharic-fidel-ocr-student

Usage with `snapshot_download`

This is the recommended Python usage for the current custom PyTorch release:

import sys

from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    repo_type="model",
    revision="main",
)

if model_dir not in sys.path:
    sys.path.insert(0, model_dir)

from inference import recognize

prediction, device = recognize(
    image_path="my_amharic_line.png",
    model_dir=model_dir,
    device_name="auto",
)

print("Device:", device)
print("Prediction:", prediction)

`device_name` may be:

auto
cpu
cuda

With `auto`, CUDA is used when available; otherwise, the model runs on CPU.
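The selection rule can be sketched as follows. This is a hypothetical helper written for illustration, not the code in `inference.py`; the function name `resolve_device` and the fallback to CPU when PyTorch is not importable are assumptions.

```python
def resolve_device(device_name: str = "auto") -> str:
    """Resolve "auto" to "cuda" when a CUDA device is usable, otherwise "cpu"."""
    if device_name not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"unsupported device name: {device_name!r}")
    if device_name != "auto":
        return device_name  # an explicit choice is passed through unchanged
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device("auto"))
```

Passing `cpu` explicitly is useful when results need to be compared across machines with and without a GPU.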

Download with the Hugging Face CLI

Download the repository to a local folder:

hf download Beeface/amharic-fidel-ocr-student --local-dir amharic-fidel-ocr-student

Run inference:

python amharic-fidel-ocr-student/inference.py my_amharic_line.png --model-dir amharic-fidel-ocr-student --device auto

Expected output on a machine with CUDA (on a CPU-only machine the first line reports cpu):

Device: cuda
Prediction:
<recognized Amharic text>

Use after cloning or manually downloading

When all model files are in the current directory:

python inference.py my_amharic_line.png --model-dir . --device auto

Reproducible loading

For reproducible experiments, pin the model to a tag or commit hash instead of tracking `main`.

Example:

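from huggingface_hub import snapshot_download

# "stage1-v1.0" is an example tag name; use a tag or commit hash that exists in the repository.
model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    revision="stage1-v1.0",
)

No release tag has been published yet. One will be added after the Stage 1 package has been independently verified on another machine; until then, pin to a specific commit hash.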

Repository files

File	Description
`best_inference.pt`	Deployable Stage 1 CTC checkpoint
`model.py`	Student recognizer architecture
`dataset.py`	Vocabulary class and dataset-related utilities
`vocab.json`	Fixed 355-class OCR vocabulary
`inference.py`	Single-image inference interface
`config.json`	Architecture, preprocessing, and result metadata
`requirements.txt`	Python dependencies
`README.md`	Model card

Training code, notebooks, evaluation scripts, and result CSV files are maintained separately in the GitHub project repository.

Intended use

The model is intended for:

recognition of cropped Amharic text-line images;
research on compact Ethiopic OCR;
evaluation of handwritten, typed, and synthetic Fidel recognition;
local CPU or GPU inference;
future teacher–student distillation experiments;
integration into a text-line OCR application or Streamlit interface.

Out-of-scope use

The model is not designed to perform:

full-page text detection;
document layout analysis;
reading-order reconstruction;
table recognition;
paragraph segmentation;
multilingual OCR outside the included vocabulary;
automatic correction of dataset annotation errors.

A complete full-page OCR system requires an external detector and line-segmentation pipeline before recognition.

Limitations

Handwritten recognition remains substantially harder than typed and synthetic recognition.
The current decoder uses greedy CTC without a language model.
The model may confuse visually similar Fidel characters.
Very faint, heavily degraded, rotated, or poorly cropped lines may produce weak predictions.
Extremely wide input images may require substantial memory.
Some official Fidel labels contain annotation anomalies.
This checkpoint has not yet undergone teacher-guided distillation.
The repository contains custom PyTorch code and does not currently support `AutoModel.from_pretrained()`.

Why `AutoModel.from_pretrained()` is not used

This model uses a custom PyTorch architecture rather than a standard Transformers architecture.

The current supported interface is:

snapshot_download(...)

followed by:

from inference import recognize

A future release may provide a dedicated interface such as:

ocr = AmharicFidelOCR.from_pretrained(
    "Beeface/amharic-fidel-ocr-student"
)

Dataset

The model was trained and evaluated using the Fidel Amharic OCR dataset:

upanzi/fidel-dataset

The dataset contains handwritten, typed, and synthetic Amharic text-line images.

The dataset itself is not redistributed in this model repository and remains subject to its own license and usage conditions.

Training objective

Stage 1 training used:

L_stage1 = L_CTC + 0.5 × L_NRTR

The CTC head is retained for inference. The NRTR decoder was used only to provide additional sequence-level supervision during training.
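As a worked example of the weighting, the combined loss can be written as a one-line function. The weight of 0.5 comes from the objective above; the loss values are made up for illustration, and the function name `stage1_loss` is hypothetical.

```python
NRTR_WEIGHT = 0.5  # weight of the auxiliary NRTR term in the Stage 1 objective


def stage1_loss(ctc_loss: float, nrtr_loss: float, nrtr_weight: float = NRTR_WEIGHT) -> float:
    """Combine the CTC loss with the down-weighted auxiliary NRTR loss."""
    return ctc_loss + nrtr_weight * nrtr_loss


# Illustrative values only: a CTC loss of 2.0 and an NRTR loss of 1.0.
print(stage1_loss(2.0, 1.0))  # 2.0 + 0.5 * 1.0 = 2.5
```

Because the NRTR term is halved, the gradient signal from the CTC head dominates, which is consistent with CTC being the only head kept for inference.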

Planned Stage 2

The next project stage will use a fine-tuned SuryaOCR model as a high-capacity teacher.

The intended Stage 2 objective is:

L_stage2 =
    L_CTC
    + λ_NRTR × L_NRTR
    + λ_KD × L_KD

Teacher guidance may include:

confidence-filtered teacher transcripts;
sequence-level pseudo-labels;
soft targets or logits where available and compatible;
focused guidance on difficult handwritten samples.
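Confidence filtering of teacher transcripts could look like the sketch below. Stage 2 has not started, so everything here is hypothetical: the `TeacherPrediction` record, the assumption that the teacher provides a line-level confidence in [0, 1], and the 0.9 threshold are illustrative choices, not project decisions.

```python
from dataclasses import dataclass


@dataclass
class TeacherPrediction:
    image_id: str
    text: str
    confidence: float  # assumed line-level score in [0, 1]


def filter_pseudo_labels(
    predictions: list[TeacherPrediction], min_confidence: float = 0.9
) -> list[TeacherPrediction]:
    """Keep non-empty teacher transcripts whose confidence meets the threshold."""
    return [
        p for p in predictions
        if p.confidence >= min_confidence and p.text.strip()
    ]


# Made-up predictions for illustration.
predictions = [
    TeacherPrediction("line_001", "ሰላም", 0.97),
    TeacherPrediction("line_002", "አዲስ አበባ", 0.62),
    TeacherPrediction("line_003", "   ", 0.99),
]
kept = filter_pseudo_labels(predictions)
print([p.image_id for p in kept])
```

A stricter threshold trades pseudo-label coverage for cleaner supervision, so in practice the value would need tuning on held-out handwritten data.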

The inference architecture will remain unchanged:

LCNetV4-style backbone → LightSVTR → CTC

The teacher will not be required during deployment.

Research status

Stage 1 student training: completed
Official test evaluation: completed
Writer-disjoint evaluation: completed
Handwritten error analysis: completed
Stage 2 teacher evaluation: not started
Teacher prediction generation: not started
Student distillation: not started

Source code

Project source code, training scripts, notebooks, and evaluation results are maintained at:

BeefaceData/amharic-ocr-recognizer

Acknowledgements

This work uses the Fidel Amharic OCR dataset.

The student recognizer is inspired by the PP-OCRv6 recognition design, particularly lightweight convolutional visual encoding, LightSVTR-style sequence modeling, CTC inference, and auxiliary sequence supervision.

SuryaOCR is the planned high-capacity teacher for Stage 2.

License

No license has been selected for this model repository yet.

The absence of a license means that reuse rights are not automatically granted. The Fidel dataset is governed separately by its own license and terms.
