Amharic/Fidel OCR Student

A compact PP-OCRv6-inspired, line-level OCR recognizer for Amharic/Fidel text.

This repository contains the deployable Stage 1 student checkpoint from a broader teacher–student Amharic OCR project. The model is designed for efficient recognition of cropped handwritten, typed, and synthetic Amharic text-line images.

Model summary

| Property | Value |
| --- | --- |
| Task | Amharic text-line recognition |
| Language | Amharic |
| Script | Ethiopic/Fidel |
| Framework | PyTorch |
| Inference parameters | 24.48M |
| Training parameters | Approximately 31.25M |
| Input | One cropped grayscale text-line image |
| Input height | 48 pixels |
| Output vocabulary | 354 Fidel symbols + CTC blank |
| Main inference head | CTC |
| Decoding | Greedy CTC |
| Training-only auxiliary head | NRTR |
| Current stage | Stage 1 student baseline |

Architecture

The inference pipeline is:

Cropped Amharic line image
        ↓
LCNetV4-style CNN backbone
        ↓
Height compression
        ↓
LightSVTR sequence encoder
        ↓
CTC classification head
        ↓
Recognized Fidel text

The model contains:

  • an LCNetV4-style lightweight CNN backbone;
  • MetaFormer-style visual feature-mixing blocks;
  • height compression that converts the feature map into a horizontal sequence;
  • a LightSVTR neck with local convolutional and global self-attention context;
  • a CTC recognition head for single-pass inference;
  • an auxiliary NRTR decoder used only during training.

The NRTR branch is not included in the inference checkpoint.
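
The CTC head emits one score vector per horizontal position, and greedy CTC decoding turns that sequence into text by taking the best class per position, merging consecutive repeats, and dropping blanks. The following is a minimal sketch of that rule only. The blank index of 0 and the three-symbol toy vocabulary are assumptions for illustration; the real mapping is defined by vocab.json and the logic in inference.py may differ.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumption: blank sits at index 0; check the repository's vocabulary


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], id_to_char: Dict[int, str]) -> str:
    """Argmax per frame, collapse consecutive repeats, then remove blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars: List[str] = []
    previous = None
    for index in best_path:
        if index != previous and index != BLANK:
            chars.append(id_to_char[index])
        previous = index
    return "".join(chars)


# Hypothetical toy vocabulary and per-frame scores (not taken from the model).
toy_vocab = {1: "ሰ", 2: "ላ", 3: "ም"}
toy_scores = [
    [0.10, 0.80, 0.05, 0.05],  # ሰ
    [0.10, 0.80, 0.05, 0.05],  # ሰ repeated, merged with the previous frame
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.80, 0.05, 0.05],  # ሰ after a blank, kept as a new character
    [0.10, 0.10, 0.70, 0.10],  # ላ
    [0.20, 0.10, 0.10, 0.60],  # ም
]
decoded = greedy_ctc_decode(toy_scores, toy_vocab)
print(decoded)
```

The blank between the second and fourth frames is what allows a doubled character to survive the repeat-merging step.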

Stage 1 results

Official Fidel test set

| Evaluation | Samples | Macro CER | Micro CER | Macro WER | Micro WER | Exact line accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Official all test | 18,011 | 8.24% | 6.92% | 19.52% | 16.34% | 36.99% |
| Official handwritten | 6,035 | 17.41% | 17.17% | 45.11% | 44.87% | 0.91% |
| Official typed | 5,924 | 4.05% | 3.99% | 6.47% | 6.36% | 68.38% |
| Official synthetic | 6,052 | 3.18% | 2.88% | 6.78% | 6.00% | 42.23% |
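
The table reports both macro and micro error rates. Under the usual convention, which this sketch assumes rather than takes from the project's evaluation scripts, the macro rate is the mean of per-line error rates and the micro rate is total edits divided by total reference characters. The sample strings are made up for illustration, and characters are counted as Unicode code points.

```python
from typing import List, Tuple


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance with unit costs."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def macro_micro_cer(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (macro CER, micro CER); assumes every reference is non-empty."""
    distances = [edit_distance(ref, hyp) for ref, hyp in pairs]
    lengths = [len(ref) for ref, _ in pairs]
    macro = sum(d / n for d, n in zip(distances, lengths)) / len(pairs)
    micro = sum(distances) / sum(lengths)
    return macro, micro


# Hypothetical (reference, hypothesis) pairs.
pairs = [("ሰላም", "ሰላም"), ("አበበ በሶ በላ", "አበባ በሶ በለ"), ("እሺ", "አሺ")]
macro, micro = macro_micro_cer(pairs)
print(f"macro CER = {macro:.4f}, micro CER = {micro:.4f}")
```

Short lines weigh as much as long ones in the macro rate, which is why the two figures can differ on the same predictions. WER follows the same pattern when each line is first split into words.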

Writer-disjoint handwritten validation

| Metric | Result |
| --- | --- |
| Validation writers | 41 |
| Training writers | 370 |
| Writer overlap | 0 |
| Validation samples | 3,584 |
| Macro CER | 14.71% |
| Micro CER | 14.12% |
| Macro WER | 39.59% |
| Micro WER | 38.24% |

The official test split supports comparison with published Fidel results. The writer-disjoint validation split measures generalization to unseen handwriting styles.
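
A writer-disjoint split holds out whole writers rather than individual samples, so the reported overlap of 0 can be checked with a set intersection. The sketch below shows one way to build and verify such a split. The function name, the 10% validation fraction, the seed, and the toy writer IDs are all hypothetical and are not taken from the project's training code.

```python
import random
from typing import Dict, List, Set, Tuple


def split_by_writer(
    sample_to_writer: Dict[str, str], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str], Set[str]]:
    """Hold out whole writers so that no writer appears in both splits."""
    writers = sorted(set(sample_to_writer.values()))
    random.Random(seed).shuffle(writers)
    n_val = max(1, round(len(writers) * val_fraction))
    val_writers = set(writers[:n_val])
    train = [s for s, w in sample_to_writer.items() if w not in val_writers]
    val = [s for s, w in sample_to_writer.items() if w in val_writers]
    return train, val, val_writers


# Toy data: 20 hypothetical writers with 3 line images each.
toy = {f"w{w:02d}_line{k}": f"w{w:02d}" for w in range(20) for k in range(3)}
train_ids, val_ids, held_out = split_by_writer(toy)

# The overlap check: writers seen in training must not be among the held-out writers.
train_writers = {toy[s] for s in train_ids}
overlap = train_writers & held_out
print(len(train_ids), len(val_ids), len(overlap))
```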

Handwritten error analysis

On the official handwritten test set, the model produced 58,222 character errors:

| Error category | Count | Percentage |
| --- | --- | --- |
| Substitutions | 38,543 | 66.20% |
| Deletions | 16,155 | 27.75% |
| Insertions | 3,524 | 6.05% |

Of these errors:

| Category | Count | Percentage |
| --- | --- | --- |
| Space-related errors | 8,486 | 14.58% |
| Non-space character errors | 49,736 | 85.42% |
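
Counts like these are typically obtained by aligning each prediction with its reference and tallying the edit operations along one minimal alignment. The sketch below illustrates that procedure. Treating an operation as space-related whenever the reference or predicted character is a space is an assumption, as are the function name and the sample strings; the project's evaluation scripts may classify errors differently.

```python
from typing import Dict


def count_edit_operations(reference: str, hypothesis: str) -> Dict[str, int]:
    """Tally substitutions, deletions, and insertions on one minimal alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)

    counts = {"substitutions": 0, "deletions": 0, "insertions": 0, "space_related": 0}
    i, j = len(reference), len(hypothesis)
    while i > 0 or j > 0:  # walk back from the end of both strings
        mismatch = i > 0 and j > 0 and reference[i - 1] != hypothesis[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(mismatch):
            if mismatch:
                counts["substitutions"] += 1
                counts["space_related"] += " " in (reference[i - 1], hypothesis[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletions"] += 1  # reference character missing from the prediction
            counts["space_related"] += reference[i - 1] == " "
            i -= 1
        else:
            counts["insertions"] += 1  # extra character in the prediction
            counts["space_related"] += hypothesis[j - 1] == " "
            j -= 1
    return counts


# Hypothetical example: one substitution, one dropped space, one extra character.
example = count_edit_operations("ሰላም ለዓለም", "ስላምለዓለምም")
print(example)
```

Summing these per-line tallies over a test set gives the totals, and dividing each by the overall error count gives the percentages.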

Frequent handwritten confusions include the following pairs, each consisting of two vowel forms of the same base consonant:

ሰ ↔ ስ
ላ ↔ ለ
አ ↔ እ
ደ ↔ ዳ
ባ ↔ በ
ሞ ↔ ም

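The main Stage 1 limitation is fine-grained discrimination between visually similar handwritten Fidel characters rather than spacing: non-space character errors account for 85.42% of handwritten errors, against 14.58% for space-related errors.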

Expected input

The model expects one cropped Amharic text-line image.

It does not detect text regions or process full document pages directly.

Supported input formats include image types readable by Pillow, such as:

PNG
JPEG
BMP
WEBP

Preprocessing

The included inference script applies the same core preprocessing steps that were used during training:

  1. correct EXIF orientation;
  2. convert the image to grayscale;
  3. remove excessive horizontal white margins;
  4. resize the image to a height of 48 pixels;
  5. preserve the original aspect ratio;
  6. normalize pixel values to [-1, 1];
  7. preserve the true resized width for CTC decoding.
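
The arithmetic behind several of these steps can be sketched in plain Python. This is an illustration only: the near-white threshold of 245, the rounding rule, and the exact normalization formula are assumptions, and the real inference.py may implement these steps differently.

```python
from typing import List

TARGET_HEIGHT = 48  # input height stated in the model summary


def trim_white_columns(image: List[List[int]], white_threshold: int = 245) -> List[List[int]]:
    """Drop leading and trailing columns whose pixels are all near-white."""
    width = len(image[0])
    inked = [x for x in range(width) if any(row[x] < white_threshold for row in image)]
    if not inked:
        return image  # blank image: leave unchanged
    return [row[inked[0]: inked[-1] + 1] for row in image]


def resized_width(width: int, height: int, target_height: int = TARGET_HEIGHT) -> int:
    """Width after scaling to the target height with the aspect ratio preserved."""
    return max(1, round(width * target_height / height))


def normalize_pixels(row: List[int]) -> List[float]:
    """Map 8-bit grayscale values from [0, 255] to [-1, 1]."""
    return [value / 127.5 - 1.0 for value in row]


# Toy 2x5 grayscale image (rows of pixel values); 255 is white.
toy_image = [[255, 255, 10, 255, 255],
             [255, 200, 255, 255, 255]]
trimmed = trim_white_columns(toy_image)
new_width = resized_width(width=1200, height=96)
normalized = normalize_pixels([0, 255])
print(trimmed, new_width, normalized)
```

Because the width varies per image, keeping the resized width alongside the pixel data lets later stages know how much of the sequence is real content.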

Installation

Install the required packages:

pip install torch numpy pillow huggingface_hub

Alternatively, after downloading the repository:

pip install -r requirements.txt

Authentication for the private repository

This repository is currently private.

Authenticate before downloading:

hf auth login

The authenticated account must have access to:

Beeface/amharic-fidel-ocr-student

Usage with snapshot_download

This is the recommended Python usage for the current custom PyTorch release:

import sys

from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    repo_type="model",
    revision="main",
)

if model_dir not in sys.path:
    sys.path.insert(0, model_dir)

from inference import recognize

prediction, device = recognize(
    image_path="my_amharic_line.png",
    model_dir=model_dir,
    device_name="auto",
)

print("Device:", device)
print("Prediction:", prediction)

device_name may be:

auto
cpu
cuda

With auto, CUDA is used when available; otherwise, the model runs on CPU.

Download with the Hugging Face CLI

Download the repository to a local folder:

hf download Beeface/amharic-fidel-ocr-student --local-dir amharic-fidel-ocr-student

Run inference:

python amharic-fidel-ocr-student/inference.py my_amharic_line.png --model-dir amharic-fidel-ocr-student --device auto

Expected output (the device line reads cpu when CUDA is unavailable):

Device: cuda
Prediction:
<recognized Amharic text>

Use after cloning or manually downloading

When all model files are in the current directory:

python inference.py my_amharic_line.png --model-dir . --device auto

Reproducible loading

For reproducible experiments, pin the model to a tag or commit revision instead of always using main.

Example:

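from huggingface_hub import snapshot_download

# Pin to a tag or commit hash so that repeated runs load identical files.
model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    revision="stage1-v1.0",
)

No release tag has been published yet: one will be added after the Stage 1 package has been independently verified on another machine. Until then, pass a commit hash as revision.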

Repository files

| File | Description |
| --- | --- |
| best_inference.pt | Deployable Stage 1 CTC checkpoint |
| model.py | Student recognizer architecture |
| dataset.py | Vocabulary class and dataset-related utilities |
| vocab.json | Fixed 355-class OCR vocabulary |
| inference.py | Single-image inference interface |
| config.json | Architecture, preprocessing, and result metadata |
| requirements.txt | Python dependencies |
| README.md | Model card |

Training code, notebooks, evaluation scripts, and result CSV files are maintained separately in the GitHub project repository.

Intended use

The model is intended for:

  • recognition of cropped Amharic text-line images;
  • research on compact Ethiopic OCR;
  • evaluation of handwritten, typed, and synthetic Fidel recognition;
  • local CPU or GPU inference;
  • future teacher–student distillation experiments;
  • integration into a text-line OCR application or Streamlit interface.

Out-of-scope use

The model is not designed to perform:

  • full-page text detection;
  • document layout analysis;
  • reading-order reconstruction;
  • table recognition;
  • paragraph segmentation;
  • multilingual OCR outside the included vocabulary;
  • automatic correction of dataset annotation errors.

A complete full-page OCR system requires an external detector and line-segmentation pipeline before recognition.

Limitations

  • Handwritten recognition remains substantially harder than typed and synthetic recognition.
  • The current decoder uses greedy CTC without a language model.
  • The model may confuse visually similar Fidel characters.
  • Very faint, heavily degraded, rotated, or poorly cropped lines may produce unreliable predictions.
  • Extremely wide input images may require substantial memory.
  • Some official Fidel labels contain annotation anomalies.
  • This checkpoint has not yet undergone teacher-guided distillation.
  • The repository contains custom PyTorch code and does not currently support AutoModel.from_pretrained().

Why AutoModel.from_pretrained() is not used

This model uses a custom PyTorch architecture rather than a standard Transformers architecture.

The current supported interface is:

snapshot_download(...)

followed by:

from inference import recognize

A future release may provide a dedicated interface such as:

ocr = AmharicFidelOCR.from_pretrained(
    "Beeface/amharic-fidel-ocr-student"
)

Dataset

The model was trained and evaluated using the Fidel Amharic OCR dataset:

upanzi/fidel-dataset

The dataset contains handwritten, typed, and synthetic Amharic text-line images.

The dataset itself is not redistributed in this model repository and remains subject to its own license and usage conditions.

Training objective

Stage 1 training used:

L_stage1 = L_CTC + 0.5 × L_NRTR

The CTC head is retained for inference. The NRTR decoder was used only to provide additional sequence-level supervision during training.

Planned Stage 2

The next project stage will use a fine-tuned SuryaOCR model as a high-capacity teacher.

The intended Stage 2 objective is:

L_stage2 =
    L_CTC
    + λ_NRTR × L_NRTR
    + λ_KD × L_KD
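
Both objectives are weighted sums of the same terms, and Stage 1 is the special case λ_NRTR = 0.5 and λ_KD = 0. A minimal sketch of that relationship follows. The loss values and the Stage 2 weights are placeholders, since the source does not specify them.

```python
def combined_loss(
    ctc_loss: float,
    nrtr_loss: float,
    kd_loss: float = 0.0,
    nrtr_weight: float = 0.5,
    kd_weight: float = 0.0,
) -> float:
    """L = L_CTC + nrtr_weight * L_NRTR + kd_weight * L_KD."""
    return ctc_loss + nrtr_weight * nrtr_loss + kd_weight * kd_loss


# Stage 1: the defaults reproduce L_CTC + 0.5 * L_NRTR (loss values are made up).
stage1 = combined_loss(ctc_loss=1.2, nrtr_loss=0.8)

# Stage 2: adds a distillation term; kd_weight=1.0 is a placeholder, not a project setting.
stage2 = combined_loss(ctc_loss=1.2, nrtr_loss=0.8, kd_loss=0.5, nrtr_weight=0.5, kd_weight=1.0)
print(stage1, stage2)
```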

Teacher guidance may include:

  • confidence-filtered teacher transcripts;
  • sequence-level pseudo-labels;
  • soft targets or logits where available and compatible;
  • focused guidance on difficult handwritten samples.
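
Confidence filtering, the first item above, can be sketched as a simple threshold over teacher outputs. Everything here is hypothetical: the TeacherPrediction structure, the 0.90 threshold, and the assumption that the teacher reports a single confidence score per line are illustrative choices, not part of the project, whose Stage 2 work has not started.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeacherPrediction:
    image_id: str
    transcript: str
    confidence: float  # hypothetical per-line score in [0, 1]


def filter_pseudo_labels(
    predictions: List[TeacherPrediction], min_confidence: float = 0.90
) -> List[TeacherPrediction]:
    """Keep non-empty teacher transcripts at or above the confidence threshold."""
    return [
        p for p in predictions
        if p.confidence >= min_confidence and p.transcript.strip()
    ]


# Made-up teacher outputs for three line images.
teacher_outputs = [
    TeacherPrediction("line_001", "ሰላም ለዓለም", 0.97),
    TeacherPrediction("line_002", "አበበ በሶ በላ", 0.62),  # below threshold, dropped
    TeacherPrediction("line_003", "   ", 0.99),         # empty transcript, dropped
]
kept = filter_pseudo_labels(teacher_outputs)
print([p.image_id for p in kept])
```

A higher threshold trades pseudo-label coverage for cleaner supervision, so in practice it would need tuning against a labeled validation set.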

The inference architecture will remain unchanged:

LCNetV4-style backbone → LightSVTR → CTC

The teacher will not be required during deployment.

Research status

Stage 1 student training: completed
Official test evaluation: completed
Writer-disjoint evaluation: completed
Handwritten error analysis: completed
Stage 2 teacher evaluation: not started
Teacher prediction generation: not started
Student distillation: not started

Source code

Project source code, training scripts, notebooks, and evaluation results are maintained at:

BeefaceData/amharic-ocr-recognizer

Acknowledgements

This work uses the Fidel Amharic OCR dataset.

The student recognizer is inspired by the PP-OCRv6 recognition design, particularly lightweight convolutional visual encoding, LightSVTR-style sequence modeling, CTC inference, and auxiliary sequence supervision.

SuryaOCR is the planned high-capacity teacher for Stage 2.

License

No license has been selected for this model repository yet.

The absence of a license means that reuse rights are not automatically granted. The Fidel dataset is governed separately by its own license and terms.
