Amharic/Fidel OCR Student
A compact PP-OCRv6-inspired, line-level OCR recognizer for Amharic/Fidel text.
This repository contains the deployable Stage 1 student checkpoint from a broader teacher–student Amharic OCR project. The model is designed for efficient recognition of cropped handwritten, typed, and synthetic Amharic text-line images.
Model summary
| Property | Value |
|---|---|
| Task | Amharic text-line recognition |
| Language | Amharic |
| Script | Ethiopic/Fidel |
| Framework | PyTorch |
| Inference parameters | 24.48M |
| Training parameters | Approximately 31.25M |
| Input | One cropped grayscale text-line image |
| Input height | 48 pixels |
| Output vocabulary | 354 Fidel symbols + CTC blank |
| Main inference head | CTC |
| Decoding | Greedy CTC |
| Training-only auxiliary head | NRTR |
| Current stage | Stage 1 student baseline |
Architecture
The inference pipeline is:
```text
Cropped Amharic line image
        ↓
LCNetV4-style CNN backbone
        ↓
Height compression
        ↓
LightSVTR sequence encoder
        ↓
CTC classification head
        ↓
Recognized Fidel text
```
The model contains:
- an LCNetV4-style lightweight CNN backbone;
- MetaFormer-style visual feature-mixing blocks;
- height compression that converts the feature map into a horizontal sequence;
- a LightSVTR neck with local convolutional and global self-attention context;
- a CTC recognition head for single-pass inference;
- an auxiliary NRTR decoder used only during training.
The NRTR branch is not included in the inference checkpoint.
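The height-compression step is easiest to see through array shapes. The sketch below is a minimal NumPy stand-in that assumes average pooling over height. The actual compression operator, channel count, and horizontal stride of this checkpoint are not stated here, so `CHANNELS`, `WIDTH_STRIDE`, and the random projection (which only stands in for the LightSVTR encoder plus CTC head) are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes for illustration only; they are not read from config.json.
CHANNELS = 8        # assumed channel count of the backbone output
WIDTH_STRIDE = 4    # assumed horizontal downsampling of the backbone
NUM_CLASSES = 355   # 354 Fidel symbols + CTC blank (from the model card)


def compress_height(features: np.ndarray) -> np.ndarray:
    """Collapse a (C, H', W') feature map into a (W', C) sequence by averaging over height."""
    return features.mean(axis=1).T


rng = np.random.default_rng(0)
line_width = 320  # resized width of a 48-pixel-high line image
# Stand-in for the backbone output: C channels, a few rows of height, W/stride columns.
feature_map = rng.standard_normal((CHANNELS, 3, line_width // WIDTH_STRIDE))
sequence = compress_height(feature_map)  # one feature vector per horizontal position
# Random projection standing in for the LightSVTR encoder and the CTC head.
head = rng.standard_normal((CHANNELS, NUM_CLASSES))
frame_scores = sequence @ head           # per-time-step class scores for CTC decoding
print(sequence.shape, frame_scores.shape)
```

Under these assumptions the number of CTC time steps grows linearly with the resized line width, which is consistent with the note that very wide inputs can need substantial memory.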
Stage 1 results
Official Fidel test set
| Evaluation | Samples | Macro CER | Micro CER | Macro WER | Micro WER | Exact line accuracy |
|---|---|---|---|---|---|---|
| Official all test | 18,011 | 8.24% | 6.92% | 19.52% | 16.34% | 36.99% |
| Official handwritten | 6,035 | 17.41% | 17.17% | 45.11% | 44.87% | 0.91% |
| Official typed | 5,924 | 4.05% | 3.99% | 6.47% | 6.36% | 68.38% |
| Official synthetic | 6,052 | 3.18% | 2.88% | 6.78% | 6.00% | 42.23% |
Writer-disjoint handwritten validation
| Metric | Result |
|---|---|
| Validation writers | 41 |
| Training writers | 370 |
| Writer overlap | 0 |
| Validation samples | 3,584 |
| Macro CER | 14.71% |
| Micro CER | 14.12% |
| Macro WER | 39.59% |
| Micro WER | 38.24% |
The official test split supports comparison with published Fidel results. The writer-disjoint validation split measures generalization to unseen handwriting styles.
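Macro and micro scores aggregate errors differently. Assuming the common definitions (macro: the average of per-line rates; micro: total edits divided by total reference length), a minimal pure-Python sketch follows. The project's own evaluation scripts live in the GitHub repository and may differ in details such as empty-line handling or Unicode normalization.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (r != h)))  # substitution or match
        previous = current
    return previous[-1]


def macro_micro(refs, hyps, words=False):
    """Return (macro, micro) CER, or (macro, micro) WER when words=True."""
    split = str.split if words else list
    pairs = [(split(r), split(h)) for r, h in zip(refs, hyps)]
    dists = [edit_distance(r, h) for r, h in pairs]
    lengths = [len(r) for r, _ in pairs]
    macro = sum(d / max(n, 1) for d, n in zip(dists, lengths)) / len(pairs)
    micro = sum(dists) / max(sum(lengths), 1)
    return macro, micro


refs = ["ሰላም", "አዲስ አበባ"]
hyps = ["ስላም", "አዲስ አበበ"]
print(macro_micro(refs, hyps))              # CER
print(macro_micro(refs, hyps, words=True))  # WER
```

Because macro averaging weights every line equally, an error on a short line counts more than it does under micro averaging.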
Handwritten error analysis
On the official handwritten test set, the model produced 58,222 character errors:
| Error category | Count | Percentage |
|---|---|---|
| Substitutions | 38,543 | 66.20% |
| Deletions | 16,155 | 27.75% |
| Insertions | 3,524 | 6.05% |
Of these errors:
| Category | Count | Percentage |
|---|---|---|
| Space-related errors | 8,486 | 14.58% |
| Non-space character errors | 49,736 | 85.42% |
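A breakdown like this one can be obtained by backtracing a Levenshtein alignment. The sketch below shows one way to do it. It assumes an edit counts as space-related when the substituted, deleted, or inserted character is a space; the project's exact criterion is not documented here. Equally short alignments can be tie-broken differently, so another implementation may split the same total slightly differently.

```python
def error_breakdown(ref: str, hyp: str) -> dict:
    """Count substitutions, deletions, insertions, and space-related edits for one line."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    counts = {"sub": 0, "del": 0, "ins": 0, "space": 0}
    i, j = n, m
    while i > 0 or j > 0:  # walk back from the end along one optimal alignment
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["sub"] += 1
                counts["space"] += " " in (ref[i - 1], hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["del"] += 1
            counts["space"] += ref[i - 1] == " "
            i -= 1
        else:
            counts["ins"] += 1
            counts["space"] += hyp[j - 1] == " "
            j -= 1
    return counts


print(error_breakdown("አዲስ አበባ", "አዲስአበበ"))  # a dropped space plus ባ read as በ
```

Summing the per-line counts over a test set (for example with `collections.Counter`) yields totals in the shape of the tables above.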
Frequent handwritten confusions include:
- ሰ ↔ ስ
- ላ ↔ ለ
- አ ↔ እ
- ደ ↔ ዳ
- ባ ↔ በ
- ሞ ↔ ም
The main Stage 1 limitation is fine-grained handwritten Fidel character discrimination rather than spacing alone.
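Each listed pair shares a consonant and differs only in vowel order, which Fidel marks with small modifications to the base glyph. The Unicode Ethiopic block makes this visible. For the basic consonant rows, each consonant occupies eight consecutive code points starting at U+1200, and the vowel order is the offset within the row. The sketch below relies on that layout. The layout does not hold for every row: labialized series such as the one starting at U+1248 are arranged differently.

```python
ETHIOPIC_BASE = 0x1200  # first code point of the Unicode Ethiopic block (ሀ)


def syllable_parts(ch: str) -> tuple[int, int]:
    """Return (consonant_row, vowel_order) for a basic Ethiopic syllable.

    vowel_order is 1-based (1 = first order ... 7 = seventh order).
    Only valid for the regular eight-code-point consonant rows.
    """
    offset = ord(ch) - ETHIOPIC_BASE
    return offset // 8, offset % 8 + 1


CONFUSIONS = [("ሰ", "ስ"), ("ላ", "ለ"), ("አ", "እ"), ("ደ", "ዳ"), ("ባ", "በ"), ("ሞ", "ም")]

for a, b in CONFUSIONS:
    (row_a, order_a), (row_b, order_b) = syllable_parts(a), syllable_parts(b)
    relation = "same consonant" if row_a == row_b else "different consonant"
    print(f"{a} ↔ {b}: {relation}, orders {order_a} vs {order_b}")
```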
Expected input
The model expects one cropped Amharic text-line image.
It does not detect text regions or process full document pages directly.
Supported input formats include image types readable by Pillow, such as:
- PNG
- JPEG
- BMP
- WEBP
Preprocessing
The included inference script applies the same main preprocessing used during training:
- correct EXIF orientation;
- convert the image to grayscale;
- remove excessive horizontal white margins;
- resize the image to a height of 48 pixels;
- preserve the original aspect ratio;
- normalize pixel values to [-1, 1];
- preserve the true resized width for CTC decoding.
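The steps above can be sketched with Pillow and NumPy. This is an approximation, not the code in `inference.py`. The background threshold used for margin trimming (`WHITE_THRESHOLD`) and the bilinear resampling filter are assumptions, and the real script may trim or resize differently.

```python
import numpy as np
from PIL import Image, ImageOps

TARGET_HEIGHT = 48      # from the model card
WHITE_THRESHOLD = 200   # assumption: grayscale values >= 200 count as background


def preprocess_line(image: Image.Image, target_height: int = TARGET_HEIGHT):
    """Return a (target_height, width) float32 array in [-1, 1] and its true width."""
    image = ImageOps.exif_transpose(image)   # apply EXIF orientation
    gray = np.asarray(image.convert("L"))   # grayscale, uint8
    # Trim horizontal white margins: keep columns from the first to the last ink column.
    ink_columns = np.flatnonzero((gray < WHITE_THRESHOLD).any(axis=0))
    if ink_columns.size:
        gray = gray[:, ink_columns[0]:ink_columns[-1] + 1]
    height, width = gray.shape
    # Resize to the target height while keeping the aspect ratio.
    new_width = max(1, round(width * target_height / height))
    resized = Image.fromarray(np.ascontiguousarray(gray)).resize(
        (new_width, target_height), Image.BILINEAR
    )
    # Map [0, 255] to [-1, 1].
    pixels = np.asarray(resized, dtype=np.float32) / 127.5 - 1.0
    return pixels, new_width


# Synthetic example: a 60x200 white canvas with a dark 40x100 block standing in for text.
canvas = np.full((60, 200), 255, dtype=np.uint8)
canvas[10:50, 50:150] = 0
pixels, width = preprocess_line(Image.fromarray(canvas))
print(pixels.shape, width)
```

Returning `new_width` alongside the pixels mirrors the point about keeping the true resized width, so padded batches can ignore the padded columns.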
Installation
Install the required packages:
```bash
pip install torch numpy pillow huggingface_hub
```
Alternatively, after downloading the repository:
```bash
pip install -r requirements.txt
```
Authentication for the private repository
This repository is currently private.
Authenticate before downloading:
```bash
hf auth login
```
The authenticated account must have access to `Beeface/amharic-fidel-ocr-student`.
Usage with snapshot_download
This is the recommended Python usage for the current custom PyTorch release:
```python
import sys

from huggingface_hub import snapshot_download

# Download the repository files (or reuse the locally cached copy).
model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    repo_type="model",
    revision="main",
)

# inference.py ships inside the repository, so make it importable.
if model_dir not in sys.path:
    sys.path.insert(0, model_dir)

from inference import recognize

prediction, device = recognize(
    image_path="my_amharic_line.png",
    model_dir=model_dir,
    device_name="auto",
)

print("Device:", device)
print("Prediction:", prediction)
```
`device_name` may be `auto`, `cpu`, or `cuda`.
With auto, CUDA is used when available; otherwise, the model runs on CPU.
Download with the Hugging Face CLI
Download the repository to a local folder:
```bash
hf download Beeface/amharic-fidel-ocr-student --local-dir amharic-fidel-ocr-student
```
Run inference:
```bash
python amharic-fidel-ocr-student/inference.py my_amharic_line.png --model-dir amharic-fidel-ocr-student --device auto
```
Expected output (the device line reads `cpu` when no GPU is available):
```text
Device: cuda
Prediction:
<recognized Amharic text>
```
Use after cloning or manually downloading
When all model files are in the current directory:
```bash
python inference.py my_amharic_line.png --model-dir . --device auto
```
Reproducible loading
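For reproducible experiments, pin the model to a tag or commit hash instead of always using `main`.
Example:
```python
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="Beeface/amharic-fidel-ocr-student",
    revision="stage1-v1.0",  # planned release tag; until it exists, pass a commit hash
)
```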
A release tag will be added after the Stage 1 package has been independently verified on another machine.
Repository files
| File | Description |
|---|---|
| `best_inference.pt` | Deployable Stage 1 CTC checkpoint |
| `model.py` | Student recognizer architecture |
| `dataset.py` | Vocabulary class and dataset-related utilities |
| `vocab.json` | Fixed 355-class OCR vocabulary |
| `inference.py` | Single-image inference interface |
| `config.json` | Architecture, preprocessing, and result metadata |
| `requirements.txt` | Python dependencies |
| `README.md` | Model card |
Training code, notebooks, evaluation scripts, and result CSV files are maintained separately in the GitHub project repository.
Intended use
The model is intended for:
- recognition of cropped Amharic text-line images;
- research on compact Ethiopic OCR;
- evaluation of handwritten, typed, and synthetic Fidel recognition;
- local CPU or GPU inference;
- future teacher–student distillation experiments;
- integration into a text-line OCR application or Streamlit interface.
Out-of-scope use
The model is not designed to perform:
- full-page text detection;
- document layout analysis;
- reading-order reconstruction;
- table recognition;
- paragraph segmentation;
- multilingual OCR outside the included vocabulary;
- automatic correction of dataset annotation errors.
A complete full-page OCR system requires an external detector and line-segmentation pipeline before recognition.
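For quick experiments on clean, horizontally aligned pages, a crude horizontal projection profile can stand in for that pipeline. The sketch below is not a text detector. It assumes dark text on a light background, no skew, and lines that do not touch. The `ink_threshold` and `min_height` values are arbitrary assumptions.

```python
import numpy as np


def segment_lines(page: np.ndarray, ink_threshold: int = 200, min_height: int = 5):
    """Return (top, bottom) row ranges of text bands in a grayscale page (uint8, dark ink)."""
    ink_rows = (page < ink_threshold).any(axis=1)  # True where a row contains ink
    lines, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y                                # a band begins
        elif not has_ink and start is not None:
            if y - start >= min_height:              # ignore specks thinner than min_height
                lines.append((start, y))
            start = None
    if start is not None and len(ink_rows) - start >= min_height:
        lines.append((start, len(ink_rows)))         # band touching the bottom edge
    return lines


page = np.full((100, 50), 255, dtype=np.uint8)  # synthetic white page
page[10:20, 5:45] = 0                            # first "line"
page[40:60, 5:45] = 0                            # second "line"
print(segment_lines(page))
```

Each returned row range can be cropped (`page[top:bottom]`), saved as a PNG, and passed to `recognize` one line at a time.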
Limitations
- Handwritten recognition remains substantially harder than typed and synthetic recognition.
- The current decoder uses greedy CTC without a language model.
- The model may confuse visually similar Fidel characters.
- Very faint, heavily degraded, rotated, or poorly cropped lines may produce weak predictions.
- Extremely wide input images may require substantial memory.
- Some official Fidel labels contain annotation anomalies.
- This checkpoint has not yet undergone teacher-guided distillation.
- The repository contains custom PyTorch code and does not currently support `AutoModel.from_pretrained()`.
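Greedy CTC decoding takes the highest-scoring class at each time step, merges consecutive repeats, and then drops blanks. The sketch below illustrates that rule in pure Python. It is not the code in `inference.py`. The `id_to_char` mapping, the blank index 0, and the `valid_len` argument (the number of time steps covered by the true resized width) are assumptions.

```python
def greedy_ctc_decode(frame_scores, id_to_char, blank_id=0, valid_len=None):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: sequence of per-time-step class scores (T x C).
    valid_len: optional count of leading time steps that belong to the real image;
    later frames are treated as padding and ignored (hypothetical parameter).
    """
    frames = frame_scores if valid_len is None else frame_scores[:valid_len]
    best_path = [max(range(len(scores)), key=lambda c: scores[c]) for scores in frames]
    chars, previous = [], None
    for class_id in best_path:
        if class_id != previous and class_id != blank_id:
            chars.append(id_to_char[class_id])
        previous = class_id  # blanks also reset the repeat check
    return "".join(chars)


id_to_char = {1: "ሰ", 2: "ላ", 3: "ም"}  # hypothetical toy vocabulary; 0 is the blank
path = [1, 1, 0, 1, 2, 2, 3, 0, 0]
scores = [[1.0 if c == k else 0.0 for c in range(4)] for k in path]
print(greedy_ctc_decode(scores, id_to_char))
```

The blank between the two ሰ frames is what allows a doubled character to survive the merge step. Because each frame is decided independently, nothing in this rule can use word-level context to resolve ambiguous glyphs.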
Why AutoModel.from_pretrained() is not used
This model uses a custom PyTorch architecture rather than a standard Transformers architecture.
The currently supported interface is `snapshot_download(...)` followed by `from inference import recognize`.
A future release may provide a dedicated interface such as:
```python
# Possible future API; AmharicFidelOCR does not exist in the current release.
ocr = AmharicFidelOCR.from_pretrained(
    "Beeface/amharic-fidel-ocr-student"
)
```
Dataset
The model was trained and evaluated using the Fidel Amharic OCR dataset:
upanzi/fidel-dataset
The dataset contains handwritten, typed, and synthetic Amharic text-line images.
The dataset itself is not redistributed in this model repository and remains subject to its own license and usage conditions.
Training objective
Stage 1 training used:
$$\mathcal{L}_{\text{stage1}} = \mathcal{L}_{\text{CTC}} + 0.5\,\mathcal{L}_{\text{NRTR}}$$
The CTC head is retained for inference. The NRTR decoder was used only to provide additional sequence-level supervision during training.
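The CTC term sums the probability of every frame-level path that collapses to the target transcript. A minimal log-space forward-algorithm sketch follows, written in pure Python for readability. Training code would normally rely on a framework implementation such as `torch.nn.CTCLoss`. The NRTR loss value and the blank index 0 below are placeholders, not values from this project.

```python
import math


def logsumexp(*values: float) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_nll(log_probs: list[list[float]], target: list[int], blank: int = 0) -> float:
    """Negative log-likelihood of `target` under per-frame log-probabilities (T x C)."""
    ext = [blank]                        # interleave blanks: [blank, y1, blank, y2, ..., blank]
    for label in target:
        ext += [label, blank]
    alpha = [-math.inf] * len(ext)
    alpha[0] = log_probs[0][blank]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [-math.inf] * len(ext)
        for s, label in enumerate(ext):
            terms = [alpha[s]]                       # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])           # advance by one position
            if s >= 2 and label != blank and label != ext[s - 2]:
                terms.append(alpha[s - 2])           # skip a blank between different labels
            new[s] = logsumexp(*terms) + frame[label]
        alpha = new
    return -logsumexp(*alpha[-2:])       # valid paths end on the last label or final blank


frame = [math.log(0.5), math.log(0.5)]  # two classes: blank (0) and one symbol (1)
log_probs = [frame, frame]               # T = 2 time steps
ctc = ctc_nll(log_probs, [1])            # paths "1 1", "- 1", "1 -" give p = 0.75
nrtr_loss = 1.2                          # hypothetical NRTR cross-entropy value
stage1_loss = ctc + 0.5 * nrtr_loss      # L_stage1 = L_CTC + 0.5 * L_NRTR
print(ctc, stage1_loss)
```

With only two frames, the target `[1, 1]` is impossible (a blank must separate the repeats), so its negative log-likelihood is infinite.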
Planned Stage 2
The next project stage will use a fine-tuned SuryaOCR model as a high-capacity teacher.
The intended Stage 2 objective is:
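$$\mathcal{L}_{\text{stage2}} = \mathcal{L}_{\text{CTC}} + \lambda_{\text{NRTR}}\,\mathcal{L}_{\text{NRTR}} + \lambda_{\text{KD}}\,\mathcal{L}_{\text{KD}}$$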
Teacher guidance may include:
- confidence-filtered teacher transcripts;
- sequence-level pseudo-labels;
- soft targets or logits where available and compatible;
- focused guidance on difficult handwritten samples.
The inference architecture will remain unchanged:
LCNetV4-style backbone → LightSVTR → CTC
The teacher will not be required during deployment.
Research status
- Stage 1 student training: completed
- Official test evaluation: completed
- Writer-disjoint evaluation: completed
- Handwritten error analysis: completed
- Stage 2 teacher evaluation: not started
- Teacher prediction generation: not started
- Student distillation: not started
Source code
Project source code, training scripts, notebooks, and evaluation results are maintained at:
BeefaceData/amharic-ocr-recognizer
Acknowledgements
This work uses the Fidel Amharic OCR dataset.
The student recognizer is inspired by the PP-OCRv6 recognition design, particularly lightweight convolutional visual encoding, LightSVTR-style sequence modeling, CTC inference, and auxiliary sequence supervision.
SuryaOCR is the planned high-capacity teacher for Stage 2.
License
No license has been selected for this model repository yet.
The absence of a license means that reuse rights are not automatically granted. The Fidel dataset is governed separately by its own license and terms.