Gujarati OCR Model (SVTR-LCNet)

This model performs Optical Character Recognition (OCR) for Gujarati text using the SVTR (Scene Text Recognition with a Single Visual Model) architecture with a MultiHead (CTC + SAR) recognition head.

Model Description

  • Architecture: SVTR-LCNet with MultiHead (CTC + SAR heads)
  • Framework: PaddleOCR
  • Language: Gujarati (gu)
  • Input Size: [3, 48, 384] (C, H, W)
  • Output: Gujarati text sequence

Training Details

Training Data

  • Dataset: Gujarati text images
  • Characters: 1030 unique Gujarati characters including:
    • Gujarati consonants (ક, ખ, ગ, ઘ, etc.)
    • Vowels (અ, આ, ઇ, ઈ, etc.)
    • Matras (diacritical marks)
    • Gujarati numerals (૦-૯)
    • Special characters
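The Gujarati script occupies the Unicode block U+0A80–U+0AFF. A quick way to sanity-check entries when building or validating a character dictionary for this model is to test membership in that block (a minimal sketch; the `sample_chars` list below is illustrative, not the actual contents of gu_dict.txt):

```python
# Gujarati script lives in Unicode block U+0A80-U+0AFF.
GUJARATI_START, GUJARATI_END = 0x0A80, 0x0AFF

def is_gujarati(ch):
    return GUJARATI_START <= ord(ch) <= GUJARATI_END

# Illustrative samples: consonants, vowel, matra, numerals.
sample_chars = ["ક", "ખ", "અ", "ા", "૦", "૯"]
print(all(is_gujarati(c) for c in sample_chars))  # True
```

Note that Latin digits, punctuation, and the space character fall outside this block, so a real dictionary for mixed text may legitimately contain non-Gujarati entries.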

Training Configuration

  • Epochs: 120
  • Best Epoch: 120
  • Training Accuracy: 88.8%
  • Norm Edit Distance: 0.977
  • Optimizer: Adam with learning rate scheduling
  • Image Shape: [3, 48, 384]

Training Results

  • Final Accuracy: 0.888
  • Norm Edit Distance: 0.977
  • Best Epoch: 120
  • FPS (eval): 1248.98
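The Norm Edit Distance metric above is conventionally computed as 1 − edit_distance / max(len(pred), len(gt)), averaged over the evaluation set. A minimal sketch (the exact normalization used by PaddleOCR's evaluator may differ in edge cases):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def norm_edit_distance(pred, gt):
    if not pred and not gt:
        return 1.0
    return 1.0 - levenshtein(pred, gt) / max(len(pred), len(gt))

print(norm_edit_distance("ફાઇબર", "ફાઇબર"))  # 1.0 for a perfect match
```

A score of 0.977 therefore means predictions differ from the ground truth by roughly 2.3% of their characters on average, even when the full-string accuracy is 88.8%.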

Usage

Prerequisites

pip install paddlepaddle-gpu paddleocr opencv-python numpy

Basic Usage

import paddle.inference as paddle_infer
import cv2
import numpy as np
import math

# Load model files (download from this repo)
config = paddle_infer.Config("inference.json", "inference.pdiparams")
config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0
predictor = paddle_infer.create_predictor(config)

# Load character dictionary
with open("gu_dict.txt", "r", encoding="utf-8") as f:
    chars = [line.rstrip("\n") for line in f if line.strip()]
char_list = ["<blank>"] + chars + [" "]

# Preprocessing function (matches PaddleOCR training)
def preprocess(img):
    imgC, imgH, imgW = 3, 48, 384
    h, w = img.shape[:2]
    ratio = w / float(h)
    
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
        resized_w = int(math.ceil(imgH * ratio))
    
    resized_image = cv2.resize(img, (resized_w, imgH))
    resized_image = resized_image.astype("float32")
    
    # Normalize: (pixel / 255 - 0.5) / 0.5
    resized_image = resized_image.transpose((2, 0, 1)) / 255.0
    resized_image -= 0.5
    resized_image /= 0.5
    
    # Pad to fixed width
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    
    return np.expand_dims(padding_im, axis=0)

# CTC Decoding
def ctc_decode(indices, char_list):
    # Drop CTC blanks (index 0) and collapse consecutive repeats.
    return "".join(char_list[idx] for i, idx in enumerate(indices)
                   if 0 < idx < len(char_list) and (i == 0 or idx != indices[i - 1]))

# Run inference
img = cv2.imread("your_gujarati_text.jpg")
input_tensor = preprocess(img)

input_name = predictor.get_input_names()[0]
input_handle = predictor.get_input_handle(input_name)
input_handle.copy_from_cpu(input_tensor)
predictor.run()

output_name = predictor.get_output_names()[0]
logits = predictor.get_output_handle(output_name).copy_to_cpu()
indices = np.argmax(logits, axis=2)[0]

# Decode text
gujarati_text = ctc_decode(indices, char_list)
print(gujarati_text)

With Text Detection (Full OCR Pipeline)

import cv2
from paddlex import create_model

# Load detection model
det_model = create_model("PP-OCRv5_server_det")

# Detect text regions
img = cv2.imread("document.jpg")
det_result = list(det_model.predict(img))[0]
boxes = det_result.get("dt_polys", [])

# Process each detected region
for box in boxes:
    # Crop text region
    # ... (crop and rotate box)
    
    # Recognize text
    input_tensor = preprocess(cropped_region)
    # ... (run inference as shown above)

Model Performance

Strengths

  • High accuracy (88.8%) on single-word Gujarati text
  • Fast inference speed (1248 FPS on GPU)
  • Supports all Gujarati characters and diacritics
  • Works well with clean, printed text

Limitations

  • Best for single-word text: Trained on individual words, not full sentences
  • Printed text only: May not work well on handwritten text
  • Similar to training data: Performance degrades on significantly different image styles
  • Clean images: Works best on high-contrast, clear text images

Use Cases

Good for:

  • Digitizing printed Gujarati documents
  • Single-word Gujarati text recognition
  • Clean text images with good contrast
  • Gujarati text in books, signs, labels

Not recommended for:

  • Handwritten Gujarati text
  • Low-quality or blurry images
  • Complex document layouts without proper text detection
  • Significantly different fonts/styles from training data

Example Results

Input: Clean Gujarati word image
Output: ફાઇબર (fiber)
Accuracy: ✅ Perfect match

Technical Details

Architecture

  • Backbone: LCNet (Lightweight CNN)
  • Neck: SVTR transformer blocks for sequence modeling
  • Head: MultiHead (CTC + SAR)
    • CTC: Connectionist Temporal Classification
    • SAR: Show, Attend and Read

Input/Output

  • Input Shape: (batch_size, 3, 48, 384)
  • Input Range: [-1.0, 1.0] (normalized)
  • Output Shape: (batch_size, 48, 1032)
  • Output: Logits for 1032 classes (blank + 1030 chars + space)
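A per-prediction confidence score can be derived from these logits by softmaxing over the class axis and averaging the max probability across the kept (non-blank, non-repeat) timesteps. A minimal numpy sketch with dummy logits (the real output for one image has shape (48, 1032) after removing the batch axis):

```python
import numpy as np

def decode_with_confidence(logits, char_list):
    # logits: (seq_len, num_classes) for a single image.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = exp / exp.sum(axis=1, keepdims=True)
    indices = probs.argmax(axis=1)
    confs = probs.max(axis=1)
    text, scores = [], []
    prev = -1
    for i, idx in enumerate(indices):
        if idx != 0 and idx != prev:  # skip CTC blank (index 0) and repeats
            text.append(char_list[idx])
            scores.append(confs[i])
        prev = idx
    return "".join(text), float(np.mean(scores)) if scores else 0.0

# Dummy example: 4 timesteps, 4 classes (blank + 3 chars).
char_list = ["<blank>", "ક", "ખ", "ગ"]
logits = np.array([[0.1, 5.0, 0.0, 0.0],   # -> ક
                   [0.1, 5.0, 0.0, 0.0],   # repeat, collapsed
                   [5.0, 0.0, 0.0, 0.0],   # blank, skipped
                   [0.0, 0.0, 0.0, 5.0]])  # -> ગ
text, conf = decode_with_confidence(logits, char_list)
print(text)  # કગ
```

Thresholding on this confidence (e.g. discarding results below ~0.7) is a common way to filter unreliable recognitions in a full pipeline.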

Citation

If you use this model, please cite PaddleOCR:

@misc{paddleocr,
    title={PaddleOCR: Awesome multilingual OCR toolkits},
    author={PaddlePaddle Authors},
    howpublished = {\url{https://github.com/PaddlePaddle/PaddleOCR}},
    year={2020}
}

License

Apache License 2.0

Acknowledgments

This model was trained using the PaddleOCR framework. Special thanks to the PaddlePaddle team for their excellent OCR toolkit.
