End-to-End OCR with CRNN, CTC, and FastAPI

This project provides a complete pipeline for Optical Character Recognition (OCR). It features a Convolutional Recurrent Neural Network (CRNN) trained with Connectionist Temporal Classification (CTC) loss for text recognition. The trained model is served via a FastAPI application, which uses the CRAFT model for initial text detection.

The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.

Table of Contents

  • Workflow Overview
  • Features
  • Project Structure
  • Technical Details
  • Getting Started
  • Model Training and Export
  • Inference
  • API Server (FastAPI)
  • Deployment with Docker

Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

  1. Train the Model: The notebook/ocr.ipynb notebook is used to train the CRNN text recognition model on the MJSynth dataset.
  2. Export the Model: The trained PyTorch model is exported to the ONNX format (model.onnx) for efficient inference.
  3. Serve the Model: The FastAPI application (main/app.py) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
  4. Deploy: The entire application is containerized using Docker for easy and reproducible deployment.

Features

  • Two-Stage OCR: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
  • Deep Learning Model: An optimized CRNN architecture implemented in PyTorch.
  • Efficient Inference: Model is exported to ONNX for fast performance.
  • Web API: A FastAPI server with both REST and WebSocket endpoints.
  • Reproducible Environment: Comes with a Dockerfile for easy setup and deployment.
  • Complete Workflow: Includes all steps from training to deployment.

Project Structure

.
β”œβ”€β”€ .gitignore           # Files to be ignored by Git
β”œβ”€β”€ Dockerfile           # Docker configuration for the API
β”œβ”€β”€ main
β”‚   β”œβ”€β”€ app.py           # Main FastAPI application
β”‚   └── core             # Modularized application logic
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ config.py    # Configuration variables
β”‚       β”œβ”€β”€ models.py    # Model loading (CRAFT)
β”‚       β”œβ”€β”€ ocr.py       # Core OCR pipeline
β”‚       └── utils.py     # Utility functions
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ final.pth        # Final trained PyTorch model weights
β”‚   └── model.onnx       # Trained model in ONNX format
β”œβ”€β”€ notebook
β”‚   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies

Technical Details

Text Detection: CRAFT

The FastAPI application first uses the CRAFT (Character Region Awareness For Text detection) model to detect text regions in the input image, identifying bounding boxes around words or lines of text. This project uses the hezarai/CRAFT implementation.

Text Recognition: CRNN

For each bounding box detected by CRAFT, a Convolutional Recurrent Neural Network (CRNN) is used to recognize the text within that region.

  • Convolutional Layers (CNN): Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
  • Recurrent Layers (RNN): A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
  • CTC Loss: The model is trained with Connectionist Temporal Classification (CTC) loss, which eliminates the need for character-level alignment between the input image and the output text, making it well suited to OCR, where that alignment is unknown in advance.
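
The three components above can be sketched as a minimal PyTorch module. This is a simplified stand-in, not the notebook's exact architecture: the layer sizes and the 37-class charset (digits, lowercase letters, plus a CTC blank) are assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch: CNN feature extractor -> BiLSTM -> per-timestep classifier."""

    def __init__(self, num_classes, img_h=32, in_ch=1, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),     # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                        # H/8, keep W
        )
        feat_h = img_h // 8
        self.rnn = nn.LSTM(256 * feat_h, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                               # x: (B, C, H, W)
        f = self.cnn(x)                                  # (B, 256, H/8, W/4)
        b, c, h, w = f.shape
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)   # (T, B, features), T = W/4
        out, _ = self.rnn(f)
        return self.fc(out)                              # (T, B, num_classes)
```

The (T, B, num_classes) output layout is the shape nn.CTCLoss expects (after a log_softmax over the class dimension), which is why the width axis is moved to the front.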

Getting Started

Prerequisites

  • Python 3.9 or higher
  • An NVIDIA GPU with CUDA (highly recommended for training)
  • Docker (for containerized deployment)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd crnn-ctc-ocr
    
  2. Set up a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Model Training and Export

The notebook/ocr.ipynb notebook contains the complete code for training and exporting the model.

Dataset

The model is trained on MJSynth, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face datasets library (priyank-m/MJSynth_text_recognition).

Running the Training Notebook

  1. Launch Jupyter:
    jupyter notebook
    
  2. Open notebook/ocr.ipynb.
  3. You can run all cells to execute the full pipeline. The DEMO flag is set to True by default to train on a smaller subset for a quick run. Set it to False for full training.
  4. The notebook will:
    • Load and preprocess the dataset.
    • Define the CRNN model, loss function, and optimizer.
    • Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
    • Save the best model to checkpoints/best.pth and the final model to checkpoints/final.pth.
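
For reference, the step that turns per-timestep model outputs into text can be sketched as greedy (best-path) CTC decoding: take the argmax at each timestep, merge repeats, then drop blanks. The charset and blank index below are assumptions; the notebook's actual alphabet may differ.

```python
import numpy as np

BLANK = 0  # index of the CTC blank token (assumption: blank is class 0)
CHARSET = "-0123456789abcdefghijklmnopqrstuvwxyz"  # '-' stands in for the blank

def ctc_greedy_decode(logits):
    """Greedy CTC decode of a (T, num_classes) array of per-timestep scores."""
    best = logits.argmax(axis=-1)
    out, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:   # skip repeats and blanks
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)
```

A blank between two identical characters keeps them separate, which is how CTC can emit doubled letters.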

Exporting to ONNX

After training, the last cells of the notebook handle the export to ONNX.

  • The trained CRNN model is exported to export/model.onnx; this file is required by the FastAPI application.
  • The repository already includes a pre-trained models/model.onnx.

Inference

Notebook Inference

The notebook includes helper functions to test the model directly.

1. Using the PyTorch model (.pth):

# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')

2. Using the ONNX model:

# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')

API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

Running the Server

To run the API server locally:

python main/app.py

The server will start on http://localhost:8000.

API Endpoints

HTTP POST Request

  • Endpoint: POST /predict/image
  • Description: Upload an image and receive the OCR results in JSON format.
  • Example using curl:
    curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
    
  • Response: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
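
From Python, the same request can be made with any HTTP client, for example requests. The JSON field names below ("paragraph", "image") are assumptions based on the response description above; check the actual response for the exact keys.

```python
import base64
import requests  # any HTTP client works

API_URL = "http://localhost:8000/predict/image"

def ocr_image(path, url=API_URL):
    """POST an image file to the API and return the parsed JSON OCR result."""
    with open(path, "rb") as f:
        resp = requests.post(url, files={"file": f}, timeout=60)
    resp.raise_for_status()
    return resp.json()

def save_annotated(result, out_path="annotated.jpg"):
    """Decode the base64 image with bounding boxes from the response ('image' key assumed)."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result["image"]))
```

Typical usage: result = ocr_image("image.jpg"), then read result["paragraph"] and call save_annotated(result) to inspect the drawn boxes.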

WebSocket Connection

  • Endpoint: ws://localhost:8000/ws/predict
  • Description: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.

Deployment with Docker

The project includes a Dockerfile to easily containerize and deploy the application.

  1. Prerequisite: Ensure the ONNX model is available where the application expects it. The code in main/app.py loads export/model.onnx, so if you trained your own model, place the exported file at that path (or update the path in main/app.py) before building the image; otherwise make sure the bundled pre-trained model is available at that location at build time.

  2. Build the Docker image:

    docker build -t ocr-api .
    
  3. Run the Docker container:

    docker run -p 8000:8000 ocr-api
    

    The application will be running and accessible at http://localhost:8000.