End-to-End OCR with CRNN, CTC, and FastAPI

This project provides a complete pipeline for Optical Character Recognition (OCR). It features a Convolutional Recurrent Neural Network (CRNN) trained with Connectionist Temporal Classification (CTC) loss for text recognition. The trained model is served via a FastAPI application, which uses the CRAFT model for initial text detection.

The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.

Table of Contents

  • Workflow Overview
  • Features
  • Project Structure
  • Technical Details
  • Getting Started
  • Model Training and Export
  • Inference
  • API Server (FastAPI)
  • Deployment with Docker

Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

  1. Train the Model: The notebook/ocr.ipynb notebook is used to train the CRNN text recognition model on the MJSynth dataset.
  2. Export the Model: The trained PyTorch model is exported to the ONNX format (model.onnx) for efficient inference.
  3. Serve the Model: The FastAPI application (main/app.py) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
  4. Deploy: The entire application is containerized using Docker for easy and reproducible deployment.

Features

  • Two-Stage OCR: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
  • Deep Learning Model: An optimized CRNN architecture implemented in PyTorch.
  • Efficient Inference: Model is exported to ONNX for fast performance.
  • Web API: A FastAPI server with both REST and WebSocket endpoints.
  • Reproducible Environment: Comes with a Dockerfile for easy setup and deployment.
  • Complete Workflow: Includes all steps from training to deployment.

Project Structure

.
β”œβ”€β”€ .gitignore           # Files to be ignored by Git
β”œβ”€β”€ Dockerfile           # Docker configuration for the API
β”œβ”€β”€ main
β”‚   β”œβ”€β”€ app.py           # Main FastAPI application
β”‚   └── core             # Modularized application logic
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ config.py    # Configuration variables
β”‚       β”œβ”€β”€ models.py    # Model loading (CRAFT)
β”‚       β”œβ”€β”€ ocr.py       # Core OCR pipeline
β”‚       └── utils.py     # Utility functions
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ final.pth        # Final trained PyTorch model weights
β”‚   └── model.onnx       # Trained model in ONNX format
β”œβ”€β”€ notebook
β”‚   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies

Technical Details

Text Detection: CRAFT

The FastAPI application first uses the CRAFT (Character Region Awareness For Text detection) model to detect text regions in the input image, identifying bounding boxes around words or lines of text. This project uses the hezarai/CRAFT implementation.

Text Recognition: CRNN

For each bounding box detected by CRAFT, a Convolutional Recurrent Neural Network (CRNN) is used to recognize the text within that region.

  • Convolutional Layers (CNN): Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
  • Recurrent Layers (RNN): A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
  • CTC Loss: The model is trained with Connectionist Temporal Classification (CTC) loss, which eliminates the need for character-level alignment between the input image and the output text, making it well suited to OCR, where that alignment is unknown in advance.
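
The three components above can be sketched as a minimal PyTorch module. This is a simplified stand-in, not the notebook's exact architecture: the layer sizes and the 37-class charset (digits, lowercase letters, plus a CTC blank) are assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch: CNN feature extractor -> BiLSTM -> per-timestep classifier."""

    def __init__(self, num_classes, img_h=32, in_ch=1, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),     # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                        # H/8, keep W
        )
        feat_h = img_h // 8
        self.rnn = nn.LSTM(256 * feat_h, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                               # x: (B, C, H, W)
        f = self.cnn(x)                                  # (B, 256, H/8, W/4)
        b, c, h, w = f.shape
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)   # (T, B, features), T = W/4
        out, _ = self.rnn(f)
        return self.fc(out)                              # (T, B, num_classes)
```

The (T, B, num_classes) output layout is the shape nn.CTCLoss expects (after a log_softmax over the class dimension), which is why the width axis is moved to the front.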

Getting Started

Prerequisites

  • Python 3.9 or higher
  • An NVIDIA GPU with CUDA (highly recommended for training)
  • Docker (for containerized deployment)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd crnn-ctc-ocr
    
  2. Set up a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Model Training and Export

The notebook/ocr.ipynb notebook contains the complete code for training and exporting the model.

Dataset

The model is trained on MJSynth, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face datasets library (priyank-m/MJSynth_text_recognition).

Running the Training Notebook

  1. Launch Jupyter:
    jupyter notebook
    
  2. Open notebook/ocr.ipynb.
  3. You can run all cells to execute the full pipeline. The DEMO flag is set to True by default to train on a smaller subset for a quick run. Set it to False for full training.
  4. The notebook will:
    • Load and preprocess the dataset.
    • Define the CRNN model, loss function, and optimizer.
    • Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
    • Save the best model to checkpoints/best.pth and the final model to checkpoints/final.pth.
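
For reference, the step that turns per-timestep model outputs into text can be sketched as greedy (best-path) CTC decoding: take the argmax at each timestep, merge repeats, then drop blanks. The charset and blank index below are assumptions; the notebook's actual alphabet may differ.

```python
import numpy as np

BLANK = 0  # index of the CTC blank token (assumption: blank is class 0)
CHARSET = "-0123456789abcdefghijklmnopqrstuvwxyz"  # '-' stands in for the blank

def ctc_greedy_decode(logits):
    """Greedy CTC decode of a (T, num_classes) array of per-timestep scores."""
    best = logits.argmax(axis=-1)
    out, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:   # skip repeats and blanks
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)
```

A blank between two identical characters keeps them separate, which is how CTC can emit doubled letters.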

Exporting to ONNX

After training, the last cells of the notebook handle the export to ONNX.

  • The trained CRNN model is exported to export/model.onnx; this file is required by the FastAPI application.
  • The repository already includes a pre-trained models/model.onnx.

Inference

Notebook Inference

The notebook includes helper functions to test the model directly.

1. Using the PyTorch model (.pth):

# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')

2. Using the ONNX model:

# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')

API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

Running the Server

To run the API server locally:

python main/app.py

The server will start on http://localhost:8000.

API Endpoints

HTTP POST Request

  • Endpoint: POST /predict/image
  • Description: Upload an image and receive the OCR results in JSON format.
  • Example using curl:
    curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
    
  • Response: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
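
From Python, the same request can be made with any HTTP client, for example requests. The JSON field names below ("paragraph", "image") are assumptions based on the response description above; check the actual response for the exact keys.

```python
import base64
import requests  # any HTTP client works

API_URL = "http://localhost:8000/predict/image"

def ocr_image(path, url=API_URL):
    """POST an image file to the API and return the parsed JSON OCR result."""
    with open(path, "rb") as f:
        resp = requests.post(url, files={"file": f}, timeout=60)
    resp.raise_for_status()
    return resp.json()

def save_annotated(result, out_path="annotated.jpg"):
    """Decode the base64 image with bounding boxes from the response ('image' key assumed)."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result["image"]))
```

Typical usage: result = ocr_image("image.jpg"), then read result["paragraph"] and call save_annotated(result) to inspect the drawn boxes.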

WebSocket Connection

  • Endpoint: ws://localhost:8000/ws/predict
  • Description: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.

Deployment with Docker

The project includes a Dockerfile to easily containerize and deploy the application.

  1. Prerequisite: Ensure the ONNX model is available where the application expects it. The code in main/app.py loads export/model.onnx, so if you trained your own model, place the exported file at that path (or update the path in main/app.py) before building the image; otherwise make sure the bundled pre-trained model is available at that location at build time.

  2. Build the Docker image:

    docker build -t ocr-api .
    
  3. Run the Docker container:

    docker run -p 8000:8000 ocr-api
    

    The application will be running and accessible at http://localhost:8000.