# End-to-End OCR with CRNN, CTC, and FastAPI

This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection.

The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.

## Table of Contents

- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
  - [Text Detection: CRAFT](#text-detection-craft)
  - [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
  - [Dataset](#dataset)
  - [Running the Training Notebook](#running-the-training-notebook)
  - [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
  - [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
  - [Running the Server](#running-the-server)
  - [API Endpoints](#api-endpoints)
    - [HTTP POST Request](#http-post-request)
    - [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)

## Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

1.  **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2.  **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3.  **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4.  **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.

## Features

-   **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
-   **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
-   **Efficient Inference**: Model is exported to ONNX for fast performance.
-   **Web API**: A FastAPI server with both REST and WebSocket endpoints.
-   **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
-   **Complete Workflow**: Includes all steps from training to deployment.

## Project Structure

```
.
β”œβ”€β”€ .gitignore           # Files to be ignored by Git
β”œβ”€β”€ Dockerfile           # Docker configuration for the API
β”œβ”€β”€ main
β”‚   β”œβ”€β”€ app.py           # Main FastAPI application
β”‚   └── core             # Modularized application logic
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ config.py    # Configuration variables
β”‚       β”œβ”€β”€ models.py    # Model loading (CRAFT)
β”‚       β”œβ”€β”€ ocr.py       # Core OCR pipeline
β”‚       └── utils.py     # Utility functions
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ final.pth        # Final trained PyTorch model weights
β”‚   └── model.onnx       # Trained model in ONNX format
β”œβ”€β”€ notebook
β”‚   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies
```

## Technical Details

### Text Detection: CRAFT

The FastAPI application first uses the **CRAFT (Character Region Awareness for Text detection)** model to locate text regions in the input image, producing bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.

### Text Recognition: CRNN

For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** is used to recognize the text within that region.

-   **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
-   **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
-   **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which removes the need for character-level alignment between the input image and the output text, making it well suited to OCR.
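The architecture described above can be sketched as a small PyTorch module. This is an illustration, not the notebook's exact model: the layer widths, LSTM size, and 32-pixel input height are assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN: CNN feature extractor -> bidirectional LSTM -> per-timestep logits."""

    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # Two conv+pool stages halve the spatial dims twice; the remaining
        # width axis becomes the sequence dimension fed to the RNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 128 * (img_height // 4)
        self.rnn = nn.LSTM(feat_dim, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol.
        self.fc = nn.Linear(512, num_classes + 1)

    def forward(self, x):                # x: (batch, 1, H, W)
        f = self.cnn(x)                  # (batch, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (batch, seq, feat)
        out, _ = self.rnn(f)
        return self.fc(out)              # (batch, seq, num_classes + 1)
```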

## Getting Started

### Prerequisites

-   Python 3.9 or higher
-   An NVIDIA GPU with CUDA (highly recommended for training)
-   Docker (for containerized deployment)

### Installation

1.  **Clone the repository:**
    ```bash
    git clone <repository-url>
    cd crnn-ctc-ocr
    ```

2.  **Set up a virtual environment (recommended):**
    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    ```

3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

## Model Training and Export

The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.

### Dataset

The model is trained on **MJSynth**, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).
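MJSynth crops vary in width, so a typical preprocessing step resizes each image to a fixed height while preserving aspect ratio, then normalizes pixel values. The sketch below is one common recipe, not necessarily the notebook's exact one; the 32-pixel target height is an assumption.

```python
import numpy as np
from PIL import Image

TARGET_H = 32  # assumed model input height

def preprocess(img: Image.Image, target_h: int = TARGET_H) -> np.ndarray:
    """Grayscale, resize to a fixed height keeping aspect ratio, scale to [0, 1]."""
    img = img.convert("L")
    w, h = img.size
    new_w = max(1, round(w * target_h / h))
    img = img.resize((new_w, target_h), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr[None, :, :]  # add channel dim: (1, H, W)
```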

### Running the Training Notebook

1.  **Launch Jupyter:**
    ```bash
    jupyter notebook
    ```
2.  Open `notebook/ocr.ipynb`.
3.  You can run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run. Set it to `False` for full training.
4.  The notebook will:
    -   Load and preprocess the dataset.
    -   Define the CRNN model, loss function, and optimizer.
    -   Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
    -   Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
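At the heart of the training loop is PyTorch's `nn.CTCLoss`, which expects log-probabilities shaped `(seq, batch, classes)` plus per-sample input and target lengths, with targets passed as one concatenated 1-D tensor of label indices. A stripped-down version of a single training step might look like this (the `blank=0` convention and tensor layout are assumptions about the notebook's setup):

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)  # index 0 reserved for the blank

def train_step(model, optimizer, images, targets, target_lengths):
    """One CTC training step. `targets` is a 1-D tensor of concatenated label indices."""
    logits = model(images)                               # (batch, seq, classes)
    log_probs = logits.log_softmax(2).permute(1, 0, 2)   # CTCLoss wants (seq, batch, classes)
    seq_len = log_probs.size(0)
    input_lengths = torch.full((images.size(0),), seq_len, dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```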

### Exporting to ONNX

After training, the last cells of the notebook handle the export to ONNX.

-   It loads the trained CRNN model.
-   It exports the model to `export/model.onnx`. **This file is what the FastAPI application serves.** The repository already includes a pre-trained copy at `models/model.onnx`.

## Inference

### Notebook Inference

The notebook includes helper functions to test the model directly.

**1. Using the PyTorch model (`.pth`):**
```python
# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```

**2. Using the ONNX model:**
```python
# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```
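Both helpers ultimately turn per-timestep model outputs into text via CTC decoding. The simplest variant, greedy decoding, takes the argmax at each timestep, collapses consecutive repeats, and drops blanks. A self-contained sketch (the blank index 0 and a blank-free charset are assumptions):

```python
BLANK = 0  # assumed index reserved for the CTC blank token

def ctc_greedy_decode(indices, charset):
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    text = []
    prev = None
    for idx in indices:
        if idx != prev and idx != BLANK:
            text.append(charset[idx - 1])  # charset holds real characters only
        prev = idx
    return "".join(text)
```

For example, the per-timestep argmaxes `[8, 8, 0, 5, 12, 12, 0, 12, 15]` over a lowercase charset collapse to `"hello"`: the repeated `8` and `12` merge, the blanks separate the two `l`s so they survive as distinct characters.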

## API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

### Running the Server

To run the API server locally:
```bash
python main/app.py
```
The server will start on `http://localhost:8000`.

### API Endpoints

#### HTTP POST Request

-   **Endpoint**: `POST /predict/image`
-   **Description**: Upload an image and receive the OCR results in JSON format.
-   **Example using `curl`**:
    ```bash
    curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
    ```
-   **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.

#### WebSocket Connection

-   **Endpoint**: `ws://localhost:8000/ws/predict`
-   **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.
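A minimal client for this endpoint might look like the following. The third-party `websockets` package is an assumed choice; per the endpoint description, the client sends raw image bytes and receives a JSON message back.

```python
import asyncio
import json

async def ocr_image(path, url="ws://localhost:8000/ws/predict"):
    """Send one image over the WebSocket and return the parsed OCR result."""
    import websockets  # third-party: pip install websockets
    async with websockets.connect(url) as ws:
        with open(path, "rb") as f:
            await ws.send(f.read())       # image goes out as raw bytes
        return json.loads(await ws.recv())  # server replies with JSON

# With the server running: result = asyncio.run(ocr_image("image.jpg"))
```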

## Deployment with Docker

The project includes a `Dockerfile` to easily containerize and deploy the application.

1.  **Prerequisite**: Make sure the ONNX model exists at the path the application expects. The code in `main/app.py` looks for `export/model.onnx`, so either place your exported model there or update the path in `main/app.py` (the repository also ships a pre-trained model at `models/model.onnx`). The file must be present at image build time.

2.  **Build the Docker image:**
    ```bash
    docker build -t ocr-api .
    ```

3.  **Run the Docker container:**
    ```bash
    docker run -p 8000:8000 ocr-api
    ```
    The application will be running and accessible at `http://localhost:8000`.