# End-to-End OCR with CRNN, CTC, and FastAPI

This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection. The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.

## Table of Contents

- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
  - [Text Detection: CRAFT](#text-detection-craft)
  - [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
  - [Dataset](#dataset)
  - [Running the Training Notebook](#running-the-training-notebook)
  - [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
  - [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
  - [Running the Server](#running-the-server)
  - [API Endpoints](#api-endpoints)
    - [HTTP POST Request](#http-post-request)
    - [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)

## Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

1. **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4.
   **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.

## Features

- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: Model is exported to ONNX for fast performance.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.

## Project Structure

```
.
├── .gitignore           # Files to be ignored by Git
├── Dockerfile           # Docker configuration for the API
├── main
│   ├── app.py           # Main FastAPI application
│   └── core             # Modularized application logic
│       ├── __init__.py
│       ├── config.py    # Configuration variables
│       ├── models.py    # Model loading (CRAFT)
│       ├── ocr.py       # Core OCR pipeline
│       └── utils.py     # Utility functions
├── models
│   ├── final.pth        # Final trained PyTorch model weights
│   └── model.onnx       # Trained model in ONNX format
├── notebook
│   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies
```

## Technical Details

### Text Detection: CRAFT

The FastAPI application first uses the **CRAFT (Character Region Awareness for Text detection)** model to detect text regions in the input image. It identifies bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.

### Text Recognition: CRNN

For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** is used to recognize the text within that region.

- **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which eliminates the need for character-level alignment between the input image and the output text, making it well suited to OCR.

## Getting Started

### Prerequisites

- Python 3.9 or higher
- An NVIDIA GPU with CUDA is highly recommended for training.
- Docker for containerized deployment.

### Installation

1. **Clone the repository:**

   ```bash
   git clone <repository-url>
   cd crnn-ctc-ocr
   ```

2. **Set up a virtual environment (recommended):**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

## Model Training and Export

The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.

### Dataset

The model is trained on the **MJSynth** dataset, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).

### Running the Training Notebook

1. **Launch Jupyter:**

   ```bash
   jupyter notebook
   ```

2. Open `notebook/ocr.ipynb`.
3. Run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run; set it to `False` for full training.
4. The notebook will:
   - Load and preprocess the dataset.
   - Define the CRNN model, loss function, and optimizer.
   - Run the training loop, reporting progress and validation metrics (Character Error Rate, Word Error Rate).
   - Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.

### Exporting to ONNX

After training, the final cells of the notebook export the trained CRNN model to `export/model.onnx`. **This file is crucial for the FastAPI application.** The repository already includes a pre-trained `models/model.onnx`.
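Whether you run the PyTorch checkpoint or the ONNX export, the recognizer's raw output is a per-timestep score distribution over the character set plus a CTC blank; decoding collapses repeated predictions and drops blanks. Below is a minimal greedy decoder sketched in pure Python with an illustrative charset and scores (this is not the notebook's actual decoding code):

```python
def ctc_greedy_decode(timestep_scores, charset, blank=0):
    """Collapse repeats and drop CTC blanks from per-timestep argmax picks."""
    best = [max(range(len(scores)), key=scores.__getitem__)  # argmax per step
            for scores in timestep_scores]
    decoded, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:      # skip blanks and repeated labels
            decoded.append(charset[idx - 1])  # index 0 is reserved for blank
        prev = idx
    return "".join(decoded)

# Illustrative scores over [blank, 'a', 'b', 'c'] for five timesteps:
# blank, 'a', 'a', blank, 'b'  ->  decodes to "ab"
scores = [
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.80, 0.05, 0.05],
    [0.1, 0.80, 0.05, 0.05],
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.10, 0.70, 0.10],
]
print(ctc_greedy_decode(scores, "abc"))  # -> ab
```

Beam-search decoding can improve accuracy over this greedy scheme, at the cost of inference speed.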
## Inference

### Notebook Inference

The notebook includes helper functions to test the model directly.

**1. Using the PyTorch model (`.pth`):**

```python
# Predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```

**2. Using the ONNX model:**

```python
# Predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```

## API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

### Running the Server

To run the API server locally:

```bash
python main/app.py
```

The server will start on `http://localhost:8000`.

### API Endpoints

#### HTTP POST Request

- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:

  ```bash
  curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
  ```

- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.

#### WebSocket Connection

- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.

## Deployment with Docker

The project includes a `Dockerfile` to easily containerize and deploy the application.

1. **Prerequisite**: Ensure the `models/model.onnx` file exists. If you have trained your own model, move your exported `export/model.onnx` to `models/model.onnx`, or update the path in `main/app.py`. *(Note: the provided code in `main/app.py` looks for `export/model.onnx`, so make sure this path is correct for your setup or that the file is present at build time.)*
2. **Build the Docker image:**

   ```bash
   docker build -t ocr-api .
   ```

3.
   **Run the Docker container:**

   ```bash
   docker run -p 8000:8000 ocr-api
   ```

The application will be running and accessible at `http://localhost:8000`.
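Once the server is up, the JSON body returned by `POST /predict/image` carries the recognized text plus a base64-encoded annotated image. The sketch below shows how such a payload can be consumed in Python; the field names here are assumptions for illustration, and the real keys are defined in `main/app.py`:

```python
import base64

# Hypothetical response payload; the actual key names come from main/app.py.
response = {
    "paragraph": "hello world",
    "words": ["hello", "world"],
    # In a real response this is the annotated image returned by the server.
    "image": base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16).decode("ascii"),
}

print(response["paragraph"])

# Decode the annotated image back to raw bytes, ready to save as a .png file.
img_bytes = base64.b64decode(response["image"])
print(img_bytes[:8])  # the PNG file signature
```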