# End-to-End OCR with CRNN, CTC, and FastAPI

This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection.

The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.
## Table of Contents

- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
  - [Text Detection: CRAFT](#text-detection-craft)
  - [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
  - [Dataset](#dataset)
  - [Running the Training Notebook](#running-the-training-notebook)
  - [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
  - [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
  - [Running the Server](#running-the-server)
  - [API Endpoints](#api-endpoints)
    - [HTTP POST Request](#http-post-request)
    - [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)
## Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

1. **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4. **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.

## Features

- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: The model is exported to ONNX for fast inference.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.
## Project Structure

```
.
├── .gitignore            # Files to be ignored by Git
├── Dockerfile            # Docker configuration for the API
├── main
│   ├── app.py            # Main FastAPI application
│   └── core              # Modularized application logic
│       ├── __init__.py
│       ├── config.py     # Configuration variables
│       ├── models.py     # Model loading (CRAFT)
│       ├── ocr.py        # Core OCR pipeline
│       └── utils.py      # Utility functions
├── models
│   ├── final.pth         # Final trained PyTorch model weights
│   └── model.onnx        # Trained model in ONNX format
├── notebook
│   └── ocr.ipynb         # Jupyter Notebook for training and export
└── requirements.txt      # Python dependencies
```
## Technical Details

### Text Detection: CRAFT

The FastAPI application first uses the **CRAFT (Character Region Awareness For Text detection)** model to detect text regions in the input image. It identifies bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.

### Text Recognition: CRNN

For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** recognizes the text within that region.

- **Convolutional Layers (CNN)**: Serve as a feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which removes the need for character-level alignment between the input image and the output text, making it well suited to OCR.
## Getting Started

### Prerequisites

- Python 3.9 or higher
- An NVIDIA GPU with CUDA is highly recommended for training.
- Docker for containerized deployment.

### Installation

1. **Clone the repository:**

   ```bash
   git clone <repository-url>
   cd crnn-ctc-ocr
   ```

2. **Set up a virtual environment (recommended):**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```
## Model Training and Export

The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.

### Dataset

The model is trained on **MJSynth**, a large-scale synthetic dataset for text recognition. The notebook automatically downloads it with the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).

### Running the Training Notebook

1. **Launch Jupyter:**

   ```bash
   jupyter notebook
   ```

2. Open `notebook/ocr.ipynb`.
3. Run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run; set it to `False` for full training.
4. The notebook will:
   - Load and preprocess the dataset.
   - Define the CRNN model, loss function, and optimizer.
   - Run the training loop, reporting progress and validation metrics (Character Error Rate, Word Error Rate).
   - Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
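The heart of such a training loop is the CTC loss on the CRNN's per-timestep log-probabilities. A minimal sketch of one training step in PyTorch, with dummy tensors standing in for the real model output and labels (the shapes, charset size, and blank index are assumptions, not the notebook's exact values):

```python
import torch
import torch.nn as nn

T, N, C = 32, 4, 37  # timesteps, batch size, charset size incl. CTC blank (assumed)
logits = torch.randn(T, N, C, requires_grad=True)  # stand-in for CRNN output
log_probs = logits.log_softmax(2)                  # CTCLoss expects log-probabilities

targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # dummy label indices
input_lengths = torch.full((N,), T, dtype=torch.long)      # all T timesteps are valid
target_lengths = torch.full((N,), 10, dtype=torch.long)    # each label is 10 chars

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # in real training, gradients flow back through the CRNN
```

`zero_infinity=True` guards against the infinite losses CTC can produce when a target is longer than the input sequence allows.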
### Exporting to ONNX

After training, the last cells of the notebook export the trained CRNN model to `export/model.onnx`. **This file is required by the FastAPI application.** The repository already includes a pre-trained `models/model.onnx`.
## Inference

### Notebook Inference

The notebook includes helper functions to test the model directly.

**1. Using the PyTorch model (`.pth`):**

```python
# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```

**2. Using the ONNX model:**

```python
# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```
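When calling the ONNX model outside the notebook, the crop must be preprocessed the same way as during training. A hedged sketch of a typical CRNN input pipeline (the 32x128 target size and the [-1, 1] normalization are assumptions, not the notebook's exact values):

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image, height: int = 32, width: int = 128) -> np.ndarray:
    """Grayscale, resize to a fixed crop size, and normalize to [-1, 1]."""
    img = img.convert("L").resize((width, height), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return x[None, None, :, :]                           # (1, 1, H, W) tensor layout

x = preprocess(Image.new("L", (200, 50), color=255))     # dummy white crop
print(x.shape)  # (1, 1, 32, 128)
```

The resulting array can be fed to an `onnxruntime` session as the model's image input.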
## API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

### Running the Server

To run the API server locally:

```bash
python main/app.py
```

The server will start on `http://localhost:8000`.

### API Endpoints

#### HTTP POST Request

- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:

  ```bash
  curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
  ```

- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
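The same endpoint can be called from Python with the `requests` package; a sketch (only the URL and field name come from this README, and the server must be running for a real call):

```python
import requests

def ocr_image(path: str, url: str = "http://localhost:8000/predict/image") -> dict:
    """POST an image file to the OCR endpoint and return the parsed JSON."""
    with open(path, "rb") as f:
        resp = requests.post(url, files={"file": f})  # multipart upload, field "file"
    resp.raise_for_status()
    return resp.json()  # paragraph, lines, words, base64-annotated image

# result = ocr_image("sample.jpg")  # requires the server to be running
```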
#### WebSocket Connection

- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server returns the OCR result as a JSON message. This is useful for streaming or interactive applications.
## Deployment with Docker

The project includes a `Dockerfile` to containerize and deploy the application.

1. **Prerequisite**: Ensure the `models/model.onnx` file exists. If you've trained your own model, move your exported `export/model.onnx` to `models/model.onnx`, or update the path in `main/app.py`.

   *(Note: The provided code in `main/app.py` looks for `export/model.onnx`, so ensure this path is correct for your setup or that the file is present at build time.)*

2. **Build the Docker image:**

   ```bash
   docker build -t ocr-api .
   ```

3. **Run the Docker container:**

   ```bash
   docker run -p 8000:8000 ocr-api
   ```

The application will be running and accessible at `http://localhost:8000`.