# End-to-End OCR with CRNN, CTC, and FastAPI
This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection.
The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.
## Table of Contents
- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
- [Text Detection: CRAFT](#text-detection-craft)
- [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
- [Dataset](#dataset)
- [Running the Training Notebook](#running-the-training-notebook)
- [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
- [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
- [Running the Server](#running-the-server)
- [API Endpoints](#api-endpoints)
- [HTTP POST Request](#http-post-request)
- [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)
## Workflow Overview
The project follows a clear, step-by-step process from model creation to deployment:
1. **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4. **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.
## Features
- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: The model is exported to ONNX for fast inference.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.
## Project Structure
```
.
├── .gitignore           # Files to be ignored by Git
├── Dockerfile           # Docker configuration for the API
├── main
│   ├── app.py           # Main FastAPI application
│   └── core             # Modularized application logic
│       ├── __init__.py
│       ├── config.py    # Configuration variables
│       ├── models.py    # Model loading (CRAFT)
│       ├── ocr.py       # Core OCR pipeline
│       └── utils.py     # Utility functions
├── models
│   ├── final.pth        # Final trained PyTorch model weights
│   └── model.onnx       # Trained model in ONNX format
├── notebook
│   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies
```
## Technical Details
### Text Detection: CRAFT
The FastAPI application first uses the **CRAFT (Character Region Awareness for Text Detection)** model to detect text regions in the input image. It identifies bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.
### Text Recognition: CRNN
For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** is used to recognize the text within that region.
- **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which removes the need for character-level alignment between the input image and the output text, making it well suited to OCR, where such alignments are rarely available.
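The three components above can be sketched in PyTorch. Layer sizes, the 37-class charset, and the 32-pixel input height are illustrative assumptions, not the notebook's actual hyperparameters:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor -> bidirectional LSTM -> per-frame class logits."""
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        self.rnn = nn.LSTM(128 * (img_h // 4), 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                      # x: (B, 1, H, W)
        f = self.cnn(x)                        # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one frame per width step
        out, _ = self.rnn(f)
        return self.fc(out)                    # (B, T, num_classes)

model = CRNN(num_classes=37)                   # e.g. 36 characters + 1 blank
images = torch.randn(2, 1, 32, 128)            # batch of two word crops
logits = model(images)                         # (2, 32, 37): 32 time steps
log_probs = logits.permute(1, 0, 2).log_softmax(2)  # (T, B, C) for CTCLoss
targets = torch.randint(1, 37, (2, 5))         # two 5-character labels (blank=0)
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((2,), 32), target_lengths=torch.full((2,), 5),
)
```

Note that CTC loss only needs the target *sequence*, never a per-pixel alignment; the blank class lets the network emit "nothing" between characters.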
## Getting Started
### Prerequisites
- Python 3.9 or higher
- An NVIDIA GPU with CUDA (strongly recommended for training)
- Docker (for containerized deployment)
### Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd crnn-ctc-ocr
```
2. **Set up a virtual environment (recommended):**
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
## Model Training and Export
The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.
### Dataset
The model is trained on the **MJSynth** dataset, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).
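Each MJSynth sample pairs a word image with its transcription, and crops are typically normalized to a fixed input shape before training. A hedged sketch of such a transform (the 32 x 128 target shape is an assumption, not necessarily the notebook's choice):

```python
import numpy as np
from PIL import Image

def preprocess(img, target_h=32, target_w=128):
    """Grayscale, resize to a fixed height, right-pad to a fixed width, scale to [0, 1]."""
    img = img.convert("L")
    w = max(1, round(img.width * target_h / img.height))  # keep aspect ratio
    img = img.resize((min(w, target_w), target_h))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    padded = np.zeros((target_h, target_w), dtype=np.float32)
    padded[:, : arr.shape[1]] = arr
    return padded[None, None]  # (1, 1, H, W), ready for a CNN

x = preprocess(Image.new("L", (80, 20), color=255))  # white dummy crop
```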
### Running the Training Notebook
1. **Launch Jupyter:**
```bash
jupyter notebook
```
2. Open `notebook/ocr.ipynb`.
3. You can run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run. Set it to `False` for full training.
4. The notebook will:
- Load and preprocess the dataset.
- Define the CRNN model, loss function, and optimizer.
- Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
- Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
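Computing Character/Word Error Rate requires turning per-frame predictions into strings. A dependency-free sketch of standard CTC best-path (greedy) decoding, which collapses repeated indices and then drops blanks:

```python
def ctc_greedy_decode(indices, blank=0):
    """Best-path CTC decoding: collapse repeats, then remove blanks."""
    out, prev = [], None
    for i in indices:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# index 0 is the blank; blanks separate genuinely repeated characters
decoded = ctc_greedy_decode([0, 3, 3, 0, 0, 5, 5, 5, 0, 7])  # -> [3, 5, 7]
```

The resulting index list is then mapped through the model's character set to produce the recognized string.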
### Exporting to ONNX
After training, the final cells of the notebook export the trained CRNN model to `export/model.onnx`. **This file is required by the FastAPI application.** The repository already includes a pre-trained model at `models/model.onnx`.
## Inference
### Notebook Inference
The notebook includes helper functions to test the model directly.
**1. Using the PyTorch model (`.pth`):**
```python
# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```
**2. Using the ONNX model:**
```python
# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```
## API Server (FastAPI)
The application provides a web server to perform OCR on uploaded images.
### Running the Server
To run the API server locally:
```bash
python main/app.py
```
The server will start on `http://localhost:8000`.
### API Endpoints
#### HTTP POST Request
- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:
```bash
curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
```
- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
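The same request can be made from Python; a minimal client sketch using `requests` (the endpoint and response keys follow the description above, but verify against the running server):

```python
import requests

def ocr_image(path, url="http://localhost:8000/predict/image"):
    """POST an image file to the /predict/image endpoint and return the JSON result."""
    with open(path, "rb") as f:
        response = requests.post(url, files={"file": f})
    response.raise_for_status()
    return response.json()  # expected keys: paragraph, lines, words, annotated image

# result = ocr_image("sample.jpg")  # requires the server to be running locally
```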
#### WebSocket Connection
- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.
## Deployment with Docker
The project includes a `Dockerfile` to easily containerize and deploy the application.
1. **Prerequisite**: Ensure the ONNX model is present at the path the application expects. The code in `main/app.py` loads `export/model.onnx`; if your model lives elsewhere (for example, the repository's pre-trained `models/model.onnx`), either move it to that path or update the path in `main/app.py` before building the image.
2. **Build the Docker image:**
```bash
docker build -t ocr-api .
```
3. **Run the Docker container:**
```bash
docker run -p 8000:8000 ocr-api
```
The application will be running and accessible at `http://localhost:8000`.