# End-to-End OCR with CRNN, CTC, and FastAPI
This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection.
The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.
## Table of Contents
- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
- [Text Detection: CRAFT](#text-detection-craft)
- [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
- [Dataset](#dataset)
- [Running the Training Notebook](#running-the-training-notebook)
- [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
- [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
- [Running the Server](#running-the-server)
- [API Endpoints](#api-endpoints)
- [HTTP POST Request](#http-post-request)
- [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)
## Workflow Overview
The project follows a clear, step-by-step process from model creation to deployment:
1. **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4. **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.
## Features
- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: The trained model is exported to ONNX for fast, framework-independent inference.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.
## Project Structure
```
.
β”œβ”€β”€ .gitignore          # Files to be ignored by Git
β”œβ”€β”€ Dockerfile          # Docker configuration for the API
β”œβ”€β”€ main
β”‚   β”œβ”€β”€ app.py          # Main FastAPI application
β”‚   └── core            # Modularized application logic
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ config.py   # Configuration variables
β”‚       β”œβ”€β”€ models.py   # Model loading (CRAFT)
β”‚       β”œβ”€β”€ ocr.py      # Core OCR pipeline
β”‚       └── utils.py    # Utility functions
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ final.pth       # Final trained PyTorch model weights
β”‚   └── model.onnx      # Trained model in ONNX format
β”œβ”€β”€ notebook
β”‚   └── ocr.ipynb       # Jupyter notebook for training and export
└── requirements.txt    # Python dependencies
```
## Technical Details
### Text Detection: CRAFT
The FastAPI application first uses the **CRAFT (Character Region Awareness for Text Detection)** model to detect text regions in the input image. It identifies bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.
### Text Recognition: CRNN
For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** is used to recognize the text within that region.
- **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which eliminates the need for character-level alignment between the input image and the output text, making it well suited to OCR, where such alignments are rarely available.
## Getting Started
### Prerequisites
- Python 3.9 or higher
- An NVIDIA GPU with CUDA for training is highly recommended.
- Docker for containerized deployment.
### Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd crnn-ctc-ocr
```
2. **Set up a virtual environment (recommended):**
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
## Model Training and Export
The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.
### Dataset
The model is trained on **MJSynth**, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).
### Running the Training Notebook
1. **Launch Jupyter:**
```bash
jupyter notebook
```
2. Open `notebook/ocr.ipynb`.
3. You can run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run. Set it to `False` for full training.
4. The notebook will:
- Load and preprocess the dataset.
- Define the CRNN model, loss function, and optimizer.
- Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
- Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
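The Character Error Rate reported during validation is the edit (Levenshtein) distance between the prediction and the ground truth, normalized by the reference length. A minimal sketch of the metric (function names here are illustrative, not taken from the notebook):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming over a single row."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("kitten", "sitting"))  # 3 edits / 6 reference chars = 0.5
```

Word Error Rate is computed the same way, but over lists of words instead of characters.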
### Exporting to ONNX
After training, the last cells of the notebook handle the export to ONNX:
- They load the trained CRNN model.
- They export it to `export/model.onnx`. **This file is required by the FastAPI application.** The repository already includes a pre-trained `models/model.onnx`.
## Inference
### Notebook Inference
The notebook includes helper functions to test the model directly.
**1. Using the PyTorch model (`.pth`):**
```python
# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```
**2. Using the ONNX model:**
```python
# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```
## API Server (FastAPI)
The application provides a web server to perform OCR on uploaded images.
### Running the Server
To run the API server locally:
```bash
python main/app.py
```
The server will start on `http://localhost:8000`.
### API Endpoints
#### HTTP POST Request
- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:
```bash
curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
```
- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
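The base64-encoded image in the response can be decoded back to raw bytes with the standard library. The field names below (`paragraph`, `annotated_image`) are assumed for illustration; check the actual response keys returned by `main/app.py`:

```python
import base64
import json

# Example response shape (field names assumed, not taken from main/app.py).
response_text = json.dumps({
    "paragraph": "hello world",
    "annotated_image": base64.b64encode(b"\x89PNG...").decode("ascii"),
})

result = json.loads(response_text)
print(result["paragraph"])

# Recover the annotated image bytes and write them to disk.
image_bytes = base64.b64decode(result["annotated_image"])
with open("annotated.png", "wb") as f:
    f.write(image_bytes)
```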
#### WebSocket Connection
- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.
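A minimal client sketch using the third-party `websockets` package (an assumption; any WebSocket client library would work the same way):

```python
import asyncio
import json

async def ocr_over_websocket(image_path, url="ws://localhost:8000/ws/predict"):
    """Send raw image bytes over the WebSocket and await the JSON OCR result."""
    # Imported lazily so this sketch can be read/imported without the package.
    import websockets
    async with websockets.connect(url) as ws:
        with open(image_path, "rb") as f:
            await ws.send(f.read())       # server expects the image as bytes
        return json.loads(await ws.recv())  # server replies with a JSON message

# With the server running:
#   result = asyncio.run(ocr_over_websocket("image.jpg"))
```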
## Deployment with Docker
The project includes a `Dockerfile` to easily containerize and deploy the application.
1. **Prerequisite**: Ensure the ONNX model is present at the path the application expects. The provided `main/app.py` looks for `export/model.onnx`, so either place your exported model there or update the path in `main/app.py`. A pre-trained copy is also included at `models/model.onnx`.
2. **Build the Docker image:**
```bash
docker build -t ocr-api .
```
3. **Run the Docker container:**
```bash
docker run -p 8000:8000 ocr-api
```
The application will be running and accessible at `http://localhost:8000`.