# End-to-End OCR with CRNN, CTC, and FastAPI
This project provides a complete pipeline for Optical Character Recognition (OCR). It features a Convolutional Recurrent Neural Network (CRNN) trained with Connectionist Temporal Classification (CTC) loss for text recognition. The trained model is served via a FastAPI application, which uses the CRAFT model for initial text detection.
The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.
## Table of Contents

- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
- [Getting Started](#getting-started)
- [Model Training and Export](#model-training-and-export)
- [Inference](#inference)
- [API Server (FastAPI)](#api-server-fastapi)
- [Deployment with Docker](#deployment-with-docker)
## Workflow Overview

The project follows a clear, step-by-step process from model creation to deployment:

1. **Train the Model**: The `notebook/ocr.ipynb` notebook trains the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4. **Deploy**: The entire application is containerized with Docker for easy, reproducible deployment.
## Features

- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: The model is exported to ONNX for fast performance.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.
## Project Structure

```
.
├── .gitignore           # Files to be ignored by Git
├── Dockerfile           # Docker configuration for the API
├── main
│   ├── app.py           # Main FastAPI application
│   └── core             # Modularized application logic
│       ├── __init__.py
│       ├── config.py    # Configuration variables
│       ├── models.py    # Model loading (CRAFT)
│       ├── ocr.py       # Core OCR pipeline
│       └── utils.py     # Utility functions
├── models
│   ├── final.pth        # Final trained PyTorch model weights
│   └── model.onnx       # Trained model in ONNX format
├── notebook
│   └── ocr.ipynb        # Jupyter Notebook for training and export
└── requirements.txt     # Python dependencies
```
## Technical Details

### Text Detection: CRAFT

The FastAPI application first uses the CRAFT (Character Region Awareness for Text detection) model to detect text regions in the input image, producing bounding boxes around words or lines of text. This project uses the hezarai/CRAFT implementation.
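As a sketch of the hand-off between the two stages, the snippet below crops detected regions out of the page image with plain numpy slicing. The `(x_min, y_min, x_max, y_max)` box format is an assumption for illustration; the actual CRAFT output handled in `main/core/ocr.py` may differ.

```python
import numpy as np

def crop_boxes(image, boxes):
    """Crop axis-aligned text regions (x_min, y_min, x_max, y_max) out of an H x W image."""
    crops = []
    h, w = image.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp coordinates so a slightly out-of-bounds detection does not raise.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        crops.append(image[y1:y2, x1:x2])
    return crops

# Toy 100x200 grayscale "page" with two hypothetical word boxes.
page = np.zeros((100, 200), dtype=np.uint8)
patches = crop_boxes(page, [(10, 20, 60, 40), (80, 20, 190, 45)])
print([p.shape for p in patches])  # [(20, 50), (25, 110)]
```

Each patch is then resized and fed to the CRNN recognizer independently.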
### Text Recognition: CRNN

For each bounding box detected by CRAFT, a Convolutional Recurrent Neural Network (CRNN) recognizes the text within that region.

- **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which removes the need for character-level alignment between the input image and the output text, making it well suited to OCR.
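To make the CTC idea concrete, here is a minimal greedy CTC decoder: take the argmax class at each time step, collapse consecutive repeats, and drop blanks. The charset and blank index below are illustrative, not necessarily the ones used in the notebook.

```python
import numpy as np

# Hypothetical charset; index 0 is the CTC blank, as is conventional.
CHARSET = "-abcdefghijklmnopqrstuvwxyz"  # '-' stands in for the blank symbol
BLANK = 0

def ctc_greedy_decode(logits):
    """Greedy CTC decoding over a (T, C) array of per-time-step class scores."""
    best = logits.argmax(axis=1)
    chars = []
    prev = BLANK
    for idx in best:
        # Emit a character only when it differs from the previous step and is not blank.
        if idx != prev and idx != BLANK:
            chars.append(CHARSET[idx])
        prev = idx
    return "".join(chars)

# Per-step argmax sequence: h h - e l l - l o  ->  "hello"
logits = np.zeros((9, len(CHARSET)))
for t, ch in enumerate("hh-ell-lo"):
    logits[t, CHARSET.index(ch)] = 1.0
print(ctc_greedy_decode(logits))  # hello
```

Note how the blank between the two `l` steps is what lets the decoder keep a genuine double letter.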
## Getting Started

### Prerequisites

- Python 3.9 or higher
- An NVIDIA GPU with CUDA (highly recommended for training)
- Docker (for containerized deployment)
### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd crnn-ctc-ocr
   ```

2. Set up a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Model Training and Export

The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.

### Dataset

The model is trained on MJSynth, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically using the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).
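For illustration, a typical CRNN preprocessing step (fixed-height resize, right-padding to a fixed width, scaling to [0, 1]) might look like the numpy sketch below. The sizes and normalisation are assumptions; the exact transforms live in `notebook/ocr.ipynb`.

```python
import numpy as np

def preprocess(image, target_h=32, target_w=128):
    """Resize a grayscale word image to a fixed height, pad to a fixed width,
    and scale pixel values to [0, 1]. Returns a (1, H, W) array.

    A stand-in for the notebook's transforms; real code would use PIL/torchvision.
    """
    h, w = image.shape
    # Nearest-neighbour resize in pure numpy, preserving aspect ratio.
    new_w = max(1, int(w * target_h / h))
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    if new_w < target_w:
        resized = np.pad(resized, ((0, 0), (0, target_w - new_w)))
    else:
        resized = resized[:, :target_w]
    return resized[None, :, :]  # add channel dimension

sample = np.random.randint(0, 256, size=(48, 100), dtype=np.uint8)
x = preprocess(sample)
print(x.shape)  # (1, 32, 128)
```

Fixing the height while letting the width vary (then padding) is what allows the CNN to emit a time-step sequence of the right shape for the LSTM.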
### Running the Training Notebook

1. Launch Jupyter:

   ```bash
   jupyter notebook
   ```

2. Open `notebook/ocr.ipynb`.
3. Run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run; set it to `False` for full training.
4. The notebook will:
   - Load and preprocess the dataset.
   - Define the CRNN model, loss function, and optimizer.
   - Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
   - Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
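For reference, Character Error Rate is the edit distance between prediction and ground truth, normalised by the reference length. A minimal pure-Python version (the notebook may well use a library such as `torchmetrics` or `jiwer` instead):

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(prediction, reference):
    """Character Error Rate: edit distance normalised by reference length."""
    return edit_distance(prediction, reference) / max(1, len(reference))

print(cer("hallo", "hello"))  # 0.2 (one substitution over five characters)
```

Word Error Rate is the same computation applied to token lists instead of character strings.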
### Exporting to ONNX

After training, the last cells of the notebook take the trained CRNN model and export it to `export/model.onnx`. This file is crucial for the FastAPI application; the repository already includes a pre-trained `models/model.onnx`.
## Inference

### Notebook Inference

The notebook includes helper functions to test the model directly.

1. Using the PyTorch model (`.pth`):

   ```python
   # predict with the best saved PyTorch model
   predict('path/to/your/image.png', model_path='checkpoints/best.pth')
   ```

2. Using the ONNX model:

   ```python
   # predict with the exported ONNX model
   predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
   ```
## API Server (FastAPI)

The application provides a web server to perform OCR on uploaded images.

### Running the Server

To run the API server locally:

```bash
python main/app.py
```

The server will start on `http://localhost:8000`.
### API Endpoints

#### HTTP POST Request

- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:

  ```bash
  curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
  ```

- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
#### WebSocket Connection

- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.
## Deployment with Docker

The project includes a `Dockerfile` to easily containerize and deploy the application.

1. **Prerequisite**: Ensure the `models/model.onnx` file exists. If you've trained your own model, move your exported `export/model.onnx` to `models/model.onnx` or update the path in `main/app.py`. (Note: the provided code in `main/app.py` looks for `export/model.onnx`, so ensure this path is correct for your setup or that the file is present at build time.)
2. Build the Docker image:

   ```bash
   docker build -t ocr-api .
   ```

3. Run the Docker container:

   ```bash
   docker run -p 8000:8000 ocr-api
   ```

The application will be running and accessible at `http://localhost:8000`.