# End-to-End OCR with CRNN, CTC, and FastAPI
This project provides a complete pipeline for Optical Character Recognition (OCR). It features a **Convolutional Recurrent Neural Network (CRNN)** trained with **Connectionist Temporal Classification (CTC)** loss for text recognition. The trained model is served via a **FastAPI** application, which uses the **CRAFT** model for initial text detection.
The entire workflow, from data preparation and model training to deployment as a web service, is documented and implemented in this repository.
## Table of Contents
- [Workflow Overview](#workflow-overview)
- [Features](#features)
- [Project Structure](#project-structure)
- [Technical Details](#technical-details)
- [Text Detection: CRAFT](#text-detection-craft)
- [Text Recognition: CRNN](#text-recognition-crnn)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Model Training and Export](#model-training-and-export)
- [Dataset](#dataset)
- [Running the Training Notebook](#running-the-training-notebook)
- [Exporting to ONNX](#exporting-to-onnx)
- [Inference](#inference)
- [Notebook Inference](#notebook-inference)
- [API Server (FastAPI)](#api-server-fastapi)
- [Running the Server](#running-the-server)
- [API Endpoints](#api-endpoints)
- [HTTP POST Request](#http-post-request)
- [WebSocket Connection](#websocket-connection)
- [Deployment with Docker](#deployment-with-docker)
## Workflow Overview
The project follows a clear, step-by-step process from model creation to deployment:
1. **Train the Model**: The `notebook/ocr.ipynb` notebook is used to train the CRNN text recognition model on the MJSynth dataset.
2. **Export the Model**: The trained PyTorch model is exported to the ONNX format (`model.onnx`) for efficient inference.
3. **Serve the Model**: The FastAPI application (`main/app.py`) loads the ONNX model and the CRAFT text detection model to provide OCR capabilities through a web API.
4. **Deploy**: The entire application is containerized using Docker for easy and reproducible deployment.
## Features
- **Two-Stage OCR**: Uses CRAFT for accurate text detection and a CRNN for robust text recognition.
- **Deep Learning Model**: An optimized CRNN architecture implemented in PyTorch.
- **Efficient Inference**: The trained model is exported to ONNX for fast, framework-independent inference.
- **Web API**: A FastAPI server with both REST and WebSocket endpoints.
- **Reproducible Environment**: Comes with a `Dockerfile` for easy setup and deployment.
- **Complete Workflow**: Includes all steps from training to deployment.
## Project Structure
```
.
β”œβ”€β”€ .gitignore          # Files to be ignored by Git
β”œβ”€β”€ Dockerfile          # Docker configuration for the API
β”œβ”€β”€ main
β”‚   β”œβ”€β”€ app.py          # Main FastAPI application
β”‚   └── core            # Modularized application logic
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ config.py   # Configuration variables
β”‚       β”œβ”€β”€ models.py   # Model loading (CRAFT)
β”‚       β”œβ”€β”€ ocr.py      # Core OCR pipeline
β”‚       └── utils.py    # Utility functions
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ final.pth       # Final trained PyTorch model weights
β”‚   └── model.onnx      # Trained model in ONNX format
β”œβ”€β”€ notebook
β”‚   └── ocr.ipynb       # Jupyter notebook for training and export
└── requirements.txt    # Python dependencies
```
## Technical Details
### Text Detection: CRAFT
The FastAPI application first uses the **CRAFT (Character Region Awareness for Text Detection)** model to detect text regions in the input image. It identifies bounding boxes around words or lines of text. This project uses the `hezarai/CRAFT` implementation.
### Text Recognition: CRNN
For each bounding box detected by CRAFT, a **Convolutional Recurrent Neural Network (CRNN)** is used to recognize the text within that region.
- **Convolutional Layers (CNN)**: Serve as a powerful feature extractor, processing the image patch and outputting a sequence of feature vectors.
- **Recurrent Layers (RNN)**: A bidirectional LSTM network processes the feature sequence, capturing contextual dependencies between characters.
- **CTC Loss**: The model is trained with Connectionist Temporal Classification (CTC) loss, which eliminates the need for character-level alignment between the input image and the output text, making it well suited to OCR, where such alignments are rarely available.
## Getting Started
### Prerequisites
- Python 3.9 or higher
- An NVIDIA GPU with CUDA for training is highly recommended.
- Docker for containerized deployment.
### Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd crnn-ctc-ocr
```
2. **Set up a virtual environment (recommended):**
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
## Model Training and Export
The `notebook/ocr.ipynb` notebook contains the complete code for training and exporting the model.
### Dataset
The model is trained on **MJSynth**, a large-scale synthetic dataset for text recognition. The notebook downloads it automatically via the Hugging Face `datasets` library (`priyank-m/MJSynth_text_recognition`).
### Running the Training Notebook
1. **Launch Jupyter:**
```bash
jupyter notebook
```
2. Open `notebook/ocr.ipynb`.
3. You can run all cells to execute the full pipeline. The `DEMO` flag is set to `True` by default to train on a smaller subset for a quick run. Set it to `False` for full training.
4. The notebook will:
- Load and preprocess the dataset.
- Define the CRNN model, loss function, and optimizer.
- Run the training loop, showing progress and validation metrics (Character Error Rate, Word Error Rate).
- Save the best model to `checkpoints/best.pth` and the final model to `checkpoints/final.pth`.
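The Character Error Rate reported during validation is the edit (Levenshtein) distance between the prediction and the ground truth, normalized by the reference length. A minimal sketch of the metric (function names here are illustrative, not taken from the notebook):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming over a single row."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("kitten", "sitting"))  # 3 edits / 6 reference chars = 0.5
```

Word Error Rate is computed the same way, but over lists of words instead of characters.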
### Exporting to ONNX
After training, the last cells of the notebook handle the export to ONNX:
- They load the trained CRNN model.
- They export it to `export/model.onnx`. **This file is required by the FastAPI application.** The repository already includes a pre-trained `models/model.onnx`.
## Inference
### Notebook Inference
The notebook includes helper functions to test the model directly.
**1. Using the PyTorch model (`.pth`):**
```python
# predict with the best saved PyTorch model
predict('path/to/your/image.png', model_path='checkpoints/best.pth')
```
**2. Using the ONNX model:**
```python
# predict with the exported ONNX model
predict_onnx('path/to/your/image.png', onnx_path='export/model.onnx')
```
## API Server (FastAPI)
The application provides a web server to perform OCR on uploaded images.
### Running the Server
To run the API server locally:
```bash
python main/app.py
```
The server will start on `http://localhost:8000`.
### API Endpoints
#### HTTP POST Request
- **Endpoint**: `POST /predict/image`
- **Description**: Upload an image and receive the OCR results in JSON format.
- **Example using `curl`**:
```bash
curl -X POST -F "file=@/path/to/your/image.jpg" http://localhost:8000/predict/image
```
- **Response**: A JSON object containing the detected paragraph, lines, words, and a base64-encoded image with bounding boxes drawn on it.
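The base64-encoded image in the response can be decoded back to raw bytes with the standard library. The field names below (`paragraph`, `annotated_image`) are assumed for illustration; check the actual response keys returned by `main/app.py`:

```python
import base64
import json

# Example response shape (field names assumed, not taken from main/app.py).
response_text = json.dumps({
    "paragraph": "hello world",
    "annotated_image": base64.b64encode(b"\x89PNG...").decode("ascii"),
})

result = json.loads(response_text)
print(result["paragraph"])

# Recover the annotated image bytes and write them to disk.
image_bytes = base64.b64decode(result["annotated_image"])
with open("annotated.png", "wb") as f:
    f.write(image_bytes)
```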
#### WebSocket Connection
- **Endpoint**: `ws://localhost:8000/ws/predict`
- **Description**: A WebSocket endpoint for real-time OCR. Send an image as bytes, and the server will return the OCR result as a JSON message. This is useful for streaming or interactive applications.
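A minimal client sketch using the third-party `websockets` package (an assumption; any WebSocket client library would work the same way):

```python
import asyncio
import json

async def ocr_over_websocket(image_path, url="ws://localhost:8000/ws/predict"):
    """Send raw image bytes over the WebSocket and await the JSON OCR result."""
    # Imported lazily so this sketch can be read/imported without the package.
    import websockets
    async with websockets.connect(url) as ws:
        with open(image_path, "rb") as f:
            await ws.send(f.read())       # server expects the image as bytes
        return json.loads(await ws.recv())  # server replies with a JSON message

# With the server running:
#   result = asyncio.run(ocr_over_websocket("image.jpg"))
```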
## Deployment with Docker
The project includes a `Dockerfile` to easily containerize and deploy the application.
1. **Prerequisite**: Ensure the ONNX model is present at the path the application expects. The provided `main/app.py` looks for `export/model.onnx`, so either place your exported model there or update the path in `main/app.py`. A pre-trained copy is also included at `models/model.onnx`.
2. **Build the Docker image:**
```bash
docker build -t ocr-api .
```
3. **Run the Docker container:**
```bash
docker run -p 8000:8000 ocr-api
```
The application will be running and accessible at `http://localhost:8000`.