---
title: ISP Handbook OCR Engine
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
app_port: 7860
---

ISP Handbook OCR Engine

Extracts structured content from uploaded handbook PDFs using a hybrid text + OCR pipeline. Supports table detection, real-time editing, and multi-format export (PDF, DOCX, HTML, JSON).
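
As a rough illustration of what a hybrid text + OCR pipeline can look like, the sketch below prefers a page's embedded text layer and falls back to OCR only when that layer is empty or nearly so. This is not the engine's actual code: `PageResult`, `extract_page`, `extract_document`, the `run_ocr` callback, and the 40-character threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed threshold: pages with fewer embedded characters are treated as scans.
MIN_TEXT_CHARS = 40


@dataclass
class PageResult:
    page_number: int
    text: str
    source: str  # "text-layer" or "ocr"


def extract_page(
    page_number: int,
    embedded_text: str,
    run_ocr: Callable[[int], str],
    min_chars: int = MIN_TEXT_CHARS,
) -> PageResult:
    """Use the embedded text layer when it looks usable, otherwise OCR the page."""
    cleaned = embedded_text.strip()
    if len(cleaned) >= min_chars:
        return PageResult(page_number, cleaned, "text-layer")
    # Hypothetical OCR hook: a real pipeline would rasterize the page and
    # pass the image to an OCR library here.
    return PageResult(page_number, run_ocr(page_number).strip(), "ocr")


def extract_document(
    embedded_texts: List[str],
    run_ocr: Callable[[int], str],
) -> List[PageResult]:
    """Process pages in order, numbering them from 1."""
    return [
        extract_page(number, text, run_ocr)
        for number, text in enumerate(embedded_texts, start=1)
    ]
```

Keeping the OCR step behind a callback means the routing logic can be exercised without an OCR library installed, and OCR only runs on pages that need it.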

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/` | Health probe |
| GET | `/health` | Detailed health check |
| GET | `/docs` | Swagger UI |
| POST | `/extract` | Plain text extraction |
| POST | `/extract-structured` | Structured extraction with tables |
| POST | `/export` | Export edited content to file |
| POST | `/save` | Persist to platform database |
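
A minimal client sketch for the upload endpoints, using only the Python standard library. It assumes the service accepts a `multipart/form-data` upload under a form field named `file` and is reachable at `http://localhost:7860` (the port comes from `app_port`; the host and field name are assumptions). `build_extract_request` is a hypothetical helper, and the request schema shown in the Swagger UI at `/docs` takes precedence over anything here.

```python
import urllib.request
import uuid


def build_extract_request(
    base_url: str,
    pdf_bytes: bytes,
    filename: str,
    endpoint: str = "/extract",
    field: str = "file",  # assumed form field name; check /docs
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one PDF."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` and read the response body. Passing `endpoint="/extract-structured"` targets the structured variant under the same assumptions.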