title: Medical Document Validator
emoji: 🏥
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 7860
Medical Document Validator
A backend service that validates medical documents (PDF, DOCX, PPTX) against predefined templates using a Large Language Model (LLM).
Features
- Multi-format Support: Validates PDF, DOCX, and PPTX documents
- Template-based Validation: Uses structured JSON templates to define required elements
- LLM-powered: Uses Anthropic's Claude API for context-aware document validation
- RESTful API: FastAPI-based endpoints for easy integration
Project Structure
medical-validator/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app with /templates and /validate endpoints
│   ├── validator.py     # Core validation logic with document extraction and LLM interaction
│   └── templates.json   # Template configuration (18 templates)
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables (create from .env.example)
└── README.md            # This file
Setup Instructions
1. Install Dependencies
pip install -r requirements.txt
2. Configure API Key
Copy the example environment file:
cp .env.example .env
Get your Anthropic API key from https://console.anthropic.com/
Edit .env and replace your_anthropic_api_key_here with your actual API key:
LLM_API_KEY=sk-ant-api03-...
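As a minimal sketch of how a service could read this setting at startup and fail early when it is missing: the helper name `load_api_key` and the placeholder check are assumptions for illustration, not the project's actual code. Only the `LLM_API_KEY` variable name and the placeholder string come from the steps above.

```python
import os

# Placeholder value shipped in .env.example, per the setup step above.
PLACEHOLDER = "your_anthropic_api_key_here"


def load_api_key(env=None):
    """Return LLM_API_KEY, raising if it is unset or still the placeholder.

    Hypothetical helper: the real app may load and check the key differently.
    """
    env = os.environ if env is None else env
    key = env.get("LLM_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("LLM_API_KEY is not configured; edit .env first")
    return key
```

This assumes the variables in .env have already been loaded into the process environment, for example by a loader such as python-dotenv. `os.environ` does not read .env files on its own.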
3. Run the Server
uvicorn app.main:app --reload
The API will be available at http://localhost:8000
API Endpoints
GET /templates
Returns a list of all available templates with their keys and friendly names.
Response:
[
{
"template_key": "certificate_appreciation_speaker",
"friendly_name": "Certificate of Appreciation (Speaker/Chairperson)"
},
...
]
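A client could turn this list into a lookup table. The sketch below uses only the Python standard library. The function names are hypothetical, and the base URL assumes the local server from the setup steps.

```python
import json
import urllib.request


def fetch_templates(base_url="http://localhost:8000"):
    """GET /templates and return the decoded JSON list."""
    with urllib.request.urlopen(f"{base_url}/templates") as response:
        return json.load(response)


def index_templates(templates):
    """Map each template_key to its friendly_name."""
    return {item["template_key"]: item["friendly_name"] for item in templates}
```

With the server running, `index_templates(fetch_templates())` gives a dictionary that can be used to check a template key before submitting a document.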
POST /validate
Validates a document against a specified template.
Parameters:
- file (form-data): The document file to validate (PDF, DOCX, or PPTX)
- template_key (query): The template key to validate against
Example using curl:
curl -X POST "http://localhost:8000/validate?template_key=certificate_appreciation_speaker" \
-F "file=@document.pdf"
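The same request can be sketched in Python using only the standard library. The helper names `build_multipart` and `validate_document` are hypothetical, and the generic `application/octet-stream` content type is an assumption. The server may expect a more specific type per format.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid


def build_multipart(field, filename, data):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def validate_document(path, template_key, base_url="http://localhost:8000"):
    """POST the file to /validate and return the decoded JSON report."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(
            "file", os.path.basename(path), handle.read()
        )
    query = urllib.parse.urlencode({"template_key": template_key})
    request = urllib.request.Request(
        f"{base_url}/validate?{query}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `validate_document("document.pdf", "certificate_appreciation_speaker")` mirrors the curl call above.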
Response:
{
"template_key": "certificate_appreciation_speaker",
"status": "PASS",
"summary": "All required elements found",
"elements_report": [
{
"id": "certificate_title",
"label": "Certificate Title",
"required": true,
"is_present": true,
"reason": "Found phrase 'Certificate of Appreciation' in document"
},
...
]
}
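A caller will often want only the elements that failed. The following sketch filters the report for required elements marked as absent. The function name is hypothetical, and the field names are taken from the response shown above.

```python
def missing_required(report):
    """Return labels of required elements the validator marked as absent."""
    return [
        element["label"]
        for element in report.get("elements_report", [])
        if element.get("required") and not element.get("is_present")
    ]


# Illustrative report in the shape shown above (values are made up).
example_report = {
    "template_key": "certificate_appreciation_speaker",
    "elements_report": [
        {"id": "certificate_title", "label": "Certificate Title",
         "required": True, "is_present": True, "reason": "Found"},
        {"id": "signature", "label": "Signature",
         "required": True, "is_present": False, "reason": "Not found"},
    ],
}
print(missing_required(example_report))
```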
GET /health
Health check endpoint to verify API key configuration.
Available Templates
The system includes 18 predefined templates, including:
- Certificate of Appreciation (Speaker/Chairperson)
- Certificate of Attendance
- CPD Certificate of Accreditation (Generic)
- HTML Email Reminder
- HTML Invitation
- PDF Invitation
- PDF Save the Date
- Printed Invitation
- RCP Certificate of Attendance
- Agenda Page
- DHA Certificate of Accreditation (President + Chairs)
- Certificate of Appreciation (Sponsor)
- Evaluation Form (Post-Event)
- Event Booklet
- Landing Page & Registration
- Slides Permission Form
Validation Logic
The validator:
- Extracts text from the uploaded document based on file type
- Loads the template configuration for the specified template key
- Generates a detailed prompt for the LLM with all template requirements
- Calls Claude API to analyze the document against the template
- Returns a structured report with element-by-element validation results
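Two of the steps above, choosing an extraction path by file type and building the prompt, can be sketched as follows. This is an illustration under stated assumptions, not the code in validator.py. The template shape (`friendly_name` plus an `elements` list with `id`, `label`, and `required`) is inferred from the API responses in this README, and the prompt wording is invented.

```python
import os

# Formats the README lists as supported.
SUPPORTED_EXTENSIONS = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}


def detect_format(filename):
    """Pick an extraction path from the file extension (case-insensitive)."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {extension or filename}")
    return SUPPORTED_EXTENSIONS[extension]


def build_prompt(template, document_text):
    """Assemble an LLM prompt listing every template element to check.

    Assumes a hypothetical template shape; templates.json may differ.
    """
    lines = [
        f"Validate the document against the template '{template['friendly_name']}'.",
        "For each element, report whether it is present and why.",
        "Elements:",
    ]
    for element in template["elements"]:
        kind = "required" if element["required"] else "optional"
        lines.append(f"- {element['id']} ({kind}): {element['label']}")
    lines += ["", "Document text:", document_text]
    return "\n".join(lines)
```

Listing each element with its id makes it easier to ask the model for a reply keyed by the same ids, which is what an element-by-element report needs.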
Limitations
- Visual Elements: Logos, signatures, and QR codes require image/OCR processing beyond basic text extraction. The validator will note these limitations in the report.
- Table Structure: Complex table structures with specific column validation may need advanced parsing. Basic text extraction may not preserve table structure perfectly.
- Image-based PDFs: PDFs that are image scans (not text-based) will require OCR preprocessing.
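One way to catch the image-based case before calling the LLM is a simple heuristic on the extracted text. The sketch below is an assumption about how such a check could look, not something the validator is known to do. The function name and the 20-character threshold are arbitrary choices.

```python
def looks_like_scanned_pdf(extracted_text, min_chars=20):
    """Heuristic: almost no extractable text suggests an image-only PDF."""
    visible = "".join(extracted_text.split())  # drop all whitespace
    return len(visible) < min_chars
```

A document flagged this way could be rejected with a clear message or routed to an OCR step instead of being sent to the model with empty text.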
Development
Running in Development Mode
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
API Documentation
Once the server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Error Handling
The API returns appropriate HTTP status codes:
- 200: Success
- 400: Bad request (unsupported file format, empty file)
- 404: Template not found
- 422: Validation error (extraction failure, LLM parsing error)
- 500: Internal server error
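On the client side, these codes can be mapped to messages and a simple retry decision. The sketch below is illustrative. The helper names are hypothetical, and treating only 5xx responses as retryable is an assumption rather than documented behavior.

```python
# Meanings copied from the status codes listed above.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad request (unsupported file format, empty file)",
    404: "Template not found",
    422: "Validation error (extraction failure, LLM parsing error)",
    500: "Internal server error",
}


def describe_status(code):
    """Return the documented meaning of a status code, if any."""
    return STATUS_MEANINGS.get(code, f"Unexpected status {code}")


def should_retry(code):
    """Assumption: only server-side failures (5xx) are worth retrying."""
    return 500 <= code < 600
```

A 400 or 404 points to a problem with the request itself, so resending the same file and template key is unlikely to help.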
License
This project is for internal use.