saifisvibinn

Fix invalid metadata color

95c340e 3 months ago


---
title: Medical Document Validator
emoji: 🏥
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 7860
---

Medical Document Validator

A backend service that validates medical documents (PDF, DOCX, PPTX) against predefined templates using a Large Language Model (LLM).

Features

Multi-format Support: Validates PDF, DOCX, and PPTX documents
Template-based Validation: Uses structured JSON templates to define required elements
LLM-powered: Uses Anthropic's Claude API for context-aware document validation
RESTful API: FastAPI-based endpoints for easy integration

Project Structure

medical-validator/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app with /templates and /validate endpoints
│   ├── validator.py     # Core validation logic with document extraction and LLM interaction
│   └── templates.json   # Template configuration (18 templates)
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables (create from .env.example)
└── README.md           # This file

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Key

Copy the example environment file:
```
cp .env.example .env
```
Get your Anthropic API key from https://console.anthropic.com/
Edit .env and replace your_anthropic_api_key_here with your actual API key:
```
LLM_API_KEY=sk-ant-api03-...
```

3. Run the Server

uvicorn app.main:app --reload

The API will be available at http://localhost:8000

API Endpoints

GET `/templates`

Returns a list of all available templates with their keys and friendly names.

Response:

[
  {
    "template_key": "certificate_appreciation_speaker",
    "friendly_name": "Certificate of Appreciation (Speaker/Chairperson)"
  },
  ...
]
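
The layout of `templates.json` is not shown in this README. As a rough sketch, assuming the file maps each template key to an object with a `friendly_name` field, the response above could be built like this (`list_templates` and the sample data are hypothetical, not taken from the project):

```python
import json


def list_templates(templates):
    """Build a /templates-style list from a {key: {"friendly_name": ...}} mapping.

    Falls back to the key itself when no friendly name is given.
    """
    return [
        {"template_key": key, "friendly_name": cfg.get("friendly_name", key)}
        for key, cfg in templates.items()
    ]


# Assumed, illustrative layout of templates.json (not the project's real file).
sample = json.loads(
    '{"certificate_appreciation_speaker": '
    '{"friendly_name": "Certificate of Appreciation (Speaker/Chairperson)"}}'
)
print(json.dumps(list_templates(sample), indent=2))
```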

POST `/validate`

Validates a document against a specified template.

Parameters:

file (form-data): The document file to validate (PDF, DOCX, or PPTX)
template_key (query): The template key to validate against
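
Since only three formats are accepted, a client or server can reject other uploads early. A minimal sketch, assuming the format is inferred from the file extension (the project may inspect content instead; `detect_format` is a hypothetical helper):

```python
from pathlib import Path

# Formats named in this README; the mapping itself is an assumption.
SUPPORTED_EXTENSIONS = {".pdf": "pdf", ".docx": "docx", ".pptx": "pptx"}


def detect_format(filename):
    """Map an uploaded filename to a supported format, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix or filename!r}") from None
```

An extension check is cheap but easy to fool, so a real service would likely still handle extraction failures separately.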

Example using curl:

curl -X POST "http://localhost:8000/validate?template_key=certificate_appreciation_speaker" \
  -F "file=@document.pdf"

Response:

{
  "template_key": "certificate_appreciation_speaker",
  "status": "PASS",
  "summary": "All required elements found",
  "elements_report": [
    {
      "id": "certificate_title",
      "label": "Certificate Title",
      "required": true,
      "is_present": true,
      "reason": "Found phrase 'Certificate of Appreciation' in document"
    },
    ...
  ]
}
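
A caller will usually want the required elements that failed. The sketch below assumes the response shape shown above; the `FAIL` status value and the `recipient_name` element are illustrative and not confirmed by this README.

```python
def missing_required(report):
    """Return the labels of required elements the validator did not find."""
    return [
        item["label"]
        for item in report.get("elements_report", [])
        if item.get("required") and not item.get("is_present")
    ]


# Illustrative payload in the shape shown above; the second element is made up.
example_report = {
    "template_key": "certificate_appreciation_speaker",
    "status": "FAIL",
    "summary": "One required element is missing",
    "elements_report": [
        {"id": "certificate_title", "label": "Certificate Title",
         "required": True, "is_present": True, "reason": "Title found"},
        {"id": "recipient_name", "label": "Recipient Name",
         "required": True, "is_present": False, "reason": "No name found"},
    ],
}
print(missing_required(example_report))  # prints ['Recipient Name']
```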

GET `/health`

Health check endpoint that verifies the API key is configured.
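
The response body of `/health` is not documented here. As a sketch of what such a check might compute, assuming the key is read from the `LLM_API_KEY` environment variable and a hypothetical response with `status` and `api_key_configured` fields:

```python
import os


def health_payload(environ=None):
    """Report whether LLM_API_KEY is set, without exposing its value.

    The field names returned here are assumptions, not the project's schema.
    """
    env = os.environ if environ is None else environ
    configured = bool(env.get("LLM_API_KEY", "").strip())
    return {
        "status": "ok" if configured else "degraded",
        "api_key_configured": configured,
    }
```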

Available Templates

The system includes 18 predefined templates, including:

Certificate of Appreciation (Speaker/Chairperson)
Certificate of Attendance
CPD Certificate of Accreditation (Generic)
HTML Email Reminder
HTML Invitation
PDF Invitation
PDF Save the Date
Printed Invitation
RCP Certificate of Attendance
Agenda Page
DHA Certificate of Accreditation (President + Chairs)
Certificate of Appreciation (Sponsor)
Evaluation Form (Post-Event)
Event Booklet
Landing Page & Registration
Slides Permission Form

Validation Logic

The validator:

Extracts text from the uploaded document based on file type
Loads the template configuration for the specified template key
Generates a detailed prompt for the LLM with all template requirements
Calls the Claude API to analyze the document against the template
Returns a structured report with element-by-element validation results
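
The prompt-building and reporting steps above can be sketched without the LLM call itself. This assumes a template is a dict with `friendly_name` and a list of `elements` (each with `id`, `label`, `required`), and that the overall status is `PASS` only when every required element is present; both the schema and the `FAIL` value are assumptions, and the function names are hypothetical.

```python
def build_prompt(template, document_text):
    """Render template requirements and document text into one LLM prompt."""
    lines = [
        f"Validate the document against the template '{template['friendly_name']}'.",
        "For each element, report whether it is present and why.",
        "",
        "Elements:",
    ]
    for element in template["elements"]:
        need = "required" if element["required"] else "optional"
        lines.append(f"- {element['id']} ({need}): {element['label']}")
    lines += ["", "Document text:", document_text]
    return "\n".join(lines)


def overall_status(elements_report):
    """Return PASS if no required element is missing, otherwise FAIL."""
    ok = all(item["is_present"] for item in elements_report if item["required"])
    return "PASS" if ok else "FAIL"
```

Keeping the status computation in code rather than asking the model for it means the verdict always agrees with the per-element results.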

Limitations

Visual Elements: Logos, signatures, and QR codes require image/OCR processing beyond basic text extraction. The validator will note these limitations in the report.
Table Structure: Complex table structures with specific column validation may need advanced parsing. Basic text extraction may not preserve table structure perfectly.
Image-based PDFs: Scanned PDFs that contain no text layer require OCR preprocessing before validation.
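
One way to flag a scanned PDF before sending it to the LLM is to check how much text extraction produced. The heuristic below is a sketch, not the project's logic; `looks_image_based` and its threshold are assumptions chosen for illustration.

```python
def looks_image_based(page_texts, min_chars_per_page=20):
    """Heuristic: True if most pages yield almost no extractable text.

    `page_texts` is a list with one extracted-text string per page.
    """
    if not page_texts:
        return True
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > 0.5
```

A document flagged this way could be rejected with a clear message or routed to OCR first.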

Development

Running in Development Mode

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

API Documentation

Once the server is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Error Handling

The API returns the following HTTP status codes:

200: Success
400: Bad request (unsupported file format, empty file)
404: Template not found
422: Validation error (extraction failure, LLM parsing error)
500: Internal server error
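
The mapping above can be expressed as a small lookup. The exception classes and `status_for` below are hypothetical names used for illustration; the project's actual error types are not shown in this README.

```python
class UnsupportedFormatError(Exception):
    """The upload is empty or not a PDF, DOCX, or PPTX file."""


class TemplateNotFoundError(Exception):
    """The requested template_key does not exist."""


class ExtractionError(Exception):
    """Text could not be extracted from the document."""


class LLMParsingError(Exception):
    """The LLM response could not be parsed into a report."""


STATUS_BY_ERROR = {
    UnsupportedFormatError: 400,
    TemplateNotFoundError: 404,
    ExtractionError: 422,
    LLMParsingError: 422,
}


def status_for(exc):
    """Return the HTTP status code for an exception, defaulting to 500."""
    for error_type, code in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return code
    return 500
```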

License

This project is for internal use.