| title: Medical Document Validator | |
| emoji: 🏥 | |
| colorFrom: blue | |
| colorTo: blue | |
| sdk: docker | |
| app_port: 7860 | |
| # Medical Document Validator | |
| A robust backend service that validates medical documents (PDF, DOCX, PPTX) against predefined templates using Large Language Models (LLM). | |
| ## Features | |
| - **Multi-format Support**: Validates PDF, DOCX, and PPTX documents | |
| - **Template-based Validation**: Uses structured JSON templates to define required elements | |
| - **LLM-powered**: Uses Anthropic's Claude API for context-aware document validation | |
| - **RESTful API**: FastAPI-based endpoints for easy integration | |
| ## Project Structure | |
| ``` | |
| medical-validator/ | |
| ├── app/ | |
| │   ├── __init__.py | |
| │   ├── main.py          # FastAPI app with /templates and /validate endpoints | |
| │   ├── validator.py     # Core validation logic with document extraction and LLM interaction | |
| │   └── templates.json   # Template configuration (18 templates) | |
| ├── requirements.txt     # Python dependencies | |
| ├── .env                 # Environment variables (create from .env.example) | |
| └── README.md            # This file | |
| ``` | |
| ## Setup Instructions | |
| ### 1. Install Dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### 2. Configure API Key | |
| 1. Copy the example environment file: | |
| ```bash | |
| cp .env.example .env | |
| ``` | |
| 2. Get your Anthropic API key from [https://console.anthropic.com/](https://console.anthropic.com/) | |
| 3. Edit `.env` and replace `your_anthropic_api_key_here` with your actual API key: | |
| ``` | |
| LLM_API_KEY=sk-ant-api03-... | |
| ``` | |
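As a minimal sketch of how the service might read this key at startup: the helper name `load_api_key` is hypothetical and is not taken from `app/validator.py`; only the `LLM_API_KEY` variable name and the placeholder string come from the steps above. It assumes the values in `.env` have already been exported into the process environment.

```python
import os


def load_api_key(env=None):
    """Return the LLM API key, or raise if it is missing or still the placeholder."""
    # Hypothetical helper: accepts a mapping so it can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("LLM_API_KEY", "").strip()
    # The placeholder string comes from the .env.example instructions above.
    if not key or key == "your_anthropic_api_key_here":
        raise RuntimeError("LLM_API_KEY is not set; edit .env before starting the server")
    return key
```

Failing fast at startup surfaces a missing key before the first `/validate` request reaches the LLM call.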
| ### 3. Run the Server | |
| ```bash | |
| uvicorn app.main:app --reload | |
| ``` | |
| The API is then available at `http://localhost:8000`. | |
| ## API Endpoints | |
| ### GET `/templates` | |
| Returns a list of all available templates with their keys and friendly names. | |
| **Response:** | |
| ```json | |
| [ | |
| { | |
| "template_key": "certificate_appreciation_speaker", | |
| "friendly_name": "Certificate of Appreciation (Speaker/Chairperson)" | |
| }, | |
| ... | |
| ] | |
| ``` | |
| ### POST `/validate` | |
| Validates a document against a specified template. | |
| **Parameters:** | |
| - `file` (form-data): The document file to validate (PDF, DOCX, or PPTX) | |
| - `template_key` (query): The template key to validate against | |
| **Example using curl:** | |
| ```bash | |
| curl -X POST "http://localhost:8000/validate?template_key=certificate_appreciation_speaker" \ | |
| -F "file=@document.pdf" | |
| ``` | |
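For callers that prefer Python, the same request can be sketched with the standard library alone. The function name `build_validate_request` is hypothetical; the endpoint, the `file` form field, and the `template_key` query parameter are the ones documented above. The sketch only builds the request and does not send it.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_validate_request(base_url, template_key, filename, content):
    """Build (but do not send) a multipart POST roughly equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,  # raw bytes of the document
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{base_url}/validate?{urlencode({'template_key': template_key})}"
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server; reading the file with `open("document.pdf", "rb").read()` supplies `content`.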
| **Response:** | |
| ```json | |
| { | |
| "template_key": "certificate_appreciation_speaker", | |
| "status": "PASS", | |
| "summary": "All required elements found", | |
| "elements_report": [ | |
| { | |
| "id": "certificate_title", | |
| "label": "Certificate Title", | |
| "required": true, | |
| "is_present": true, | |
| "reason": "Found phrase 'Certificate of Appreciation' in document" | |
| }, | |
| ... | |
| ] | |
| } | |
| ``` | |
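A client will often want only the failures from this report. The sketch below assumes the response shape shown above; the helper name `missing_required`, the `"FAIL"` status value, and the `signature` element in the sample are illustrative assumptions, not output from the real service.

```python
def missing_required(report):
    """Return labels of required elements that the validator did not find."""
    return [
        element["label"]
        for element in report.get("elements_report", [])
        if element.get("required") and not element.get("is_present")
    ]


# Hand-written sample in the shape of the response above, not real validator output.
# The "FAIL" status and the "signature" element are assumptions for illustration.
sample = {
    "template_key": "certificate_appreciation_speaker",
    "status": "FAIL",
    "summary": "One required element missing",
    "elements_report": [
        {"id": "certificate_title", "label": "Certificate Title",
         "required": True, "is_present": True, "reason": "..."},
        {"id": "signature", "label": "Signature",
         "required": True, "is_present": False, "reason": "..."},
    ],
}

print(missing_required(sample))  # prints ['Signature']
```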
| ### GET `/health` | |
| Health check endpoint. Use it to confirm that the service is running and that the API key is configured. | |
| ## Available Templates | |
| `templates.json` defines 18 predefined templates. The 16 below are listed here; call `GET /templates` for the complete set: | |
| 1. Certificate of Appreciation (Speaker/Chairperson) | |
| 2. Certificate of Attendance | |
| 3. CPD Certificate of Accreditation (Generic) | |
| 4. HTML Email Reminder | |
| 5. HTML Invitation | |
| 6. PDF Invitation | |
| 7. PDF Save the Date | |
| 8. Printed Invitation | |
| 9. RCP Certificate of Attendance | |
| 10. Agenda Page | |
| 11. DHA Certificate of Accreditation (President + Chairs) | |
| 12. Certificate of Appreciation (Sponsor) | |
| 13. Evaluation Form (Post-Event) | |
| 14. Event Booklet | |
| 15. Landing Page & Registration | |
| 16. Slides Permission Form | |
| ## Validation Logic | |
| The validator: | |
| 1. **Extracts text** from the uploaded document based on file type | |
| 2. **Loads the template** configuration for the specified template key | |
| 3. **Generates a detailed prompt** for the LLM with all template requirements | |
| 4. **Calls Claude API** to analyze the document against the template | |
| 5. **Returns a structured report** with element-by-element validation results | |
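Steps 1 and 3 above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `app/validator.py`: the function names, the `elements` key in the template, and the prompt wording are all hypothetical, while the `id`, `label`, and `required` fields mirror the response format documented earlier.

```python
from pathlib import Path

# Formats the README lists as supported.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".pptx"}


def detect_format(filename):
    """Step 1 (dispatch only): choose the extractor from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file format: {suffix or filename}")
    return suffix.lstrip(".")


def build_prompt(template, document_text):
    """Step 3: combine the template's elements and the extracted text into one prompt."""
    lines = [
        f"Validate the document against the template '{template['friendly_name']}'.",
        "Check each element and answer in JSON with id, is_present and reason.",
        "Elements:",
    ]
    # Assumed template schema: a list of elements under the "elements" key.
    for element in template["elements"]:
        kind = "required" if element.get("required") else "optional"
        lines.append(f"- {element['id']} ({kind}): {element['label']}")
    lines += ["Document text:", document_text]
    return "\n".join(lines)
```

Asking the model for JSON keyed by element `id` keeps step 5 simple, since the reply can be matched back to the template element by element.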
| ### Limitations | |
| - **Visual Elements**: Logos, signatures, and QR codes require image/OCR processing beyond basic text extraction. The validator will note these limitations in the report. | |
| - **Table Structure**: Templates that validate specific table columns may need dedicated table parsing, because basic text extraction does not reliably preserve table structure. | |
| - **Image-based PDFs**: Scanned PDFs contain images rather than a text layer and require OCR preprocessing before they can be validated. | |
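One way to catch image-based PDFs before they reach the LLM is to check how much text extraction produced per page. The heuristic below is a sketch only: the function name `looks_scanned` and both thresholds are assumptions chosen for illustration, and it expects the per-page text that a PDF library has already extracted.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from the text extracted per page."""
    if not page_texts:
        return True  # nothing extracted at all
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    # Treat the file as scanned when most pages yield almost no text.
    return sparse / len(page_texts) > 0.5
```

A document flagged this way could be rejected with a clear message or routed to OCR, rather than being reported as missing every element.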
| ## Development | |
| ### Running in Development Mode | |
| ```bash | |
| uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 | |
| ``` | |
| ### API Documentation | |
| Once the server is running, visit: | |
| - Swagger UI: `http://localhost:8000/docs` | |
| - ReDoc: `http://localhost:8000/redoc` | |
| ## Error Handling | |
| The API returns the following HTTP status codes: | |
| - `200`: Success | |
| - `400`: Bad request (unsupported file format, empty file) | |
| - `404`: Template not found | |
| - `422`: Validation error (extraction failure, LLM parsing error) | |
| - `500`: Internal server error | |
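On the client side, these codes can be turned into readable messages with a small lookup. The helper names are hypothetical, and the retry rule is an assumption rather than documented behavior of the service.

```python
# Meanings copied from the status code list above.
STATUS_MEANINGS = {
    200: "success",
    400: "bad request (unsupported file format or empty file)",
    404: "template not found",
    422: "validation error (extraction failure or LLM parsing error)",
    500: "internal server error",
}


def describe_status(code):
    """Map a status code from the API to a short human-readable message."""
    return STATUS_MEANINGS.get(code, f"unexpected status {code}")


def is_retryable(code):
    """Assumption: only server-side failures are worth retrying unchanged."""
    return code >= 500
```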
| ## License | |
| This project is for internal use. | |