---
title: Radiology Report NER API
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🩺 Radiology Report NER API

**Secure, encrypted medical document analysis with Named Entity Recognition and OCR**

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.104.1-009688.svg?logo=fastapi)](https://fastapi.tiangolo.com)
[![spaCy](https://img.shields.io/badge/spaCy-3.7-09a3d5.svg?logo=spacy)](https://spacy.io)

## 🌟 Features

- 🔐 **End-to-End Encryption** using NaCl (XSalsa20-Poly1305)
- 📊 **99.94% F-Score** for the NER model
- 📄 **PDF & Image Support** with intelligent text extraction
- 🖼️ **Embedded Image Extraction** from medical PDFs
- 🎯 **Entity Detection**: ANATOMY & OBSERVATION
- ⚠️ **Critical Finding Detection**
- 💊 **Clinical Recommendations**
- 📦 **Gzip Compression** (roughly 25-30% bandwidth savings)
- ⚡ **EasyOCR Integration** for scanned documents

## 🏗️ Architecture

```markdown
Client Application
    ↓ (Compress + Encrypt)
FastAPI Server
    ↓ (Decrypt + Decompress)
Text Extraction (PyMuPDF/EasyOCR)
    ↓
spaCy NER Model (99.94% F-score)
    ↓
Post-Processing & Analysis
    ↓ (Encrypt)
Structured JSON Response
```

## 📡 API Endpoints

### `POST /analyze-secure`

Secure encrypted endpoint for medical document analysis.

**Request Format:**

```json
{
  "ciphertext": "base64_encrypted_data",
  "nonce": "base64_nonce"
}
```

**Encrypted Payload Structure:**

```json
{
  "filename": "report.pdf",
  "file_data": "base64_encoded_file",
  "file_type": "pdf"
}
```

**Response Format:**

```json
{
  "status": "success",
  "ciphertext": "base64_encrypted_response",
  "nonce": "base64_nonce"
}
```

**Decrypted Response Structure:**

```json
{
  "status": "success",
  "processing_time": 57.721,
  "filename": "xray_report.pdf",
  "input_type": "pdf",
  "ocr_used": true,
  "ocr_engine": "EasyOCR",
  "raw_text": "Patient report...",
  "text_length": 1022,
  "entities": [
    {
      "text": "lung",
      "label": "ANATOMY",
      "start": 45,
      "end": 49,
      "confidence": 0.998
    }
  ],
  "images": [
    {
      "page": 1,
      "format": "JPEG",
      "width": 800,
      "height": 600,
      "data": "data:image/jpeg;base64,..."
    }
  ],
  "structured_report": {
    "anatomy": ["lung", "heart", "chest"],
    "all_observations": ["clear", "normal"],
    "positive_findings": [],
    "negative_findings": ["clear", "normal"],
    "critical_findings": []
  },
  "summary": {
    "total_entities": 12,
    "anatomy_count": 6,
    "observations_count": 6,
    "has_critical_findings": false,
    "has_abnormalities": false
  },
  "recommendations": [
    "No significant abnormalities detected"
  ]
}
```

### `GET /health`

Health check endpoint.

**Response:**

```json
{
  "status": "healthy",
  "model_loaded": true,
  "model_pipeline": ["tok2vec", "ner"],
  "model_labels": ["ANATOMY", "OBSERVATION"],
  "ocr_engine": "EasyOCR",
  "encryption": "NaCl (XSalsa20-Poly1305)",
  "compression": "gzip",
  "version": "1.0.0"
}
```

## 🔒 Security

### Encryption Details

- **Algorithm**: NaCl (Networking and Cryptography library)
- **Cipher**: XSalsa20 stream cipher
- **Authentication**: Poly1305 MAC
- **Key Derivation**: PBKDF2 with SHA-256
- **Nonce**: 24 bytes (randomly generated per request)

### Compression

- **Algorithm**: gzip
- **Average Savings**: 25-30% bandwidth reduction
- **Applied**: Before encryption on client, after decryption on server

### Data Flow

1. Client compresses payload with gzip
2. Client encrypts compressed data with NaCl
3. Server decrypts and decompresses
4. Server processes medical document
5. Server encrypts response
6. Client decrypts response

## 🚀 Deployment

### HuggingFace Spaces

This API is deployed on HuggingFace Spaces using Docker.

### Local Development

```sh
# Clone repository
git clone <repository-url>
cd radiology-ner-api

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm

# Run server
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

### Environment Variables

```text
ENCRYPTION_KEY=your-secret-encryption-key-min-32-chars
MODEL_PATH=./models/xray_ner_best
HOST=0.0.0.0
PORT=7860
```

## 📊 Model Performance

| Metric | Value |
|--------|-------|
| **F-Score** | 99.94% |
| **Precision** | 99.92% |
| **Recall** | 99.96% |
| **Training Samples** | 2,674 reports |
| **Entity Types** | 2 (ANATOMY, OBSERVATION) |

### Training Data

- **Dataset**: Indiana University Chest X-Ray Collection
- **Reports**: 2,674 radiology reports
- **Annotations**: Manual entity labeling
- **Framework**: spaCy v3.7
- **Architecture**: HashEmbedCNN

## 🛠️ Technology Stack

- **Backend**: FastAPI 0.104.1
- **NER**: spaCy 3.7
- **OCR**: EasyOCR
- **PDF Processing**: PyMuPDF (fitz)
- **Image Processing**: OpenCV, Pillow
- **Encryption**: PyNaCl
- **Compression**: gzip
- **Deployment**: Docker, HuggingFace Spaces

## 📝 Client Implementation Example

```python
import base64
import gzip
import hashlib
import json
import os

import requests
from nacl.secret import SecretBox
from nacl.utils import random

SECRET_KEY = "your-encryption-key"

# Assumed values: this README does not state the PBKDF2 salt or iteration
# count, so both must be replaced with whatever the server actually uses.
KDF_SALT = b"replace-with-server-salt"
KDF_ITERATIONS = 100_000


def derive_key(secret):
    # Derive a 32-byte SecretBox key with PBKDF2-HMAC-SHA256 (parameters assumed)
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode(), KDF_SALT, KDF_ITERATIONS,
        dklen=SecretBox.KEY_SIZE
    )


def encrypt_file(file_path, file_type):
    # Read the file and base64-encode its bytes
    with open(file_path, 'rb') as f:
        file_data = base64.b64encode(f.read()).decode()

    # Create payload
    payload = {
        "filename": os.path.basename(file_path),
        "file_data": file_data,
        "file_type": file_type
    }

    # Compress
    compressed = gzip.compress(json.dumps(payload).encode())
    compressed_b64 = base64.b64encode(compressed).decode()

    # Encrypt
    key = derive_key(SECRET_KEY)
    box = SecretBox(key)
    nonce = random(SecretBox.NONCE_SIZE)
    encrypted = box.encrypt(compressed_b64.encode(), nonce)

    # box.encrypt() prepends the nonce to its result, so send only the
    # ciphertext part and pass the nonce as its own field
    return {
        "ciphertext": base64.b64encode(encrypted.ciphertext).decode(),
        "nonce": base64.b64encode(nonce).decode()
    }


def decrypt_response(body, secret):
    # Sketch: assumes the decrypted bytes are plain JSON. If the server also
    # gzips or base64-encodes its response, reverse those steps before json.loads
    box = SecretBox(derive_key(secret))
    plaintext = box.decrypt(
        base64.b64decode(body["ciphertext"]),
        base64.b64decode(body["nonce"])
    )
    return json.loads(plaintext)


# Send request
encrypted_payload = encrypt_file("report.pdf", "pdf")
response = requests.post(
    "https://your-space.hf.space/analyze-secure",
    json=encrypted_payload
)
response.raise_for_status()

# Decrypt response
result = decrypt_response(response.json(), SECRET_KEY)
print(result)
```

## 🎯 Use Cases

- **Clinical Decision Support**: Extract structured data from radiology reports
- **Medical Record Digitization**: OCR for scanned medical documents
- **Research Analytics**: Automated entity extraction for medical research
- **DPDPA Compliance**: Secure processing of sensitive medical data (Digital Personal Data Protection Act, 2023)
- **Telemedicine**: Remote radiology report analysis under Telemedicine Practice Guidelines
- **Healthcare AI Research**: Supporting AI/ML research in the Indian healthcare sector

## 📄 License & Compliance

**License**: Apache License 2.0 - see LICENSE file for details

**Indian Regulations Compliance**:

- **Digital Personal Data Protection Act (DPDPA), 2023**: End-to-end encryption ensures data protection
- **Information Technology Act, 2000**: Secure data transmission and storage
- **Telemedicine Practice Guidelines, 2020**: Compliant with MCI telemedicine regulations
- **Clinical Establishments Act, 2010**: Suitable for registered clinical establishments

**Note**: Users must ensure compliance with:

- State-specific medical data regulations
- National Medical Commission (NMC) guidelines (the NMC succeeded the Medical Council of India in 2020)
- National Health Authority (NHA) standards for digital health
- Ayushman Bharat Digital Mission (ABDM) integration requirements

## ⚠️ Disclaimer

This API is designed for research, educational, and assistive purposes only.
It is compliant with Indian digital health regulations including DPDPA 2023 and Telemedicine Practice Guidelines.

**Important Notice**:

- This tool does **NOT** replace professional medical diagnosis or treatment
- All analysis must be reviewed and validated by qualified medical practitioners registered with the National Medical Commission (NMC, successor to the Medical Council of India) or State Medical Councils
- Users must comply with Indian medical data protection laws (DPDPA 2023)
- Healthcare providers must maintain proper patient consent as per Indian regulations
- For clinical use, ensure compliance with the Clinical Establishments Act and relevant state laws

Always consult qualified and registered healthcare professionals for medical advice and diagnosis.

**Built with ❤️ for the medical AI community**
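
Because every analysis has to be reviewed by a practitioner, a client may want to pull the review-relevant fields out of the decrypted response before showing anything else. The following is a minimal sketch and not part of the API: `summarize_for_review` is a hypothetical helper name, and it relies only on field names that appear in the decrypted response structure documented under `POST /analyze-secure`.

```python
from collections import defaultdict


def summarize_for_review(response):
    """Collect the fields a reviewer would likely look at first (hypothetical helper)."""
    # Group entity texts by their label (ANATOMY / OBSERVATION)
    by_label = defaultdict(list)
    for entity in response.get("entities", []):
        by_label[entity["label"]].append(entity["text"])

    report = response.get("structured_report", {})
    summary = response.get("summary", {})
    return {
        "entities_by_label": dict(by_label),
        "critical_findings": report.get("critical_findings", []),
        "needs_urgent_review": bool(summary.get("has_critical_findings", False)),
    }


# Trimmed-down sample shaped like the documented decrypted response
sample = {
    "entities": [
        {"text": "lung", "label": "ANATOMY", "start": 45, "end": 49, "confidence": 0.998},
    ],
    "structured_report": {"critical_findings": []},
    "summary": {"has_critical_findings": False},
}
print(summarize_for_review(sample))
```

Missing keys fall back to empty values, so a partial or error response does not raise; whether that is the right behaviour for a clinical workflow is a design decision left to the integrator.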