---
title: Radiology Report NER API
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🩺 Radiology Report NER API

**Secure, encrypted medical document analysis with Named Entity Recognition and OCR**

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Python 3.11](https://www.python.org/downloads/release/python-3110/)
[FastAPI](https://fastapi.tiangolo.com)
[spaCy](https://spacy.io)

## Features

- **End-to-End Encryption** using NaCl (XSalsa20-Poly1305)
- **99.94% F-Score** for the NER model
- **PDF & Image Support** with intelligent text extraction
- 🖼️ **Embedded Image Extraction** from medical PDFs
- 🎯 **Entity Detection**: ANATOMY & OBSERVATION
- ⚠️ **Critical Finding Detection**
- **Clinical Recommendations**
- 📦 **Gzip Compression** (about 25-30% bandwidth savings)
- ⚡ **EasyOCR Integration** for scanned documents

## Architecture

```text
Client Application
    ↓ (Compress + Encrypt)
FastAPI Server
    ↓ (Decrypt + Decompress)
Text Extraction (PyMuPDF/EasyOCR)
    ↓
spaCy NER Model (99.94% F-score)
    ↓
Post-Processing & Analysis
    ↓ (Encrypt)
Structured JSON Response
```

## 📡 API Endpoints

### `POST /analyze-secure`

Secure encrypted endpoint for medical document analysis.

**Request Format:**

```json
{
  "ciphertext": "base64_encrypted_data",
  "nonce": "base64_nonce"
}
```

**Encrypted Payload Structure:**

```json
{
  "filename": "report.pdf",
  "file_data": "base64_encoded_file",
  "file_type": "pdf"
}
```

**Response Format:**

```json
{
  "status": "success",
  "ciphertext": "base64_encrypted_response",
  "nonce": "base64_nonce"
}
```

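
Requests and responses share the same `ciphertext`/`nonce` envelope, so a receiver can reject malformed input before attempting decryption. The following is a minimal sketch, not the server's actual code: `parse_envelope` is a hypothetical helper, and the 24-byte nonce length is taken from the Security section.

```python
import base64
import binascii

NONCE_SIZE = 24  # bytes; the XSalsa20-Poly1305 nonce length stated in this README


def parse_envelope(body):
    """Decode and sanity-check a {"ciphertext", "nonce"} envelope.

    Returns (ciphertext_bytes, nonce_bytes) or raises ValueError.
    """
    try:
        ciphertext = base64.b64decode(body["ciphertext"], validate=True)
        nonce = base64.b64decode(body["nonce"], validate=True)
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from exc
    except binascii.Error as exc:
        raise ValueError("field is not valid base64") from exc
    if len(nonce) != NONCE_SIZE:
        raise ValueError(f"nonce must be {NONCE_SIZE} bytes, got {len(nonce)}")
    if not ciphertext:
        raise ValueError("ciphertext is empty")
    return ciphertext, nonce
```

Failing fast here keeps malformed requests away from the decryption and OCR stages, which are far more expensive.
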
**Decrypted Response Structure:**

```json
{
  "status": "success",
  "processing_time": 57.721,
  "filename": "xray_report.pdf",
  "input_type": "pdf",
  "ocr_used": true,
  "ocr_engine": "EasyOCR",
  "raw_text": "Patient report...",
  "text_length": 1022,
  "entities": [
    {
      "text": "lung",
      "label": "ANATOMY",
      "start": 45,
      "end": 49,
      "confidence": 0.998
    }
  ],
  "images": [
    {
      "page": 1,
      "format": "JPEG",
      "width": 800,
      "height": 600,
      "data": "data:image/jpeg;base64,..."
    }
  ],
  "structured_report": {
    "anatomy": ["lung", "heart", "chest"],
    "all_observations": ["clear", "normal"],
    "positive_findings": [],
    "negative_findings": ["clear", "normal"],
    "critical_findings": []
  },
  "summary": {
    "total_entities": 12,
    "anatomy_count": 6,
    "observations_count": 6,
    "has_critical_findings": false,
    "has_abnormalities": false
  },
  "recommendations": [
    "No significant abnormalities detected"
  ]
}
```

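
Once decrypted, the response is an ordinary dictionary. As an illustration of consuming it, the sketch below groups entity texts by label and flags reports that need attention. The helper names `entities_by_label` and `needs_review` and the `min_confidence` parameter are hypothetical; only the field names come from the structure above.

```python
from collections import defaultdict


def entities_by_label(result, min_confidence=0.0):
    """Group entity texts by label, skipping low-confidence entities."""
    grouped = defaultdict(list)
    for ent in result.get("entities", []):
        if ent.get("confidence", 1.0) >= min_confidence:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


def needs_review(result):
    """True when the summary reports critical findings or abnormalities."""
    summary = result.get("summary", {})
    return bool(
        summary.get("has_critical_findings") or summary.get("has_abnormalities")
    )
```
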
### `GET /health`

Health check endpoint.

**Response:**

```json
{
  "status": "healthy",
  "model_loaded": true,
  "model_pipeline": ["tok2vec", "ner"],
  "model_labels": ["ANATOMY", "OBSERVATION"],
  "ocr_engine": "EasyOCR",
  "encryption": "NaCl (XSalsa20-Poly1305)",
  "compression": "gzip",
  "version": "1.0.0"
}
```

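
A deployment script might check this payload before sending documents. The sketch below is one way to do that; `health_problems` is a hypothetical helper, and treating both labels as required is an assumption based on the sample response.

```python
def health_problems(health, required_labels=("ANATOMY", "OBSERVATION")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"status is {health.get('status')!r}")
    if not health.get("model_loaded"):
        problems.append("model not loaded")
    missing = set(required_labels) - set(health.get("model_labels", []))
    if missing:
        problems.append(f"missing labels: {sorted(missing)}")
    return problems
```
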
## Security

### Encryption Details

- **Library**: NaCl (Networking and Cryptography library), used through PyNaCl's `SecretBox`
- **Cipher**: XSalsa20 stream cipher
- **Authentication**: Poly1305 MAC
- **Key Derivation**: PBKDF2 with SHA-256
- **Nonce**: 24 bytes (randomly generated per request)

### Compression

- **Algorithm**: gzip
- **Average Savings**: 25-30% bandwidth reduction
- **Applied**: Before encryption on client, after decryption on server

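
To get a feel for the savings on a given payload, the compression step can be measured in isolation. The sketch below uses pseudo-random bytes as a stand-in for an already-compressed file (an assumption, since real PDFs vary); `compression_savings` is a hypothetical helper. For such input, gzip mostly wins back the size overhead that base64 adds, which is one plausible source of a figure in the 25% range.

```python
import base64
import gzip
import json
import random


def compression_savings(payload):
    """Fraction of bytes saved by gzip-compressing the JSON-encoded payload."""
    raw = json.dumps(payload).encode()
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)


# Stand-in for an already-compressed file: 30 kB of pseudo-random bytes.
rng = random.Random(0)
fake_file = bytes(rng.getrandbits(8) for _ in range(30_000))

payload = {
    "filename": "report.pdf",
    "file_data": base64.b64encode(fake_file).decode(),
    "file_type": "pdf",
}
print(f"saved {compression_savings(payload):.1%}")
```

Compression has to happen before encryption because ciphertext is statistically close to random and does not compress.
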
### Data Flow

1. Client compresses payload with gzip
2. Client encrypts compressed data with NaCl
3. Server decrypts and decompresses
4. Server processes medical document
5. Server encrypts response
6. Client decrypts response

## Deployment

### Hugging Face Spaces

This API is deployed on Hugging Face Spaces using Docker.

### Local Development

```sh
# Clone repository
git clone <your-repo-url>
cd radiology-ner-api

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm

# Run server
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

### Environment Variables

```text
ENCRYPTION_KEY=your-secret-encryption-key-min-32-chars
MODEL_PATH=./models/xray_ner_best
HOST=0.0.0.0
PORT=7860
```

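
These variables could be read and validated at startup roughly as follows. This is a sketch rather than the application's actual loader: `Settings` and `load_settings` are hypothetical names, the defaults simply reuse the example values above, and the 32-character minimum is inferred from the placeholder key.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    encryption_key: str
    model_path: str
    host: str
    port: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get("ENCRYPTION_KEY", "")
    # Assumption: the "min-32-chars" hint in the placeholder is a hard requirement.
    if len(key) < 32:
        raise ValueError("ENCRYPTION_KEY must be at least 32 characters")
    return Settings(
        encryption_key=key,
        model_path=env.get("MODEL_PATH", "./models/xray_ner_best"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests.
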
## Model Performance

| Metric | Score |
|--------|-------|
| **F-Score** | 99.94% |
| **Precision** | 99.92% |
| **Recall** | 99.96% |
| **Training Samples** | 2,674 reports |
| **Entity Types** | 2 (ANATOMY, OBSERVATION) |

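
The three scores in the table are mutually consistent if the F-score is the usual F1, the harmonic mean of precision and recall (an assumption; the table does not name the variant). A quick check:

```python
def f_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f_score(0.9992, 0.9996)
print(f"{f1:.2%}")  # 99.94%
```
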
### Training Data

- **Dataset**: Indiana University Chest X-Ray Collection
- **Reports**: 2,674 radiology reports
- **Annotations**: Manual entity labeling
- **Framework**: spaCy v3.7
- **Architecture**: HashEmbedCNN

## 🛠️ Technology Stack

- **Backend**: FastAPI 0.104.1
- **NER**: spaCy 3.7
- **OCR**: EasyOCR
- **PDF Processing**: PyMuPDF (fitz)
- **Image Processing**: OpenCV, Pillow
- **Encryption**: PyNaCl
- **Compression**: gzip
- **Deployment**: Docker, Hugging Face Spaces

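## Client Implementation Example

```python
import base64
import gzip
import hashlib
import json
from pathlib import Path

import requests
from nacl.secret import SecretBox
from nacl.utils import random as random_bytes

SECRET_KEY = "your-encryption-key"
API_URL = "https://your-space.hf.space/analyze-secure"


def derive_key(secret):
    # ASSUMPTION: the salt and iteration count are placeholders; they must
    # match whatever the server uses for its PBKDF2-SHA256 derivation.
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode(), b"replace-with-server-salt", 100_000,
        dklen=SecretBox.KEY_SIZE,
    )


def encrypt_file(file_path, file_type):
    path = Path(file_path)
    # Read the file and base64-encode it for the JSON payload
    file_data = base64.b64encode(path.read_bytes()).decode()
    # Create payload
    payload = {
        "filename": path.name,
        "file_data": file_data,
        "file_type": file_type,
    }
    # Compress, then base64-encode the compressed bytes
    compressed = gzip.compress(json.dumps(payload).encode())
    compressed_b64 = base64.b64encode(compressed)
    # Encrypt with a fresh random 24-byte nonce
    box = SecretBox(derive_key(SECRET_KEY))
    nonce = random_bytes(SecretBox.NONCE_SIZE)
    encrypted = box.encrypt(compressed_b64, nonce)
    return {
        "ciphertext": base64.b64encode(encrypted.ciphertext).decode(),
        "nonce": base64.b64encode(nonce).decode(),
    }


def decrypt_response(body, secret):
    # ASSUMPTION: the response mirrors the request encoding (JSON -> gzip ->
    # base64 -> SecretBox); drop the gzip/base64 step if the server sends
    # plain JSON inside the ciphertext.
    box = SecretBox(derive_key(secret))
    plaintext = box.decrypt(
        base64.b64decode(body["ciphertext"]),
        base64.b64decode(body["nonce"]),
    )
    return json.loads(gzip.decompress(base64.b64decode(plaintext)))


# Send request (OCR can take a while, so allow a generous timeout)
encrypted_payload = encrypt_file("report.pdf", "pdf")
response = requests.post(API_URL, json=encrypted_payload, timeout=120)
response.raise_for_status()

# Decrypt response
result = decrypt_response(response.json(), SECRET_KEY)
print(result)
```
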
## 🎯 Use Cases

- **Clinical Decision Support**: Extract structured data from radiology reports
- **Medical Record Digitization**: OCR for scanned medical documents
- **Research Analytics**: Automated entity extraction for medical research
- **DPDPA Compliance**: Secure processing of sensitive medical data (Digital Personal Data Protection Act, 2023)
- **Telemedicine**: Remote radiology report analysis under the Telemedicine Practice Guidelines
- **Healthcare AI Research**: Supporting AI/ML research in the Indian healthcare sector

## License & Compliance

**License**: Apache License 2.0 - see the LICENSE file for details

**Indian Regulations Compliance**:

- **Digital Personal Data Protection Act (DPDPA), 2023**: End-to-end encryption ensures data protection
- **Information Technology Act, 2000**: Secure data transmission and storage
- **Telemedicine Practice Guidelines, 2020**: Compliant with MCI telemedicine regulations
- **Clinical Establishments Act, 2010**: Suitable for registered clinical establishments

**Note**: Users must ensure compliance with:

- State-specific medical data regulations
- National Medical Commission (NMC) guidelines (the NMC replaced the Medical Council of India in 2020)
- National Health Authority (NHA) standards for digital health
- Ayushman Bharat Digital Mission (ABDM) integration requirements

## ⚠️ Disclaimer

This API is designed for research, educational, and assistive purposes only. It is compliant with Indian digital health regulations, including the DPDPA 2023 and the Telemedicine Practice Guidelines.

**Important Notice**:

- This tool does **NOT** replace professional medical diagnosis or treatment
- All analysis must be reviewed and validated by qualified medical practitioners registered with the National Medical Commission (NMC, formerly the Medical Council of India) or a State Medical Council
- Users must comply with Indian medical data protection laws (DPDPA 2023)
- Healthcare providers must obtain and maintain proper patient consent as required by Indian regulations
- For clinical use, ensure compliance with the Clinical Establishments Act and relevant state laws

Always consult qualified and registered healthcare professionals for medical advice and diagnosis.

**Built with ❤️ for the medical AI community**