MakPr016
metadata
title: Radiology Report NER API
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0

🩺 Radiology Report NER API

Secure, encrypted medical document analysis with Named Entity Recognition and OCR

License Python 3.11 FastAPI spaCy

🌟 Features

  • 🔐 End-to-End Encryption using NaCl (XSalsa20-Poly1305)
  • 📊 99.94% F-Score NER model accuracy
  • 📄 PDF & Image Support with intelligent text extraction
  • 🖼️ Embedded Image Extraction from medical PDFs
  • 🎯 Entity Detection: ANATOMY & OBSERVATION
  • ⚠️ Critical Finding Detection
  • 💊 Clinical Recommendations
  • 📦 Gzip Compression (25% bandwidth savings)
  • ⚡ EasyOCR Integration for scanned documents

πŸ—οΈ Architecture


Client Application
    ↓ (Encrypt + Compress)
FastAPI Server
    ↓ (Decrypt + Decompress)
Text Extraction (PyMuPDF/EasyOCR)
    ↓
spaCy NER Model (99.94% F-score)
    ↓
Post-Processing & Analysis
    ↓ (Encrypt)
Structured JSON Response

📡 API Endpoints

POST /analyze-secure

Secure encrypted endpoint for medical document analysis.

Request Format:


{
    "ciphertext": "base64_encrypted_data",
    "nonce": "base64_nonce"
}

Encrypted Payload Structure:


{
    "filename": "report.pdf",
    "file_data": "base64_encoded_file",
    "file_type": "pdf"
}

Response Format:


{
    "status": "success",
    "ciphertext": "base64_encrypted_response",
    "nonce": "base64_nonce"
}

Decrypted Response Structure:


{
    "status": "success",
    "processing_time": 57.721,
    "filename": "xray_report.pdf",
    "input_type": "pdf",
    "ocr_used": true,
    "ocr_engine": "EasyOCR",
    "raw_text": "Patient report...",
    "text_length": 1022,
    "entities": [
        {
            "text": "lung",
            "label": "ANATOMY",
            "start": 45,
            "end": 49,
            "confidence": 0.998
        }
    ],
    "images": [
        {
            "page": 1,
            "format": "JPEG",
            "width": 800,
            "height": 600,
            "data": "data:image/jpeg;base64,..."
        }
    ],
    "structured_report": {
        "anatomy": ["lung", "heart", "chest"],
        "all_observations": ["clear", "normal"],
        "positive_findings": [],
        "negative_findings": ["clear", "normal"],
        "critical_findings": []
    },
    "summary": {
        "total_entities": 12,
        "anatomy_count": 6,
        "observations_count": 6,
        "has_critical_findings": false,
        "has_abnormalities": false
    },
    "recommendations": [
        "No significant abnormalities detected"
    ]
}
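
Once decrypted, the response is an ordinary JSON object. The sketch below shows one way a client might reduce it to the fields it checks first. `summarize_report` and its `min_confidence` threshold are hypothetical names introduced for illustration; only the field names come from the example response above.

```python
from collections import defaultdict
from typing import Any


def summarize_report(report: dict[str, Any], min_confidence: float = 0.5) -> dict[str, Any]:
    """Group entity texts by label and surface the flags a caller checks first.

    `min_confidence` is an illustrative threshold, not a value defined by the API.
    """
    by_label: dict[str, list[str]] = defaultdict(list)
    for entity in report.get("entities", []):
        if entity.get("confidence", 0.0) >= min_confidence:
            by_label[entity["label"]].append(entity["text"])

    summary = report.get("summary", {})
    structured = report.get("structured_report", {})
    return {
        "entities_by_label": dict(by_label),
        # Flag the report if either the summary or the findings list says so.
        "needs_urgent_review": bool(summary.get("has_critical_findings"))
        or bool(structured.get("critical_findings")),
        "recommendations": list(report.get("recommendations", [])),
    }
```

Reading every key with `.get` keeps the helper usable if the server omits optional sections such as `images`.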

GET /health

Health check endpoint.

Response:


{
    "status": "healthy",
    "model_loaded": true,
    "model_pipeline": ["tok2vec", "ner"],
    "model_labels": ["ANATOMY", "OBSERVATION"],
    "ocr_engine": "EasyOCR",
    "encryption": "NaCl (XSalsa20-Poly1305)",
    "compression": "gzip",
    "version": "1.0.0"
}
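
A client can use this payload as a readiness gate before sending documents. The following is a minimal sketch, assuming the fields shown above; `is_ready` and `EXPECTED_LABELS` are hypothetical names, not part of the API.

```python
from typing import Any

# Labels listed in the example /health response above.
EXPECTED_LABELS = {"ANATOMY", "OBSERVATION"}


def is_ready(health: dict[str, Any]) -> bool:
    """Return True when the payload reports a loaded model with the expected labels."""
    return (
        health.get("status") == "healthy"
        and health.get("model_loaded") is True
        and EXPECTED_LABELS.issubset(health.get("model_labels", []))
    )
```

In practice the dictionary would come from something like `requests.get(f"{base_url}/health").json()`, where `base_url` is the address of your deployment.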

🔒 Security

Encryption Details

  • Algorithm: NaCl (Networking and Cryptography library)
  • Cipher: XSalsa20 stream cipher
  • Authentication: Poly1305 MAC
  • Key Derivation: PBKDF2 with SHA-256
  • Nonce: 24 bytes (randomly generated per request)
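
These sizes can be checked before attempting decryption. The sketch below decodes a `{"ciphertext", "nonce"}` envelope and rejects inputs that cannot be valid XSalsa20-Poly1305 messages. `decode_envelope` is a hypothetical helper. It assumes the `ciphertext` field includes the 16-byte Poly1305 tag, which matches what the client example in this README sends.

```python
import base64
import binascii

NONCE_SIZE = 24  # XSalsa20 nonce length in bytes
MAC_SIZE = 16    # Poly1305 authentication tag length in bytes


def decode_envelope(envelope: dict) -> tuple[bytes, bytes]:
    """Base64-decode an envelope and check the sizes implied by the cipher.

    Returns (ciphertext, nonce). Raises ValueError on malformed input.
    """
    try:
        ciphertext = base64.b64decode(envelope["ciphertext"], validate=True)
        nonce = base64.b64decode(envelope["nonce"], validate=True)
    except (KeyError, binascii.Error) as exc:
        raise ValueError(f"malformed envelope: {exc!r}") from exc
    if len(nonce) != NONCE_SIZE:
        raise ValueError(f"nonce must be {NONCE_SIZE} bytes, got {len(nonce)}")
    if len(ciphertext) < MAC_SIZE:
        raise ValueError("ciphertext is shorter than the Poly1305 tag")
    return ciphertext, nonce
```

A size check is only a cheap pre-filter; the Poly1305 tag verification during decryption is what actually authenticates the message.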

Compression

  • Algorithm: gzip
  • Average Savings: 25-30% bandwidth reduction
  • Applied: Before encryption on client, after decryption on server
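
The actual saving depends on the document being sent. The sketch below measures it for a given payload using only the standard library. `gzip_savings` is a hypothetical helper, and the random bytes stand in for an already-compressed file, so the result illustrates the method rather than reproducing the figure above.

```python
import base64
import gzip
import json
import os


def gzip_savings(payload: dict) -> float:
    """Return the fraction of bytes saved by gzip on the JSON-encoded payload."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    return 1.0 - len(compressed) / len(raw)


# Stand-in for an already-compressed file such as a JPEG or an image-heavy PDF:
# random bytes are incompressible, so any saving comes from the base64 layer.
fake_file = os.urandom(50_000)
payload = {
    "filename": "report.pdf",
    "file_data": base64.b64encode(fake_file).decode("ascii"),
    "file_type": "pdf",
}
print(f"gzip saved {gzip_savings(payload):.1%} of the request body")
```

Text-heavy payloads usually compress far better than this, so it is worth measuring with representative reports.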

Data Flow

  1. Client compresses payload with gzip
  2. Client encrypts compressed data with NaCl
  3. Server decrypts and decompresses
  4. Server processes medical document
  5. Server encrypts response
  6. Client decrypts response
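
Leaving encryption aside, steps 1 and 3 amount to a symmetric pack and unpack of the plaintext. The sketch below follows the order used by the client example in this README (JSON, then gzip, then base64). The server-side half is an assumption mirrored from that client code, and both function names are hypothetical.

```python
import base64
import gzip
import json


def pack_plaintext(payload: dict) -> bytes:
    """Steps before encryption: JSON-encode, gzip, then base64."""
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return base64.b64encode(compressed)


def unpack_plaintext(plaintext: bytes) -> dict:
    """Steps after decryption: base64-decode, gunzip, then parse JSON."""
    compressed = base64.b64decode(plaintext, validate=True)
    return json.loads(gzip.decompress(compressed).decode("utf-8"))
```

Compressing before encrypting matters because well-encrypted data looks random and no longer compresses.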

🚀 Deployment

HuggingFace Spaces

This API is deployed on HuggingFace Spaces using Docker.

Local Development



# Clone repository
git clone <your-repo-url>
cd radiology-ner-api

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm

# Run server
uvicorn app.main:app --host 0.0.0.0 --port 7860

Environment Variables


ENCRYPTION_KEY=your-secret-encryption-key-min-32-chars
MODEL_PATH=./models/xray_ner_best
HOST=0.0.0.0
PORT=7860
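
A server can read these variables once at startup and fail fast on a weak key. This is a minimal sketch, not the project's actual configuration code: `Settings` and `load_settings` are hypothetical names, the defaults are the example values above, and the 32-character minimum is inferred from the placeholder text.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    encryption_key: str
    model_path: str
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables above; only ENCRYPTION_KEY is mandatory here."""
    key = env.get("ENCRYPTION_KEY", "")
    if len(key) < 32:
        raise ValueError("ENCRYPTION_KEY must be set and at least 32 characters long")
    return Settings(
        encryption_key=key,
        model_path=env.get("MODEL_PATH", "./models/xray_ner_best"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
    )
```

Passing the environment as a parameter makes the loader easy to exercise in tests without touching the real process environment.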

📊 Model Performance

| Metric | Score |
| --- | --- |
| F-Score | 99.94% |
| Precision | 99.92% |
| Recall | 99.96% |
| Training Samples | 2,674 reports |
| Entity Types | 2 (ANATOMY, OBSERVATION) |
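
The three scores are consistent with each other if the F-Score is the usual F1, the harmonic mean of precision and recall. The README does not say which F-measure is reported, so that is an assumption. A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.9992, 0.9996)
print(f"F1 = {f1:.2%}")  # prints "F1 = 99.94%"
```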

Training Data

  • Dataset: Indiana University Chest X-Ray Collection
  • Reports: 2,674 radiology reports
  • Annotations: Manual entity labeling
  • Framework: spaCy v3.7
  • Architecture: HashEmbedCNN

🛠️ Technology Stack

  • Backend: FastAPI 0.104.1
  • NER: spaCy 3.7
  • OCR: EasyOCR
  • PDF Processing: PyMuPDF (fitz)
  • Image Processing: OpenCV, Pillow
  • Encryption: PyNaCl
  • Compression: gzip
  • Deployment: Docker, HuggingFace Spaces

πŸ“ Client Implementation Example


import base64
import gzip
import json
import os

import requests
from nacl.secret import SecretBox
from nacl.utils import random

SECRET_KEY = "your-encryption-key"


def encrypt_file(file_path, file_type):
    # Read the file and base64-encode it so it can travel inside JSON
    with open(file_path, 'rb') as f:
        file_data = base64.b64encode(f.read()).decode()

    # Create payload
    payload = {
        "filename": os.path.basename(file_path),
        "file_data": file_data,
        "file_type": file_type
    }

    # Compress
    compressed = gzip.compress(json.dumps(payload).encode())
    compressed_b64 = base64.b64encode(compressed).decode()

    # Encrypt. derive_key() must turn SECRET_KEY into the same 32-byte key
    # the server derives (PBKDF2 with SHA-256, see Encryption Details).
    key = derive_key(SECRET_KEY)
    box = SecretBox(key)
    nonce = random(SecretBox.NONCE_SIZE)
    encrypted = box.encrypt(compressed_b64.encode(), nonce)

    return {
        # .ciphertext excludes the nonce that PyNaCl prepends to the message
        "ciphertext": base64.b64encode(encrypted.ciphertext).decode(),
        "nonce": base64.b64encode(nonce).decode()
    }


# Send request
encrypted_payload = encrypt_file("report.pdf", "pdf")
response = requests.post(
    "https://your-space.hf.space/analyze-secure",
    json=encrypted_payload
)
response.raise_for_status()

# Decrypt response. decrypt_response() must reverse the server's encryption
# of the response envelope using the same derived key.
result = decrypt_response(response.json(), SECRET_KEY)
print(result)
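
The example above calls `derive_key` and `decrypt_response` without defining them. The following is one possible sketch of both, not the project's actual implementation. The salt and iteration count are placeholders, because this README names PBKDF2 with SHA-256 but gives no parameters. It also assumes the decrypted response is plain UTF-8 JSON.

```python
import base64
import hashlib
import json

# Hypothetical PBKDF2 parameters: the README gives no salt or iteration
# count, and both must match the server's values exactly.
ASSUMED_SALT = b"replace-with-server-salt"
ASSUMED_ITERATIONS = 100_000


def derive_key(secret: str) -> bytes:
    """Derive a 32-byte SecretBox key from the shared secret."""
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), ASSUMED_SALT, ASSUMED_ITERATIONS, dklen=32
    )


def decrypt_response(envelope: dict, secret: str) -> dict:
    """Decrypt a {"ciphertext", "nonce"} envelope and parse the JSON inside.

    Assumes the decrypted bytes are UTF-8 JSON. If the server also compresses
    its responses, a decompression step is needed before json.loads.
    """
    # Imported here so derive_key stays usable without PyNaCl installed.
    from nacl.secret import SecretBox

    ciphertext = base64.b64decode(envelope["ciphertext"])
    nonce = base64.b64decode(envelope["nonce"])
    plaintext = SecretBox(derive_key(secret)).decrypt(ciphertext, nonce)
    return json.loads(plaintext.decode("utf-8"))
```

`SecretBox.decrypt` raises `nacl.exceptions.CryptoError` when the key is wrong or the message was tampered with, so callers should be prepared to handle that case.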

🎯 Use Cases

  • Clinical Decision Support: Extract structured data from radiology reports
  • Medical Record Digitization: OCR for scanned medical documents
  • Research Analytics: Automated entity extraction for medical research
  • DPDPA Compliance: Secure processing of sensitive medical data (Digital Personal Data Protection Act, 2023)
  • Telemedicine: Remote radiology report analysis under Telemedicine Practice Guidelines
  • Healthcare AI Research: Supporting AI/ML research in Indian healthcare sector

📄 License & Compliance

License: Apache License 2.0 - see LICENSE file for details

Indian Regulations Compliance:

  • Digital Personal Data Protection Act (DPDPA), 2023: End-to-end encryption ensures data protection
  • Information Technology Act, 2000: Secure data transmission and storage
  • Telemedicine Practice Guidelines, 2020: Compliant with MCI telemedicine regulations
  • Clinical Establishments Act, 2010: Suitable for registered clinical establishments

Note: Users must ensure compliance with:

  • State-specific medical data regulations
  • National Medical Commission (NMC) guidelines (the NMC replaced the Medical Council of India in 2020)
  • National Health Authority (NHA) standards for digital health
  • Ayushman Bharat Digital Mission (ABDM) integration requirements

⚠️ Disclaimer

This API is designed for research, educational, and assistive purposes only. It is compliant with Indian digital health regulations including DPDPA 2023 and Telemedicine Practice Guidelines.

Important Notice:

  • This tool does NOT replace professional medical diagnosis or treatment
  • All analysis must be reviewed and validated by qualified medical practitioners registered with the National Medical Commission (NMC, which replaced the Medical Council of India in 2020) or State Medical Councils
  • Users must comply with Indian medical data protection laws (DPDPA 2023)
  • Healthcare providers must maintain proper patient consent as per Indian regulations
  • For clinical use, ensure compliance with Clinical Establishments Act and relevant state laws

Always consult qualified and registered healthcare professionals for medical advice and diagnosis.

Built with ❤️ for the medical AI community