---
title: PillChecker Staging
emoji: 📚
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
app_port: 8000
license: mit
---

PillChecker API

PillChecker helps users find out if two medications are safe to take at the same time. This repository contains the backend API that identifies drugs from OCR text and checks for dangerous interactions using DDInter 2.0 with OpenFDA fallback evidence.

MEDICAL DISCLAIMER

This service is provided for informational and self-educational purposes only. While the application utilizes data from respected pharmaceutical sources, the information provided should not be treated as medical advice, diagnosis, or treatment.

The developer of this project does not have any medical qualifications. This tool was built as a technical exercise to explore NLP and medical data integration.

Always consult with a qualified healthcare professional (such as a doctor or pharmacist) before making any decisions regarding your medications or health. The developer assumes no responsibility or liability for any errors, omissions, or consequences arising from the use of the information provided by this service.

Architecture

Drug Identification

Converts unstructured OCR text into standardized drug records using a multi-step strategy:

  1. OCR Cleaning: The ocr_cleaner normalizes common OCR artifacts before NER: digit-letter confusion (0/o, 1/l), rn/m confusion in drug names, ligatures, invisible characters, and irregular whitespace.
  2. NER: The OpenMed-NER-PharmaDetect-BioPatient-108M model (108M parameters) extracts chemical entity names from the cleaned text.
  3. Fallback: If NER yields no results, an approximate term search via the RxNorm REST API catches brand names (e.g., "Advil" -> ibuprofen).
  4. Enrichment: A regex parser extracts dosages (e.g., "400 mg"), and the RxNorm API maps every identified drug to its RxCUI for standardized downstream lookups.
  5. Confidence: Results with an NER score below 0.85, or sourced from the RxNorm fallback, are flagged with needs_confirmation = true to prompt user verification.
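
The cleaning, dosage-parsing, and confidence steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the function names, the regular expressions, the unit list, and the source label "rxnorm_fallback" are all hypothetical. Only the 0.85 threshold and the needs_confirmation flag come from the description above.

```python
import re
import unicodedata
from typing import List, Optional

NER_CONFIDENCE_THRESHOLD = 0.85  # threshold stated in this README

# Hypothetical cleanup rules; the real ocr_cleaner may cover more cases.
_INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff\u00ad]")
_DOSAGE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|iu)\b", re.IGNORECASE)


def clean_ocr_text(text: str) -> str:
    """Normalize ligatures, drop invisible characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. U+FB01 becomes "fi"
    text = _INVISIBLE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def parse_dosages(text: str) -> List[str]:
    """Return dosage strings such as '400 mg' found in the text."""
    return [f"{amount} {unit.lower()}" for amount, unit in _DOSAGE.findall(text)]


def needs_confirmation(ner_score: Optional[float], source: str) -> bool:
    """Flag low-confidence NER hits and anything from the RxNorm fallback."""
    if source == "rxnorm_fallback":  # hypothetical source label
        return True
    return ner_score is None or ner_score < NER_CONFIDENCE_THRESHOLD
```

Keeping the threshold in a single named constant means the flagging rule stays in one place if the cut-off is ever retuned.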

Interaction Checking

Drug-drug interactions are resolved against a pinned DDInter 2.0 SQLite database with OpenFDA fallback evidence:

  1. DDInter SQLite client: app/clients/ddinter_db.py opens the pre-built DDInter SQLite database with aiosqlite and FTS5 search.
  2. RxCUI-first lookup: For each drug pair, the checker resolves RxCUIs through RxNorm, maps them to DDInter IDs, and falls back to DDInter name search when needed.
  3. OpenFDA fallback: If DDInter has no pair match, the checker searches FDA label interaction text and classifies severity with the DeBERTa v3 zero-shot classifier.
  4. Coverage summary: /interactions reports whether each checked pair resolved through DDInter, OpenFDA, or neither source.
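
The lookup order described above (DDInter first, OpenFDA second, then "neither") can be sketched as follows. This is a simplified illustration, not the contents of app/clients/ddinter_db.py. It uses the standard-library sqlite3 module in place of aiosqlite, and it assumes a hypothetical single-table schema and a hypothetical openfda_lookup callable that stands in for the label search plus severity classifier.

```python
import sqlite3
from itertools import combinations
from typing import Callable, Dict, List, Optional

# Hypothetical schema: one row per unordered pair of DDInter IDs.
SCHEMA = """
CREATE TABLE interactions (
    drug_a   TEXT NOT NULL,
    drug_b   TEXT NOT NULL,
    severity TEXT NOT NULL
);
"""


def lookup_ddinter(conn: sqlite3.Connection, a: str, b: str) -> Optional[str]:
    """Return the stored severity for a pair, checking both orderings."""
    row = conn.execute(
        "SELECT severity FROM interactions "
        "WHERE (drug_a = ? AND drug_b = ?) OR (drug_a = ? AND drug_b = ?)",
        (a, b, b, a),
    ).fetchone()
    return row[0] if row else None


def check_pairs(
    conn: sqlite3.Connection,
    drugs: List[str],
    openfda_lookup: Callable[[str, str], Optional[str]],
) -> List[Dict[str, object]]:
    """Resolve every pair via DDInter first, then the fallback, recording the source."""
    results = []
    for a, b in combinations(drugs, 2):
        severity = lookup_ddinter(conn, a, b)
        source = "ddinter" if severity is not None else "none"
        if severity is None:
            severity = openfda_lookup(a, b)  # stand-in for label search + classifier
            if severity is not None:
                source = "openfda"
        results.append({"pair": (a, b), "severity": severity, "source": source})
    return results
```

Recording the source on every pair is what makes a coverage summary possible: a caller can count how many pairs resolved through each source, or through neither.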

Transparency

Both /analyze and /interactions responses include:

  • data_sources: which models and databases were used for the result
  • limitations (interactions only): scope disclaimers about what the system does and does not cover
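
One way to attach these fields consistently is a small helper applied to every response. The sketch below is hypothetical: the helper name, the placeholder limitation strings, and the example payloads are assumptions. Only the field names data_sources and limitations, and the rule that limitations appears on /interactions alone, come from the list above.

```python
from typing import Any, Dict, List

# Placeholder text; the service's actual disclaimer wording is not given here.
INTERACTION_LIMITATIONS = [
    "placeholder: scope disclaimer 1",
    "placeholder: scope disclaimer 2",
]


def with_transparency(
    payload: Dict[str, Any], endpoint: str, data_sources: List[str]
) -> Dict[str, Any]:
    """Return a copy of the payload with the transparency fields attached."""
    response = {**payload, "data_sources": list(data_sources)}
    if endpoint == "/interactions":
        # limitations is reported for interaction checks only
        response["limitations"] = list(INTERACTION_LIMITATIONS)
    return response
```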

Docker Build

The image uses a three-stage build to keep layers small and reproducible:

  • Stage 1 (Python): uv installs Python dependencies into an isolated venv.
  • Stage 2 (DB downloader): the pinned DDInter SQLite database is downloaded from the explicitly configured GitHub Releases source.
  • Stage 3 (Runtime): combines the venv, SQLite database, and app code. NER and severity models are pre-downloaded so the image is fully self-contained.

API Endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | /health | No | Liveness check |
| GET | /health/data | No | Readiness: confirms DDInter SQLite connection |
| POST | /analyze | API key | Extract drugs from OCR text |
| POST | /interactions | API key | Check interactions for a list of drug names |
| POST | /admin/cache/clear | API key | Clear all in-memory caches |
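
A client call to /analyze might be built as shown below. Treat this as a sketch: the X-API-Key header name and the "text" field in the JSON body are assumptions, since the request schema and auth header are not documented here. Port 8000 matches app_port in the Space metadata.

```python
import json
import urllib.request


def build_analyze_request(
    base_url: str, api_key: str, ocr_text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /analyze endpoint."""
    body = json.dumps({"text": ocr_text}).encode("utf-8")  # hypothetical body field
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
    )
```

The request can then be sent with urllib.request.urlopen(request) against a running instance, for example http://localhost:8000.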

Eval Benchmark

The benchmark suite and raw results have been migrated to the Hugging Face Hub for better reproducibility and visualization.

| Pipeline (Clean Text) | Precision | Recall | F1 |
|-----------------------|-----------|--------|----|
| Bare NER Baseline | 46.9% | 84.4% | 60.3% |
| Full Pipeline | 71.6% | 81.0% | 76.0% |
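
The F1 column is the harmonic mean of precision and recall. A short check, using only the figures in the table above, confirms the reported values are internally consistent:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
rows = [
    ("Bare NER Baseline", 0.469, 0.844),
    ("Full Pipeline", 0.716, 0.810),
]

for name, precision, recall in rows:
    print(f"{name}: F1 = {f1_score(precision, recall):.1%}")
```

Running it reproduces the 60.3% and 76.0% shown in the table. The full pipeline trades about 3 points of recall for roughly 25 points of precision.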

See eval/README.md for evaluation methodology and progress, and AGENTS.md for HF/GitHub ownership and cleanup rules.

Staging & Deployment

The API is deployed as a staging environment on Hugging Face Spaces for remote testing.

Citation

If you use this software or the benchmark dataset in your research, please cite it as:

@software{perekrestova_pillchecker_2026,
  author = {Perekrestova, Svetlana},
  orcid = {0009-0003-2905-6040},
  title = {PillChecker API: Pharmaceutical Entity Extraction and Interaction Checker},
  version = {1.2.2},
  doi = {10.5281/zenodo.19792062},
  url = {https://github.com/SPerekrestova/pillchecker-api},
  date = {2026-04-26},
  publisher = {Zenodo},
  note = {GitHub Repository}
}