
title: PillChecker Staging
emoji: 📚
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
app_port: 8000
license: mit

PillChecker API

PillChecker helps users find out whether two medications are safe to take at the same time. This repository contains the backend API, which identifies drugs from OCR text and checks for dangerous interactions using DDInter 2.0, with OpenFDA label text as fallback evidence.

MEDICAL DISCLAIMER

This service is provided for informational and self-educational purposes only. While the application utilizes data from respected pharmaceutical sources, the information provided should not be treated as medical advice, diagnosis, or treatment.

The developer of this project does not have any medical qualifications. This tool was built as a technical exercise to explore NLP and medical data integration.

Always consult with a qualified healthcare professional (such as a doctor or pharmacist) before making any decisions regarding your medications or health. The developer assumes no responsibility or liability for any errors, omissions, or consequences arising from the use of the information provided by this service.

Architecture

Drug Identification

Converts unstructured OCR text into standardized drug records using a multi-step strategy:

OCR Cleaning: The ocr_cleaner normalizes common OCR artifacts before NER: digit-letter confusion (0/o, 1/l), rn→m in drug names, ligatures, invisible characters, and whitespace.
NER: The OpenMed-NER-PharmaDetect-BioPatient-108M model (108M parameters) extracts chemical entity names from the cleaned text.
Fallback: If NER yields no results, an approximate term search via the RxNorm REST API catches brand names (e.g., "Advil" -> ibuprofen).
Enrichment: A regex parser extracts dosages (e.g., "400 mg"), and the RxNorm API maps every identified drug to its RxCUI for standardized downstream lookups.
Confidence: Results with an NER score below 0.85, or results sourced from the RxNorm fallback, are flagged with needs_confirmation = true to prompt user verification.
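
The cleaning, dosage, and confidence steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's ocr_cleaner: the function names, the "ner" and "rxnorm_fallback" source labels, and the unit list are assumptions, and the rn→m fix is omitted because it needs a drug-name vocabulary. Only the 0.85 threshold comes from the description above.

```python
import re
import unicodedata
from typing import Optional, Tuple

# Threshold from the README text; all names below are hypothetical.
CONFIDENCE_THRESHOLD = 0.85

# Zero-width and other invisible characters that OCR output may contain.
_INVISIBLE = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff\u00ad]")
# Assumed unit list; the real parser may accept more units.
_DOSAGE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)


def clean_ocr_text(text: str) -> str:
    """Normalize a small, illustrative subset of OCR artifacts."""
    text = unicodedata.normalize("NFKC", text)  # expands ligatures such as "ﬁ"
    text = _INVISIBLE.sub("", text)
    # A digit 0 or 1 sandwiched between letters is treated as o or l.
    text = re.sub(r"(?<=[A-Za-z])0(?=[A-Za-z])", "o", text)
    text = re.sub(r"(?<=[A-Za-z])1(?=[A-Za-z])", "l", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_dosage(text: str) -> Optional[Tuple[float, str]]:
    """Return the first (amount, unit) pair found, or None."""
    match = _DOSAGE.search(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


def needs_confirmation(ner_score: Optional[float], source: str) -> bool:
    """Flag low-confidence NER hits and every RxNorm-fallback hit."""
    if source == "rxnorm_fallback" or ner_score is None:
        return True
    return ner_score < CONFIDENCE_THRESHOLD


cleaned = clean_ocr_text("Ibupr0fen  400\u200b mg")
print(cleaned, extract_dosage(cleaned), needs_confirmation(0.80, "ner"))
```

Running the cleaner before the dosage regex matters: the digit-letter fix only touches digits between letters, so the "400" in the dosage is left intact.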

Interaction Checking

Drug-drug interactions are resolved against a pinned DDInter 2.0 SQLite database with OpenFDA fallback evidence:

DDInter SQLite client: app/clients/ddinter_db.py opens the pre-built DDInter SQLite database with aiosqlite and FTS5 search.
RxCUI-first lookup: For each drug pair, the checker resolves RxCUIs through RxNorm, maps them to DDInter IDs, and falls back to DDInter name search when needed.
OpenFDA fallback: If DDInter has no pair match, the checker searches FDA label interaction text and classifies severity with the DeBERTa v3 zero-shot classifier.
Coverage summary: /interactions reports whether each checked pair resolved through DDInter, OpenFDA, or neither source.
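
The lookup order described above (DDInter first, OpenFDA only when no pair matches, then a per-source coverage count) might look roughly like the sketch below. It uses the synchronous sqlite3 module and an in-memory table so it runs on its own; the real client uses aiosqlite and FTS5. The table schema, function names, source labels, and the placeholder drug names and severity value are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from typing import Callable, Dict, List, Optional


def open_demo_db() -> sqlite3.Connection:
    """Build a throwaway table; the schema is hypothetical, not DDInter's."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions (drug_a TEXT, drug_b TEXT, severity TEXT)"
    )
    # Placeholder row, stored with the two names in sorted order.
    conn.execute(
        "INSERT INTO interactions VALUES ('alphadrug', 'betadrug', 'major')"
    )
    return conn


def check_pair(
    conn: sqlite3.Connection,
    drug_a: str,
    drug_b: str,
    fallback: Optional[Callable[[str, str], Optional[str]]] = None,
) -> Dict[str, object]:
    """Try the local database first, then the optional fallback."""
    first, second = sorted((drug_a.strip().lower(), drug_b.strip().lower()))
    row = conn.execute(
        "SELECT severity FROM interactions WHERE drug_a = ? AND drug_b = ?",
        (first, second),
    ).fetchone()
    if row is not None:
        return {"pair": (first, second), "source": "ddinter", "severity": row[0]}
    if fallback is not None:
        severity = fallback(first, second)
        if severity is not None:
            return {"pair": (first, second), "source": "openfda", "severity": severity}
    return {"pair": (first, second), "source": "none", "severity": None}


def coverage_summary(results: List[Dict[str, object]]) -> Dict[str, int]:
    """Count how many pairs resolved through each source."""
    counts = Counter(result["source"] for result in results)
    return {name: counts[name] for name in ("ddinter", "openfda", "none")}


conn = open_demo_db()
results = [
    check_pair(conn, "Betadrug", "alphadrug"),
    check_pair(conn, "alphadrug", "gammadrug", fallback=lambda a, b: "moderate"),
    check_pair(conn, "alphadrug", "deltadrug"),
]
print(coverage_summary(results))
```

Sorting the pair before the query means each interaction needs only one row, whichever order the caller supplies the drugs in.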

Transparency

Both /analyze and /interactions responses include:

data_sources: which models and databases were used for the result
limitations (interactions only): scope disclaimers about what the system does and does not cover
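
As a rough picture of how these two fields could sit on a response, the sketch below models both response types as dataclasses. Only the data_sources and limitations field names come from the list above; the class names, the other fields, and the placeholder strings are assumptions, and the actual response schema may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AnalyzeResponse:
    """Hypothetical /analyze payload: drugs plus provenance."""
    drugs: List[dict]
    data_sources: List[str]


@dataclass
class InteractionsResponse:
    """Hypothetical /interactions payload: adds scope disclaimers."""
    interactions: List[dict]
    data_sources: List[str]
    limitations: List[str] = field(default_factory=list)


response = InteractionsResponse(
    interactions=[],
    data_sources=["DDInter 2.0", "OpenFDA"],
    limitations=["placeholder limitation text"],
)
print(asdict(response))
```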

Docker Build

The image uses a three-stage build to keep layers small and reproducible:

Stage 1 (Python): uv installs Python dependencies into an isolated venv.
Stage 2 (DB downloader): the pinned DDInter SQLite database is downloaded from the explicitly configured GitHub Releases source.
Stage 3 (Runtime): combines the venv, SQLite database, and app code. NER and severity models are pre-downloaded so the image is fully self-contained.

API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Liveness check |
| `GET` | `/health/data` | No | Readiness -- confirms DDInter SQLite connection |
| `POST` | `/analyze` | API key | Extract drugs from OCR text |
| `POST` | `/interactions` | API key | Check interactions for a list of drug names |
| `POST` | `/admin/cache/clear` | API key | Clear all in-memory caches |
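
A client call to the authenticated endpoints could be assembled as in the sketch below, which only builds the request and does not send it. The host is the staging Space named in this README; the X-API-Key header name and the "text" and "drugs" body fields are assumptions, so check the API docs for the actual names before relying on them.

```python
import json
import urllib.request

# Host from the Staging & Deployment section of this README.
BASE_URL = "https://sperva-pillchecker-staging.hf.space"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request.

    The "X-API-Key" header name is a guess, not confirmed by the README.
    """
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Hypothetical body fields: "text" for /analyze, "drugs" for /interactions.
analyze_req = build_request("/analyze", {"text": "Ibuprofen 400 mg"}, "my-key")
interactions_req = build_request(
    "/interactions", {"drugs": ["ibuprofen", "warfarin"]}, "my-key"
)
print(analyze_req.get_method(), analyze_req.full_url)
```

Passing either request to urllib.request.urlopen would send it; that call is left out here so the sketch runs without network access or a valid key.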

Eval Benchmark

The benchmark suite and raw results have been migrated to the Hugging Face Hub for better reproducibility and visualization.

Benchmark Dataset: SPerva/pillchecker-ner-benchmark
Result History: hf://buckets/SPerva/pillchecker-experiments
Methodology: See the dataset card on Hugging Face for details on the current 500-case benchmark sample.

| Pipeline (Clean Text) | Precision | Recall | F1 |
|---|---|---|---|
| Bare NER Baseline | 46.9% | 84.4% | 60.3% |
| Full Pipeline | 71.6% | 81.0% | 76.0% |
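
The F1 column is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported precision and recall values; the function name is just illustrative, and the benchmark's own scoring code may differ.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values in percent, taken from the table above.
baseline_f1 = round(f1_score(46.9, 84.4), 1)
pipeline_f1 = round(f1_score(71.6, 81.0), 1)
print(baseline_f1, pipeline_f1)  # 60.3 76.0
```

The full pipeline gives up about three points of recall for roughly 25 points of precision, which is what lifts F1 from 60.3% to 76.0%.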

See eval/README.md for evaluation methodology and progress, and AGENTS.md for HF/GitHub ownership and cleanup rules.

Staging & Deployment

The API is deployed as a staging environment on Hugging Face Spaces for remote testing:

Staging Space: sperva-pillchecker-staging
API Docs: sperva-pillchecker-staging.hf.space/docs

PillChecker Collection -- Central hub for all models and datasets used in this project.
OpenMed NER PharmaDetect -- drug entity recognition model (108M params). License: Apache 2.0
RxNorm REST API -- drug name normalization and RxCUI mapping. Provided by NLM (free to use).
DDInter 2.0 -- drug-drug interaction dataset accessed through a pinned SQLite database configured with INTERACTION_DB_REPO and INTERACTION_DB_TAG during Docker builds.
OpenFDA Drug Label API -- fallback evidence source for interaction text when DDInter has no pair match.
DeBERTa-v3-base-mnli-fever-anli -- zero-shot classifier for interaction severity. License: MIT
Hugging Face Transformers -- NLP pipeline library. License: Apache 2.0

Citation

If you use this software or the benchmark dataset in your research, please cite it as:

@software{perekrestova_pillchecker_2026,
  author = {Perekrestova, Svetlana},
  orcid = {0009-0003-2905-6040},
  title = {PillChecker API: Pharmaceutical Entity Extraction and Interaction Checker},
  version = {1.2.2},
  doi = {10.5281/zenodo.19792062},
  url = {https://github.com/SPerekrestova/pillchecker-api},
  date = {2026-04-26},
  publisher = {Zenodo},
  note = {GitHub Repository}
}