---
title: HyperClinical
emoji: 🧠
colorFrom: red
colorTo: yellow
sdk: docker
app_port: 8501
tags:
  - streamlit
  - medical-imaging
  - dementia
pinned: false
short_description: HyperClinical multimodal dementia subclassification demo
---

HyperClinical Space

Inference-only Streamlit demo for:

  • AVRA atrophy scoring (MTA, PA, GCA-F)
  • MedGemma-style clinical report generation
  • Final dementia subclass prediction + confidence

UI updates:

  • No sidebar; settings are in an in-page Advanced Runtime Settings expander.
  • EHR table uses the requested clinical field names (including cardiovascular comorbidity columns).
  • Embedded pipeline figure shown at top of interface.
  • MRI input supports file upload, public URL, and a built-in example NIfTI.
    • If Hugging Face upload fails with AxiosError 403, use URL mode.
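URL mode amounts to downloading the volume to a local temp file before preprocessing. The sketch below is illustrative only (fetch_mri is a hypothetical name, not the app's actual function) and assumes the URL serves the raw NIfTI bytes:

```python
import tempfile
import urllib.request
from pathlib import Path

def fetch_mri(url, suffix=".nii.gz"):
    """Download an MRI volume from a public URL to a temporary file
    and return its local path (illustrative helper, not the app's code)."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.write(data)
    tmp.close()
    return Path(tmp.name)
```

Uploaded files, URL fetches, and the built-in example would all feed the same local-path preprocessing step.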

Solving the Large Checkpoint Push Issue

Do not store large checkpoints directly in this Space repo.

Instead, keep large assets in a separate HF repo (a model-type repo is recommended) and let the app download them at runtime. In this Space, assets are preloaded on server startup, before Streamlit launches.

Expected assets layout in the assets repo

checkpoints/neurofusion/best_model.pt
checkpoints/neurofusion/preprocessing_stats.json
src/inference_core/weights/mta/model_1.pth.tar ... model_5.pth.tar
src/inference_core/weights/pa/model_1.pth.tar ... model_5.pth.tar
src/inference_core/weights/gca-f/model_1.pth.tar ... model_5.pth.tar
src/assets/Hyperclinical_Pipeline.jpg
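The layout above can be expressed as a manifest and verified locally. REQUIRED_ASSETS and missing_assets are illustrative names for this sketch, not part of the shipped code:

```python
from pathlib import Path

# Required asset paths, mirroring the layout above. The AVRA weight
# files (model_1..model_5 for each region) are expanded programmatically.
REQUIRED_ASSETS = [
    "checkpoints/neurofusion/best_model.pt",
    "checkpoints/neurofusion/preprocessing_stats.json",
    "src/assets/Hyperclinical_Pipeline.jpg",
] + [
    f"src/inference_core/weights/{region}/model_{i}.pth.tar"
    for region in ("mta", "pa", "gca-f")
    for i in range(1, 6)
]

def missing_assets(root="."):
    """Return the required asset paths not present under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_ASSETS if not (base / p).exists()]
```

At startup, each missing path could then be fetched with huggingface_hub.hf_hub_download (or the whole assets repo mirrored with snapshot_download) before Streamlit launches.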

Runtime config

Set these in Space Variables/Secrets:

  • HF_ASSETS_REPO_ID (example: SalmaHassan/HyperClinical-assets)
  • HF_ASSETS_REVISION (optional, default main)
  • HF_TOKEN (needed if assets repo is private)
  • HF_PRELOAD_ASSETS_ON_START (optional, default 1)
  • HF_ASSETS_REQUIRED_ON_START (optional, default 1, fail startup if assets missing)
  • HF_ASSETS_FORCE_DOWNLOAD_ON_START (optional, default 0)
  • HF_MEDSIGLIP_MODEL_ID (optional override for image embedding extractor model)
  • HF_MEDGEMMA_MODEL_ID (optional override for clinical embedding extractor model)
  • HF_REQUIRE_FOUNDATION_MODELS (optional, default 0; if 1, fail inference unless true HF models load)
  • HF_ALLOW_BIOCLINICAL_FALLBACK (optional, default 0; if 1, allow BioClinicalBERT fallback when MedGemma is unavailable)
  • HF_CACHE_FOUNDATION_MODELS (optional, default 0; keep MedSigLIP/MedGemma loaded in memory between requests)
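Reading these variables with their documented defaults might look like the following sketch (the module-level constant names are assumptions; only the environment variable names come from the list above):

```python
import os

def _flag(name, default="0"):
    """Interpret a '1'/'0'-style environment variable as a boolean."""
    return os.environ.get(name, default) == "1"

ASSETS_REPO_ID = os.environ.get("HF_ASSETS_REPO_ID", "")
ASSETS_REVISION = os.environ.get("HF_ASSETS_REVISION", "main")
PRELOAD_ASSETS_ON_START = _flag("HF_PRELOAD_ASSETS_ON_START", "1")
ASSETS_REQUIRED_ON_START = _flag("HF_ASSETS_REQUIRED_ON_START", "1")
FORCE_DOWNLOAD_ON_START = _flag("HF_ASSETS_FORCE_DOWNLOAD_ON_START", "0")
REQUIRE_FOUNDATION_MODELS = _flag("HF_REQUIRE_FOUNDATION_MODELS", "0")
ALLOW_BIOCLINICAL_FALLBACK = _flag("HF_ALLOW_BIOCLINICAL_FALLBACK", "0")
CACHE_FOUNDATION_MODELS = _flag("HF_CACHE_FOUNDATION_MODELS", "0")
```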

The container startup script preloads required checkpoints and AVRA weights from that repo. If sync fails or checkpoints appear missing, use Validate HF Asset Layout in the UI to list missing required files directly from the remote assets repo.
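The validation step reduces to a set difference between the remote listing and the required paths. In this sketch, remote_files stands for what huggingface_hub.list_repo_files(repo_id, revision=...) would return, and missing_on_remote is an illustrative name:

```python
def missing_on_remote(remote_files, required):
    """Given a file listing of the assets repo (e.g. from
    huggingface_hub.list_repo_files) and the required paths,
    return the required paths absent from the repo, sorted."""
    remote = set(remote_files)
    return sorted(p for p in required if p not in remote)
```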

Server-only checkpoint mode

  • No local checkpoint files are required on your machine.
  • Space startup handles checkpoint loading on the server.
  • Inference uses:
    • checkpoints/neurofusion/best_model.pt
    • checkpoints/neurofusion/preprocessing_stats.json
    • src/inference_core/weights/* (for AVRA)
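A minimal sketch of the server-side load path, assuming the function name and stats handling (the checkpoint itself would be restored with torch.load):

```python
import json
from pathlib import Path

def load_preprocessing_stats(ckpt_dir="checkpoints/neurofusion"):
    """Read the normalization statistics stored next to best_model.pt.
    The model weights would be restored separately, e.g.
    state = torch.load(Path(ckpt_dir) / "best_model.pt", map_location="cpu")."""
    return json.loads((Path(ckpt_dir) / "preprocessing_stats.json").read_text())
```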

One-time upload of assets (from your local project)

From this Space repo directory:

# login first
huggingface-cli login

# optional: create assets repo once
huggingface-cli repo create SalmaHassan/HyperClinical-assets --type model

# publish required files from local source
bash scripts/publish_assets_from_local.sh SalmaHassan/HyperClinical-assets ../avra_public-master/HyperClinical_Challenge

App Entry

  • Streamlit app: src/streamlit_app.py
  • Built-in sample MRI: src/examples/example_case.nii.gz

Local Run

pip install -r requirements.txt
streamlit run src/streamlit_app.py

Dummy Backend Test

python run_dummy.py

Notes

  • src/inference_core/ contains AVRA inference code only (no training).
  • src/demo_backend/ contains model loading, feature prep, report generation, and full inference pipeline.
  • Checkpoint directories are intentionally left mostly empty in git to keep pushes lightweight.
  • Binary assets (checkpoints/figures) are intentionally downloaded at runtime instead of committed to Space git.
  • Docker entrypoint preloads assets server-side via python -m demo_backend.bootstrap_assets before starting Streamlit.
  • During inference, the app attempts HF-based MedSigLIP + MedGemma embedding extraction and feeds those embeddings into the classifier path.
  • Runtime output folder is ./outputs/ (gitignored).

Troubleshooting

  • If the clinical narrative reports "Medical history: none reported", ensure health-history inputs are provided as numeric/binary values (1 for present, 0 for absent). The app also accepts yes/no.
  • If foundation_embeddings.medgemma shows fallback, check:
    • transformers version: google/medgemma-* requires Gemma3 support, so use transformers>=4.57.1,
    • HF_TOKEN availability/permissions (if model access is gated),
    • runtime memory limits on your Space hardware.
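Both troubleshooting checks above can be automated. In this sketch, to_binary, parse_version, and gemma3_support_available are hypothetical helpers, and the extra accepted tokens (y/n, true/false) go beyond the 1/0 and yes/no the app documents:

```python
import importlib.metadata

def to_binary(value):
    """Normalize a health-history input to 1 (present) or 0 (absent)."""
    if isinstance(value, (int, float)) and value in (0, 1):
        return int(value)
    text = str(value).strip().lower()
    if text in ("1", "yes", "y", "true"):
        return 1
    if text in ("0", "no", "n", "false"):
        return 0
    return None  # unrecognized; the app would treat this as missing

def parse_version(v):
    """Parse 'X.Y.Z...' into a comparable tuple of ints, ignoring suffixes."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def gemma3_support_available(min_version="4.57.1"):
    """True if the installed transformers meets the README's minimum
    version for google/medgemma-* (Gemma3) support."""
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(min_version)
```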