Data Directory

This folder is for local development data only. The primary data source is HuggingFace.

Data Source

Primary: hugging-science/isles24-stroke

The dataset is automatically downloaded and cached by HuggingFace when you run:

from stroke_deepisles_demo.data import load_isles_dataset

# Loads from HuggingFace (default)
dataset = load_isles_dataset()

# Access cases
case = dataset.get_case(0)  # or dataset.get_case("sub-stroke0001")

HuggingFace Cache Location

By default, data is cached at: ~/.cache/huggingface/datasets/hugging-science___isles24-stroke/
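The cache root can move if the `HF_DATASETS_CACHE` or `HF_HOME` environment variables are set, which the `datasets` library honours. The following is a minimal sketch for locating the cache directory under those two overrides. The helper name `resolve_dataset_cache` is hypothetical and not part of `stroke_deepisles_demo`, the folder name is copied from the path above, and other overrides such as `XDG_CACHE_HOME` are not handled.

```python
import os
from pathlib import Path

# Folder name copied from the cache path quoted in this README.
DATASET_FOLDER = "hugging-science___isles24-stroke"


def resolve_dataset_cache(env=None):
    """Return the directory where the dataset is expected to be cached.

    Hypothetical helper: checks HF_DATASETS_CACHE first, then HF_HOME,
    then falls back to the default location under the home directory.
    """
    env = os.environ if env is None else env
    if env.get("HF_DATASETS_CACHE"):
        root = Path(env["HF_DATASETS_CACHE"])
    elif env.get("HF_HOME"):
        root = Path(env["HF_HOME"]) / "datasets"
    else:
        root = Path.home() / ".cache" / "huggingface" / "datasets"
    return root.expanduser() / DATASET_FOLDER


if __name__ == "__main__":
    cache_dir = resolve_dataset_cache()
    state = "present" if cache_dir.is_dir() else "not downloaded yet"
    print(f"{cache_dir} ({state})")
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real environment.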

Dataset Contents

149 acute ischemic stroke cases with:

  • Imaging: DWI, ADC, CT, CTA, perfusion maps (tmax, mtt, cbf, cbv)
  • Masks: lesion_mask, lvo_mask, cow_segmentation
  • Clinical: age, sex, nihss_admission, mrs_admission, mrs_3month

Local Development (Optional)

For offline development, you can still use a local directory:

dataset = load_isles_dataset("path/to/local/data", local_mode=True)

Expected structure for local mode:

data/
├── Images-DWI/     # DWI volumes
├── Images-ADC/     # ADC maps
└── Masks/          # Ground truth lesion masks
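
Before calling the loader in local mode, it can help to confirm that the directory matches this layout. The sketch below checks only that the three subdirectories exist. `missing_subdirs` is a hypothetical helper, not part of `stroke_deepisles_demo`, and it does not check file names or formats inside each folder.

```python
from pathlib import Path

# Subdirectory names taken from the expected structure above.
REQUIRED_SUBDIRS = ("Images-DWI", "Images-ADC", "Masks")


def missing_subdirs(data_root):
    """Return the required subdirectories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("path/to/local/data")
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("Local data layout looks complete.")
```

An empty list means all three folders are present.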

Notes

  • All data files are gitignored
  • On HuggingFace Spaces, data loads automatically from the HF cache
  • See dataset card for citation requirements