Facility Data – Data Story (for memo)
Source
- Dataset: N-SUMHSS (National Substance Use and Mental Health Services Survey) / National Directory of Drug and Alcohol Use Treatment Facilities. This is the same data that powers FindTreatment.gov.
- Where: SAMHSA CBHSQ Data – N-SUMHSS data files (SAS/CSV). The National Directory is also available as Excel/PDF from the National Directories page.
- Processing: For development and demo, facilities.csv may be a small subset. To use all data from FindTreatment.gov, run: pip install -r requirements.txt then python scripts/download_findtreatment_data.py. That script downloads the official SAMHSA National Directory (same data as FindTreatment.gov) and builds data/facilities.csv. Alternatively, download the Excel/CSV from SAMHSA yourself and run python scripts/ingest_facilities.py path/to/file.xlsx -o data/facilities.csv. The ingest script maps source columns to the internal schema; see the script and N-SUMHSS codebook for variable mapping.
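The column-mapping step can be illustrated with a minimal sketch. This is not the contents of scripts/ingest_facilities.py: the source column names, the internal field names, and the helpers map_row and ingest are all hypothetical, chosen only to show the idea of renaming source columns into a fixed internal schema.

```python
import csv
import io

# Hypothetical mapping from source column names to internal schema fields.
# The real names come from the N-SUMHSS codebook and the ingest script.
COLUMN_MAP = {
    "Facility Name": "name",
    "Street": "address",
    "City": "city",
    "State": "state",
    "Zip": "zip",
    "Phone": "phone",
}


def map_row(source_row: dict) -> dict:
    """Rename mapped columns, drop unmapped ones, fill missing values with ''."""
    return {
        internal: (source_row.get(source) or "").strip()
        for source, internal in COLUMN_MAP.items()
    }


def ingest(source_file, dest_file) -> int:
    """Copy rows from a source CSV into the internal schema; return the row count."""
    reader = csv.DictReader(source_file)
    writer = csv.DictWriter(dest_file, fieldnames=list(COLUMN_MAP.values()))
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(map_row(row))
        count += 1
    return count
```

Keeping the mapping in one dictionary means a codebook change only touches COLUMN_MAP, and columns the app does not use are dropped rather than carried along.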
Using the full CSV on Hugging Face Spaces
The full facilities.csv is too large to push to the Space repo. To use it on a Space:
- Create a Hugging Face Dataset (not the Space repo): go to huggingface.co/datasets, click “Create new dataset”, name it e.g. samhsa-facilities, and make it public.
- Upload the CSV: either run python scripts/upload_facilities_to_hf.py YOUR_USERNAME/samhsa-facilities from the repo root (uses create from CSV + push_to_hub), or in the dataset repo use Files and versions → Add file and upload data/facilities.csv.
- In your Space: open the Space repo → Settings → Repository secrets or Variables (or in the Space’s “App” tab, Variables and secrets). Add a variable: name FACILITIES_DATASET, value YOUR_HF_USERNAME/samhsa-facilities (the dataset repo id).
- Redeploy the Space. The app will load the full facilities table from the Dataset on startup (one-time download, then cached). This works on the free tier.
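The startup behavior described in the last step could look roughly like the sketch below. It is an illustration under assumptions, not the app's actual loader: the function names resolve_facilities_source and load_facilities are hypothetical, the fallback to the local data/facilities.csv is assumed, and split="train" assumes the dataset was uploaded with the default split name.

```python
import os

DEFAULT_CSV = "data/facilities.csv"  # assumed local fallback path


def resolve_facilities_source(env=os.environ):
    """Return ("dataset", repo_id) if FACILITIES_DATASET is set, else ("csv", path)."""
    repo_id = env.get("FACILITIES_DATASET", "").strip()
    if repo_id:
        return ("dataset", repo_id)
    return ("csv", DEFAULT_CSV)


def load_facilities(env=os.environ):
    """Load the facilities table as a pandas DataFrame (hypothetical helper)."""
    kind, location = resolve_facilities_source(env)
    if kind == "dataset":
        # Imported lazily so the CSV path works without the datasets package.
        from datasets import load_dataset

        # Downloads on first call, then reuses the local Hugging Face cache.
        # split="train" is an assumption about how the dataset was uploaded.
        return load_dataset(location, split="train").to_pandas()
    import pandas as pd

    # dtype=str keeps zip codes and phone numbers from being parsed as numbers.
    return pd.read_csv(location, dtype=str)
```

Separating source resolution from loading keeps the environment-variable logic easy to check locally without network access.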
If data/facilities.csv was already committed, you must remove it from git history (not just the index) or Hugging Face will still reject the push. In your repo root:
# Remove the file from all commits (rewrites history)
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch data/facilities.csv' --prune-empty HEAD
# Clean up refs and gc
rm -rf .git/refs/original/
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# Force push (if you already pushed this branch)
git push --force
Then create the dataset (see the steps above) and set FACILITIES_DATASET on the Space.
Scope
- Geography: Sample includes facilities in MA (Boston area), TX, CA, IL. Full N-SUMHSS covers all states.
- Attributes: Facility name, address, city, state, zip, phone; treatment type (inpatient, outpatient, residential, telehealth); payment options (Medicaid/MassHealth, insurance, sliding scale, free, VA); MAT (medication-assisted treatment); services; substances addressed (e.g. alcohol, opioids); languages; populations (e.g. adults, adolescents, veterans, LGBTQ+, pregnant women); description. The chatbot helps users describe their situation and filters by these attributes.
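Attribute filtering of this kind can be sketched as follows. This is a hedged illustration, not the chatbot's actual logic: the field names (state, treatment_type, payment_options), the semicolon-separated encoding of multi-valued fields, and the helpers matches and filter_facilities are all assumptions made for the example.

```python
def matches(facility: dict, criteria: dict) -> bool:
    """True if every criterion appears among the facility's values for that field."""
    for field, wanted in criteria.items():
        raw = str(facility.get(field, ""))
        # Assumption: multi-valued fields are stored as "a; b; c" in the CSV.
        have = {part.strip().lower() for part in raw.split(";") if part.strip()}
        if str(wanted).lower() not in have:
            return False
    return True


def filter_facilities(facilities, **criteria):
    """Return the facilities that satisfy all criteria (case-insensitive)."""
    return [f for f in facilities if matches(f, criteria)]
```

A call such as filter_facilities(rows, state="MA", payment_options="Medicaid") would keep only rows whose state is MA and whose payment options include Medicaid. Requiring every criterion to match narrows results as the user adds detail, which is a design choice of this sketch.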
Limitations
- Data as of survey/publication date; facility details (phone, hours, availability) may have changed. Always confirm with the provider or findtreatment.gov before making decisions.