---
title: README
emoji: πŸ“‰
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---
# Tuva Project: Open-Source Healthcare Modeling
Welcome to the Tuva ML Models Hub β€” an open-source ecosystem for healthcare risk prediction, cost benchmarking, and expected value modeling.
---
## Mission
The Tuva Project is dedicated to democratizing healthcare knowledge.
We believe that access to robust models should not be locked behind paywalls or proprietary systems.
These models are typically:
- Expensive to build and maintain
- Trained on complex healthcare data
- Essential for policy, research, and actuarial strategy
By open-sourcing these tools, we empower health systems, researchers, and startups to build with transparency and scale with trust.
---
## What You'll Find Here
This hub is a growing library of machine learning models designed to support:
- Cost prediction
- Encounter forecasting
- Risk stratification
- Benchmarking for Medicare, Medicaid, and commercial populations
Each model includes:
- Trained model artifacts (e.g., `.pkl`, `.joblib`)
- Scripts for running predictions
- Complete documentation and evaluation metrics
---
## Quick Start: End-to-End Workflow
This section provides high-level instructions for running a model with the Tuva Project. The workflow involves preparing benchmark data using dbt, running a Python prediction script, and optionally ingesting the results back into dbt for analysis.
### 1. Configure Your dbt Project
You need to enable the correct variables in your `dbt_project.yml` file to control the workflow.
#### A. Enable Benchmark Marts
These two variables control which parts of the Tuva Project are active. They are `false` by default.
```yaml
# in dbt_project.yml
vars:
  benchmarks_train: true
  benchmarks_already_created: true
```
- `benchmarks_train`: Set to `true` to build the datasets that the ML models will use for making predictions.
- `benchmarks_already_created`: Set to `true` to ingest model predictions back into the project as a new dbt source.
#### B. (Optional) Set Prediction Source Locations
If you plan to bring predictions back into dbt for analysis, you must define where dbt can find the prediction data.
```yaml
# in dbt_project.yml
vars:
  predictions_person_year: "{{ source('benchmark_output', 'person_year') }}"
  predictions_inpatient: "{{ source('benchmark_output', 'inpatient') }}"
  predictions_inpatient_prospective: "{{ source('benchmark_output', 'inpatient_predictions_prospective') }}"
  predictions_person_year_prospective: "{{ source('benchmark_output', 'pmpm_predictions_prospective') }}"
```
#### C. Configure `sources.yml`
Ensure your `sources.yml` file includes a definition for the source you referenced above (e.g., `benchmark_output`) that points to the database and schema where your model's prediction outputs are stored.
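For example, a minimal `sources.yml` entry might look like the following. The `database` and `schema` values are placeholders for your environment; the table names match the `source()` references shown above.

```yaml
# in sources.yml (illustrative; adjust database/schema to your warehouse)
version: 2

sources:
  - name: benchmark_output
    database: your_database
    schema: your_prediction_schema
    tables:
      - name: person_year
      - name: inpatient
      - name: inpatient_predictions_prospective
      - name: pmpm_predictions_prospective
```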
---
### 2. The 3-Step Run Process
This workflow can be managed by any orchestration tool (e.g., Airflow, Prefect, Fabric Notebooks) or run manually from the command line.
#### Step 1: Generate the Training & Benchmarking Data
Run the Tuva Project with `benchmarks_train` enabled. This creates the input data required by the ML model.
```bash
dbt build --vars '{benchmarks_train: true}'
```
To run only the benchmark mart:
```bash
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'
```
#### Step 2: Run the Prediction Python Code
Execute the Python script to generate predictions. This script will read the data created in Step 1 and write the prediction outputs to a persistent location (e.g., a table in your data warehouse).
*We have provided example Snowflake Notebook code within each model's repository that was used in Tuva's environment.*
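As a rough illustration of the shape of a prediction script, the sketch below loads a serialized model artifact and scores benchmark rows. The model class, feature names, and output column are invented for this example and are not taken from any Tuva model repository; a real script would load the `.pkl`/`.joblib` artifact shipped with the model and write results to your warehouse.

```python
import pickle

class DummyCostModel:
    """Stand-in for a trained artifact (e.g., a .pkl from a model repo)."""
    def predict(self, rows):
        # Placeholder prediction: a flat expected cost per member-year.
        return [1000.0 for _ in rows]

# Serialize and deserialize to mimic loading a saved model artifact.
artifact = pickle.dumps(DummyCostModel())
model = pickle.loads(artifact)

# Rows as produced by the benchmarks_train mart (illustrative schema).
input_rows = [
    {"person_id": "A1", "year": 2022, "age": 67},
    {"person_id": "B2", "year": 2022, "age": 72},
]
predictions = model.predict(input_rows)

# Attach predictions; in practice these rows would be written to the
# warehouse table that your benchmark_output dbt source points at.
output_rows = [
    {**row, "predicted_paid_amount": pred}
    for row, pred in zip(input_rows, predictions)
]
```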
#### Step 3: (Optional) Analyze Predictions in dbt
To bring the predictions back into the Tuva Project for analysis, run dbt again with `benchmarks_already_created` enabled. This populates the analytics marts.
```bash
dbt build --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
To run only the analysis models:
```bash
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
---
## Current Focus: Medicare (CMS)
Our initial models use de-identified CMS data to calculate:
- Expected values for paid amounts and encounter counts at the member-year level
- Readmission rates
- Discharge locations
- Lengths of stay
Models like the **Encounter Cost Prediction Model** are trained on the 2022/23 Medicare Standard Analytic Files (SAF), using standardized preprocessing and evaluation pipelines.
---
## What's Next
We are expanding to include:
- Commercial claims models (e.g., ESI, employer-based populations)
- Medicaid utilization and cost models
---
## Contribute
This hub is open to community contributions.
If you're working on a healthcare machine learning model and want to share it:
1. Fork one of our repositories
2. Upload your trained model and code
3. Document your inputs, outputs, and evaluation
4. Open a pull request or reach out to our team
We believe risk modeling should be open infrastructure.
Help us build a future where healthcare knowledge is free and shared.