---
title: README
emoji: 📉
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---
# Tuva Project: Open-Source Healthcare Modeling
Welcome to the Tuva ML Models Hub — an open-source ecosystem for healthcare risk prediction, cost benchmarking, and expected value modeling.
---
## Mission
The Tuva Project is dedicated to democratizing healthcare knowledge.
We believe that access to robust models should not be locked behind paywalls or proprietary systems.
These models are typically:
- Expensive to build and maintain
- Trained on complex healthcare data
- Essential for policy, research, and actuarial strategy
By open-sourcing these tools, we empower health systems, researchers, and startups to build with transparency and scale with trust.
---
## What You'll Find Here
This hub is a growing library of machine learning models designed to support:
- Cost prediction
- Encounter forecasting
- Risk stratification
- Benchmarking for Medicare, Medicaid, and commercial populations
Each model includes:
- Trained model artifacts (e.g., `.pkl`, `.joblib`) — see the loading sketch after this list
- Scripts for running predictions
- Complete documentation and evaluation metrics
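
As a quick illustration, a typical artifact can be loaded and scored with a few lines of Python. This is a minimal sketch: the file name `model.joblib` and the feature columns are placeholders, and each model's repository documents its actual artifact name and input schema.

```python
# Minimal sketch of loading a trained artifact and scoring new data.
# NOTE: "model.joblib" and the feature columns below are illustrative
# placeholders; check each model's repository for the real names.
import joblib
import pandas as pd

model = joblib.load("model.joblib")  # trained artifact shipped with the model

# Input features must match the schema the model was trained on.
features = pd.DataFrame(
    {"age": [67, 72], "risk_score": [1.2, 0.9]}  # hypothetical columns
)

predictions = model.predict(features)
print(predictions)
```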
---
## Quick Start: End-to-End Workflow
This section provides high-level instructions for running a model with the Tuva Project. The workflow involves preparing benchmark data using dbt, running a Python prediction script, and optionally ingesting the results back into dbt for analysis.
### 1. Configure Your dbt Project
You need to enable the correct variables in your `dbt_project.yml` file to control the workflow.
#### A. Enable Benchmark Marts
These two variables control which parts of the Tuva Project are active. They are `false` by default.
```yaml
# in dbt_project.yml
vars:
  benchmarks_train: true            # enabled for Step 1 below
  benchmarks_already_created: true  # enabled for Step 3 below
```
- `benchmarks_train`: Set to `true` to build the datasets that the ML models will use for making predictions.
- `benchmarks_already_created`: Set to `true` to ingest model predictions back into the project as a new dbt source.
#### B. (Optional) Set Prediction Source Locations
If you plan to bring predictions back into dbt for analysis, you must define where dbt can find the prediction data.
```yaml
# in dbt_project.yml
vars:
  predictions_person_year: "{{ source('benchmark_output', 'person_year') }}"
  predictions_inpatient: "{{ source('benchmark_output', 'inpatient') }}"
  predictions_inpatient_prospective: "{{ source('benchmark_output', 'inpatient_predictions_prospective') }}"
  predictions_person_year_prospective: "{{ source('benchmark_output', 'pmpm_predictions_prospective') }}"
```
#### C. Configure `sources.yml`
Ensure your `sources.yml` file includes a definition for the source you referenced above (e.g., `benchmark_output`) that points to the database and schema where your model's prediction outputs are stored.
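A minimal `sources.yml` entry might look like the following. The table names follow the vars above; the database and schema names are placeholders for wherever your prediction script writes its outputs.

```yaml
# in sources.yml -- a minimal sketch; replace database/schema with the
# location where your model's prediction outputs are stored
sources:
  - name: benchmark_output
    database: your_database        # placeholder
    schema: benchmark_predictions  # placeholder
    tables:
      - name: person_year
      - name: inpatient
      - name: inpatient_predictions_prospective
      - name: pmpm_predictions_prospective
```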
---
### 2. The 3-Step Run Process
This workflow can be managed by any orchestration tool (e.g., Airflow, Prefect, Fabric Notebooks) or run manually from the command line.
#### Step 1: Generate the Training & Benchmarking Data
Run the Tuva Project with `benchmarks_train` enabled. This creates the input data required by the ML model.
```bash
dbt build --vars '{benchmarks_train: true}'
```
To run only the benchmark mart:
```bash
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'
```
#### Step 2: Run the Prediction Python Code
Execute the Python script to generate predictions. This script will read the data created in Step 1 and write the prediction outputs to a persistent location (e.g., a table in your data warehouse).
*Each model's repository includes example Snowflake Notebook code that was used in Tuva's environment.*
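
As a rough illustration, a prediction step might look like the sketch below. All file, column, and artifact names are placeholders, assuming the benchmark data from Step 1 has been exported to a local file; in practice you would read from and write to your data warehouse as described above.

```python
# Minimal sketch of a prediction step. All names here are illustrative
# placeholders; each model's repository documents the real input schema.
import joblib
import pandas as pd

# 1. Read the input data produced by the benchmarks_train run (Step 1).
input_data = pd.read_parquet("benchmark_input.parquet")  # placeholder path

# 2. Load the trained artifact and generate predictions.
model = joblib.load("model.joblib")  # placeholder artifact name
input_data["predicted_paid_amount"] = model.predict(
    input_data.drop(columns=["person_id"])  # hypothetical ID column
)

# 3. Persist the outputs where your dbt source (e.g., benchmark_output)
#    can find them -- typically a table in your data warehouse.
input_data.to_parquet("person_year_predictions.parquet")
```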
#### Step 3: (Optional) Analyze Predictions in dbt
To bring the predictions back into the Tuva Project for analysis, run dbt again with `benchmarks_already_created` enabled. This populates the analytics marts.
```bash
dbt build --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
To run only the analysis models:
```bash
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
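
For reference, the three steps can be chained from Python using only the standard library. This is a minimal sketch: `run_predictions.py` is a hypothetical name for your Step 2 script, and any orchestrator (Airflow, Prefect, etc.) can wrap the same commands.

```python
# Minimal sketch of orchestrating the 3-step workflow with the standard
# library. "run_predictions.py" is a hypothetical script name.
import subprocess

# Step 1: build the benchmark/training data.
subprocess.run(
    ["dbt", "build", "--vars", "{benchmarks_train: true}"], check=True
)

# Step 2: generate predictions and write them to the warehouse.
subprocess.run(["python", "run_predictions.py"], check=True)

# Step 3: ingest predictions back into dbt for analysis.
subprocess.run(
    ["dbt", "build", "--vars",
     "{benchmarks_already_created: true, benchmarks_train: false}"],
    check=True,
)
```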
---
## Current Focus: Medicare (CMS)
Our initial models use de-identified CMS data to calculate:
- Expected values for paid amounts and encounter counts at the member-year level
- Readmission rate
- Discharge location
- Length of stay
Models like the **Encounter Cost Prediction Model** are trained on the 2022/23 Medicare Standard Analytic Files (SAF), using standardized preprocessing and evaluation pipelines.
---
## What's Next
We are expanding to include:
- Commercial claims models (e.g., ESI, employer-based populations)
- Medicaid utilization and cost models
---
## Contribute
This hub is open to community contributions.
If you're working on a healthcare machine learning model and want to share it:
1. Fork one of our repositories
2. Upload your trained model and code
3. Document your inputs, outputs, and evaluation
4. Open a pull request or reach out to our team
We believe risk modeling should be open infrastructure.
Help us build a future where healthcare knowledge is free and shared.