colorTo: purple
sdk: static
pinned: false
---

# Tuva Project: Open-Source Healthcare Modeling

Welcome to the Tuva ML Models Hub — an open-source ecosystem for healthcare risk prediction, cost benchmarking, and expected value modeling.
The Tuva Project is dedicated to democratizing healthcare knowledge.

We believe that access to robust models should not be locked behind paywalls or proprietary systems.

These models are typically:

- Expensive to build and maintain
- Trained on complex healthcare data
- Essential for policy, research, and actuarial strategy

By open-sourcing these tools, we empower health systems, researchers, and startups to build with transparency and scale with trust.
This hub is a growing library of machine learning models designed to support:

- Cost prediction
- Encounter forecasting
- Risk stratification
- Benchmarking for Medicare, Medicaid, and commercial populations

Each model includes:

- Trained model artifacts (e.g., `.pkl`, `.joblib`)
- Scripts for running predictions
- Complete documentation and evaluation metrics

---
## Quick Start: End-to-End Workflow

This section provides high-level instructions for running a model with the Tuva Project. The workflow involves preparing benchmark data using dbt, running a Python prediction script, and optionally ingesting the results back into dbt for analysis.

### 1. Configure Your dbt Project

You need to enable the correct variables in your `dbt_project.yml` file to control the workflow.
#### A. Enable Benchmark Marts

These two variables control which parts of the Tuva Project are active. Both are `false` by default.

```yaml
# in dbt_project.yml
vars:
  benchmarks_train: true
  benchmarks_already_created: true
```

- `benchmarks_train`: Set to `true` to build the datasets that the ML models will use for making predictions.
- `benchmarks_already_created`: Set to `true` to ingest model predictions back into the project as a new dbt source.
#### B. (Optional) Set Prediction Source Locations

If you plan to bring predictions back into dbt for analysis, you must define where dbt can find the prediction data.

```yaml
# in dbt_project.yml
vars:
  predictions_person_year: "{{ source('benchmark_output', 'person_year') }}"
  predictions_inpatient: "{{ source('benchmark_output', 'inpatient') }}"
```
#### C. Configure `sources.yml`

Ensure your `sources.yml` file includes a definition for the source you referenced above (e.g., `benchmark_output`) that points to the database and schema where your model's prediction outputs are stored.
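A minimal `sources.yml` entry might look like the following sketch; the `database` and `schema` values here are illustrative assumptions, so substitute the location your prediction script actually writes to:

```yaml
# in sources.yml -- database/schema names below are placeholders
version: 2

sources:
  - name: benchmark_output
    database: analytics        # warehouse database holding prediction outputs
    schema: ml_predictions     # schema written to by the prediction script
    tables:
      - name: person_year
      - name: inpatient
```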
---

### 2. The 3-Step Run Process

This workflow can be managed by any orchestration tool (e.g., Airflow, Prefect, Fabric Notebooks) or run manually from the command line.
#### Step 1: Generate the Training & Benchmarking Data

Run the Tuva Project with `benchmarks_train` enabled. This creates the input data required by the ML model.

```bash
dbt build --vars '{benchmarks_train: true}'
```

To run only the benchmark mart:

```bash
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'
```
#### Step 2: Run the Prediction Python Code

Execute the Python script to generate predictions. This script reads the data created in Step 1 and writes the prediction outputs to a persistent location (e.g., a table in your data warehouse).

*Each model's repository includes the example Snowflake Notebook code used in Tuva's environment.*
#### Step 3: (Optional) Analyze Predictions in dbt

To bring the predictions back into the Tuva Project for analysis, run dbt again with `benchmarks_already_created` enabled. This populates the analytics marts.

```bash
dbt build --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```

To run only the analysis models:

```bash
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
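Run end to end, the three steps above reduce to a short command sequence. The sketch below chains them and stops on the first failure; `predict.py` is a hypothetical placeholder for a model repository's actual prediction script.

```python
# Hypothetical pipeline runner; "predict.py" stands in for a model
# repository's real prediction entry point.
import subprocess

STEPS = [
    # Step 1: build the benchmark input data
    ["dbt", "build", "--vars", "{benchmarks_train: true}"],
    # Step 2: generate and persist predictions
    ["python", "predict.py"],
    # Step 3: ingest predictions and populate the analytics marts
    ["dbt", "build", "--vars",
     "{benchmarks_already_created: true, benchmarks_train: false}"],
]


def run_pipeline(runner=subprocess.run) -> None:
    """Execute each step in order; check=True aborts on the first failure."""
    for cmd in STEPS:
        runner(cmd, check=True)
```

The same three commands map directly onto tasks in Airflow, Prefect, or a notebook scheduler.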
---

Our initial models use de-identified CMS data to calculate:

- Expected values for paid amounts and encounter counts at the member-year level
- Readmission rate
- Discharge location
- Length of stay
Models like the **Encounter Cost Prediction Model** are trained on the 2020 Medicare Standard Analytic Files (SAF), using standardized preprocessing and evaluation pipelines.

Models trained on 2022 and 2023 data are coming soon.

---
We are expanding to include:

- Commercial claims models (e.g., ESI, employer-based populations)
- Medicaid utilization and cost models

---

This hub is open to community contributions.
If you're working on a healthcare machine learning model and want to share it:

1. Fork one of our repositories
2. Upload your trained model and code
3. Document your inputs, outputs, and evaluation
4. Open a pull request or reach out to our team

We believe risk modeling should be open infrastructure.
Help us build a future where healthcare knowledge is free and shared.