---
title: README
emoji: π
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---
# Tuva Project: Open-Source Healthcare Modeling

Welcome to the Tuva ML Models Hub, an open-source ecosystem for healthcare risk prediction, cost benchmarking, and expected value modeling.

---
## Mission

The Tuva Project is dedicated to democratizing healthcare knowledge.
We believe that access to robust models should not be locked behind paywalls or proprietary systems.
These models are typically:

- Expensive to build and maintain
- Trained on complex healthcare data
- Essential for policy, research, and actuarial strategy

By open-sourcing these tools, we empower health systems, researchers, and startups to build with transparency and scale with trust.

---
## What You'll Find Here

This hub is a growing library of machine learning models designed to support:

- Cost prediction
- Encounter forecasting
- Risk stratification
- Benchmarking for Medicare, Medicaid, and commercial populations

Each model includes:

- Trained model artifacts (e.g., `.pkl`, `.joblib`)
- Scripts for running predictions
- Complete documentation and evaluation metrics

---
## Quick Start: End-to-End Workflow

This section provides high-level instructions for running a model with the Tuva Project. The workflow involves preparing benchmark data using dbt, running a Python prediction script, and optionally ingesting the results back into dbt for analysis.

### 1. Configure Your dbt Project

You need to enable the correct variables in your `dbt_project.yml` file to control the workflow.
#### A. Enable Benchmark Marts

These two variables control which parts of the Tuva Project are active. Both default to `false`, and they are typically enabled at different stages of the run process below rather than at the same time.

```yaml
# in dbt_project.yml
vars:
  benchmarks_train: true
  benchmarks_already_created: true
```

- `benchmarks_train`: Set to `true` to build the datasets that the ML models will use for making predictions.
- `benchmarks_already_created`: Set to `true` to ingest model predictions back into the project as a new dbt source.
#### B. (Optional) Set Prediction Source Locations

If you plan to bring predictions back into dbt for analysis, you must define where dbt can find the prediction data.

```yaml
# in dbt_project.yml
vars:
  predictions_person_year: "{{ source('benchmark_output', 'person_year') }}"
  predictions_inpatient: "{{ source('benchmark_output', 'inpatient') }}"
  predictions_inpatient_prospective: "{{ source('benchmark_output', 'inpatient_predictions_prospective') }}"
  predictions_person_year_prospective: "{{ source('benchmark_output', 'pmpm_predictions_prospective') }}"
```
#### C. Configure `sources.yml`

Ensure your `sources.yml` file includes a definition for the source you referenced above (e.g., `benchmark_output`) that points to the database and schema where your model's prediction outputs are stored.
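A minimal sketch of what that definition might look like is below. The `database` and `schema` values are placeholders; point them at wherever your prediction outputs actually land. The table names mirror the `source()` references in section B:

```yaml
# sources.yml -- database/schema values are illustrative placeholders
version: 2

sources:
  - name: benchmark_output
    database: analytics        # placeholder
    schema: benchmark_output   # placeholder
    tables:
      - name: person_year
      - name: inpatient
      - name: inpatient_predictions_prospective
      - name: pmpm_predictions_prospective
```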
---

### 2. The 3-Step Run Process

This workflow can be managed by any orchestration tool (e.g., Airflow, Prefect, Fabric Notebooks) or run manually from the command line; a combined command-line sketch follows Step 3 below.
#### Step 1: Generate the Training & Benchmarking Data

Run the Tuva Project with `benchmarks_train` enabled. This creates the input data required by the ML model.

```bash
dbt build --vars '{benchmarks_train: true}'
```

To run only the benchmark mart:

```bash
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'
```
#### Step 2: Run the Prediction Python Code

Execute the Python script to generate predictions. This script reads the data created in Step 1 and writes the prediction outputs to a persistent location (e.g., a table in your data warehouse).

*We have provided example Snowflake Notebook code, as used in Tuva's environment, within each model's repository.*
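The exact script varies by model, but the shape of Step 2 is roughly as follows. This is a minimal sketch only: the file names, table and column names, and the pandas/CSV I/O are illustrative assumptions; the real scripts read from and write to your warehouse.

```python
# Hypothetical sketch of Step 2 -- see each model's repository for the real script.
import joblib
import pandas as pd

# Read the benchmark input built by dbt in Step 1
# (replace with your warehouse connector in practice).
features = pd.read_csv("benchmarks_train_input.csv")  # illustrative name

# Load the trained model artifact shipped with the model repo.
model = joblib.load("model.joblib")  # illustrative name

# Predict on the feature columns; the ID columns here are assumptions.
X = features.drop(columns=["person_id", "year"])
features["predicted_paid_amount"] = model.predict(X)

# Persist predictions where the `benchmark_output` dbt source can find them.
features[["person_id", "year", "predicted_paid_amount"]].to_csv(
    "person_year_predictions.csv", index=False
)
```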
#### Step 3: (Optional) Analyze Predictions in dbt

To bring the predictions back into the Tuva Project for analysis, run dbt again with `benchmarks_already_created` enabled. This populates the analytics marts.

```bash
dbt build --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```

To run only the analysis models:

```bash
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
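Putting the three steps together, a minimal manual run might look like the following; the prediction script name is a placeholder for whatever Step 2 looks like in your environment:

```bash
# Step 1: build the benchmark training data
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'

# Step 2: generate predictions (script name is illustrative; see the model repo)
python run_predictions.py

# Step 3: ingest predictions and build the analysis marts
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```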
---

## Current Focus: Medicare (CMS)

Our initial models use de-identified CMS data to calculate:

- Expected values for paid amounts and encounter counts at the member-year level
- Readmission rate
- Discharge location
- Length of stay

Models like the **Encounter Cost Prediction Model** are trained on the 2022/23 Medicare Standard Analytic Files (SAF), using standardized preprocessing and evaluation pipelines.

---

## What's Next

We are expanding to include:

- Commercial claims models (e.g., ESI, employer-based populations)
- Medicaid utilization and cost models

---

## Contribute

This hub is open to community contributions.
If you're working on a healthcare machine learning model and want to share it:

1. Fork one of our repositories
2. Upload your trained model and code
3. Document your inputs, outputs, and evaluation
4. Open a pull request or reach out to our team

We believe risk modeling should be open infrastructure.
Help us build a future where healthcare knowledge is free and shared.