Heart-Attack-Risk-Rate / QUICK_START.md

Kasilanka Bhoopesh Siva Srikar

Complete Heart Attack Risk Prediction App - Ready for Deployment

08123aa 3 months ago

	# Quick Start Guide: Model Improvement

	## Overview

	This guide helps you improve your heart attack risk prediction models using advanced optimization techniques.

	## 🐳 Docker Option (Recommended)

	If you have Docker installed, this is the easiest way to run the optimization:

	```bash
	# Simple one-command execution
	./run_optimization_docker.sh

	# Or with custom settings
	./run_optimization_docker.sh --trials 50

	# Run feature analysis
	./run_optimization_docker.sh --script feature_importance_analysis.py
	```

	See [DOCKER_OPTIMIZATION.md](DOCKER_OPTIMIZATION.md) for detailed Docker instructions.

	---

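	## Local Installation Option

	Without Docker, install the dependencies and run the scripts directly, as described in the Quick Start steps below.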

	## Current Performance

	Your current models achieve:
	- Accuracy: ~85.1%
	- Recall: ~84.3%
	- ROC-AUC: ~92.5%

	## Quick Start (4 Steps)

	### Step 1: Install Dependencies

	```bash
	pip install -r requirements.txt
	```

	This will install Optuna and other required packages.

	### Step 2: Run Model Optimization

	```bash
	python improve_models.py
	```

	What this does:
	- Optimizes hyperparameters for XGBoost, CatBoost, and LightGBM using Optuna
	- Finds optimal prediction thresholds for each model
	- Optimizes ensemble weights
	- Saves improved models to `content/models/`
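
The threshold step can be pictured as a sweep over candidate cut-offs on held-out predicted probabilities. The following is a minimal sketch under that assumption, not the code in `improve_models.py`: the function names, the equal weighting of accuracy and recall, and the toy data are all hypothetical.

```python
def accuracy_recall(y_true, y_prob, threshold):
    """Return (accuracy, recall) when probabilities >= threshold count as positive."""
    tp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 0 and y == 1:
            fn += 1
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def best_threshold(y_true, y_prob, candidates=None):
    """Pick the candidate threshold with the highest combined score."""
    if candidates is None:
        candidates = [i / 100 for i in range(10, 91)]  # 0.10 to 0.90

    def score(t):
        acc, rec = accuracy_recall(y_true, y_prob, t)
        return 0.5 * acc + 0.5 * rec  # assumed equal weighting

    return max(candidates, key=score)


# Toy validation labels and predicted probabilities (hypothetical values).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.55, 0.45, 0.20]

threshold = best_threshold(y_true, y_prob)
print(threshold, accuracy_recall(y_true, y_prob, threshold))
```

Because recall enters the score directly, the selected threshold tends to sit below 0.5 when missing a positive case is the costlier error.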

	Time: ~1-2 hours (100 trials per model)

	Output:
	- `XGBoost_optimized.joblib`
	- `CatBoost_optimized.joblib`
	- `LightGBM_optimized.joblib`
	- `model_metrics_optimized.csv`
	- `ensemble_info_optimized.json`
	- `best_params_optimized.json`

	### Step 3: Analyze Feature Importance (Optional)

	```bash
	python feature_importance_analysis.py
	```

	What this does:
	- Analyzes feature importance across all models
	- Performs statistical feature selection
	- Generates visualizations
	- Provides feature selection recommendations
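
Gradient-boosting libraries can report importances on different scales, so comparing them across models usually needs a normalization step first. The sketch below shows one way to do that, assuming each model exposes a feature-to-importance mapping; the feature names and values are hypothetical, and `feature_importance_analysis.py` may aggregate differently.

```python
def mean_normalized_importance(per_model):
    """Average each feature's share of total importance across models."""
    features = sorted({name for imp in per_model.values() for name in imp})
    combined = {name: 0.0 for name in features}
    for importances in per_model.values():
        total = sum(importances.values())
        for name in features:
            share = importances.get(name, 0.0) / total
            combined[name] += share / len(per_model)
    # Highest average share first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importances on three different scales.
per_model = {
    "XGBoost": {"age": 0.40, "cholesterol": 0.35, "resting_bp": 0.25},
    "CatBoost": {"age": 30.0, "cholesterol": 50.0, "resting_bp": 20.0},
    "LightGBM": {"age": 120, "cholesterol": 60, "resting_bp": 20},
}

ranking = mean_normalized_importance(per_model)
for name, share in ranking:
    print(f"{name:<12} {share:.3f}")
```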

	Time: ~5-10 minutes

	Output:
	- `feature_selection_recommendations.json`
	- `feature_importance_top30.png`
	- `feature_correlation_top30.png`

	### Step 4: Compare Results

	```bash
	python compare_models.py
	```

	What this does:
	- Compares baseline vs optimized models
	- Shows improvement metrics
	- Displays optimal ensemble configuration
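
As an illustration of what a baseline-versus-optimized comparison computes, the sketch below reports the change per metric in percentage points. It is not the logic of `compare_models.py`: the function name is hypothetical, the baseline values are the figures quoted under Current Performance, and the optimized values are placeholders, not measured results.

```python
def compare_metrics(baseline, optimized):
    """Return per-metric change in percentage points (optimized - baseline)."""
    return {
        name: round(optimized[name] - baseline[name], 2)
        for name in baseline
        if name in optimized
    }


baseline = {"accuracy": 85.1, "recall": 84.3, "roc_auc": 92.5}
# Placeholder numbers for illustration only; replace with your own results.
optimized = {"accuracy": 86.4, "recall": 86.9, "roc_auc": 93.0}

deltas = compare_metrics(baseline, optimized)
for name, delta in deltas.items():
    print(f"{name:<10} {baseline[name]:5.1f} -> {optimized[name]:5.1f} ({delta:+.1f} pp)")
```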

	## Expected Improvements

	After running the optimization:

	| Metric | Current | Expected | Improvement |
	|--------|---------|----------|-------------|
	| Accuracy | 85.1% | 86-87% | +1-2% |
	| Recall | 84.3% | 86-87.5% | +2-4% |
	| F1 Score | 85.0% | 86-87% | +1-2% |

	## Key Improvements Implemented

	1. ✅ Optuna Hyperparameter Optimization
	   - Tree-structured Parzen Estimator (TPE)
	   - 100+ trials per model
	   - Expanded parameter search spaces

	2. ✅ Multi-Objective Optimization
	   - Combined accuracy + recall scoring
	   - Threshold optimization per model

	3. ✅ Enhanced Ensemble
	   - Three-model ensemble (XGBoost + CatBoost + LightGBM)
	   - Optimized weights
	   - Optimized threshold

	4. ✅ Feature Analysis
	   - Importance extraction
	   - Statistical selection methods
	   - Recommendations for feature engineering
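
To make the ensemble-weight idea concrete, here is a minimal sketch that grid-searches weights for three models' predicted probabilities and scores each blend with a combined accuracy and recall metric. The helper names, the 0.1 grid step, the equal metric weighting, and the toy probabilities are assumptions for illustration; the project may instead search the weights with Optuna.

```python
from itertools import product


def weight_grid(step=0.1):
    """Yield (w1, w2, w3) with non-negative weights that sum to 1."""
    n = round(1 / step)
    for i, j in product(range(n + 1), repeat=2):
        if i + j <= n:
            yield (i / n, j / n, (n - i - j) / n)


def blend(probs, weights):
    """Weighted average of per-model probability lists."""
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*probs)]


def combined_score(y_true, y_prob, threshold=0.5):
    """Equal-weight mix of accuracy and recall at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(p == y for p, y in zip(preds, y_true))
    true_pos = sum(p == 1 and y == 1 for p, y in zip(preds, y_true))
    positives = sum(y_true)
    recall = true_pos / positives if positives else 0.0
    return 0.5 * correct / len(y_true) + 0.5 * recall


def best_weights(y_true, probs, step=0.1):
    """Return the grid point whose blend scores highest."""
    return max(weight_grid(step), key=lambda w: combined_score(y_true, blend(probs, w)))


# Toy validation data (hypothetical): one probability list per model.
y_true = [1, 0, 1, 0, 1, 0]
probs = [
    [0.9, 0.6, 0.4, 0.2, 0.7, 0.1],  # XGBoost
    [0.6, 0.3, 0.7, 0.6, 0.4, 0.2],  # CatBoost
    [0.7, 0.4, 0.6, 0.3, 0.3, 0.5],  # LightGBM
]

weights = best_weights(y_true, probs)
print(weights, combined_score(y_true, blend(probs, weights)))
```

The grid includes the single-model corners such as `(1.0, 0.0, 0.0)`, so on the data used for the search the selected blend never scores below the best individual model.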

	## Faster Alternative

	For faster but less thoroughly tuned results, lower the number of Optuna trials.

	Edit `improve_models.py` and change:
	```python
	n_trials = 100 # Change to 30-50 for faster results
	```

	## Troubleshooting

	Problem: Script takes too long
	- Solution: Reduce `n_trials` to 30-50

	Problem: Memory errors
	- Solution: Reduce `n_jobs` or use a smaller data sample

	Problem: No improvement
	- Solution: Check that the data preprocessing matches the preprocessing used for training
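
If memory is the bottleneck, one way to shrink the data without distorting the class balance is a stratified sample. This is a minimal standard-library sketch; the `target` column name, the row format, and the 20% fraction are hypothetical and should be adapted to your dataset.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=42):
    """Sample the same fraction of rows from each class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))  # keep at least one row per class
        sample.extend(rng.sample(group, k))
    rng.shuffle(sample)
    return sample


# Toy dataset: 1000 rows, 25% positive (hypothetical).
rows = [{"id": i, "target": int(i % 4 == 0)} for i in range(1000)]
subset = stratified_sample(rows, "target", fraction=0.2)
print(len(subset), sum(row["target"] for row in subset))
```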

	## Next Steps

	1. Run optimization scripts
	2. Compare results with baseline
	3. Test optimized models on validation set
	4. Deploy best performing model
	5. Monitor performance

	## Files Created

	- `improve_models.py` - Main optimization script
	- `feature_importance_analysis.py` - Feature analysis
	- `compare_models.py` - Comparison tool
	- `IMPROVEMENTS.md` - Detailed improvement analysis
	- `QUICK_START.md` - This guide

	## Questions?

	See `IMPROVEMENTS.md` for detailed explanations of all improvements.