---
title: Resource Optimization ML Pipeline
emoji: 🔥
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: latest
app_file: app.py
pinned: false
---
# Resource Optimization ML Pipeline
A data-driven approach to optimizing service placement across cloud regions, reducing latency and infrastructure costs through machine learning.
**Live Dashboard:** https://huggingface.co/spaces/aankitdas/resource-optimization-ml
## Problem
When Amazon scales infrastructure globally across multiple AWS regions, teams face a critical decision: which services should run in which regions?
The naive approach (random placement) is inefficient:
- Services get placed in expensive regions unnecessarily
- Cross-region communication adds latency
- Resources get over-provisioned to ensure redundancy
- No data-driven strategy for placement decisions
This project tackles that problem: **given service characteristics and regional latency patterns, can we predict optimal placement that reduces latency and costs?**
## Solution
I built an ML-powered recommendation system that:
1. **Analyzes service characteristics** - memory, CPU, traffic volume, latency sensitivity
2. **Models regional latency** - how long it takes to communicate between regions
3. **Predicts placement impact** - what happens to latency if we place a service in region X vs Y
4. **Compares strategies** - random placement vs ML-optimized placement through A/B testing
## Results
The ML-optimized strategy outperforms random placement:
- **5.25% latency reduction** - services respond faster to users
- **4.92% cost savings** - avoided expensive regions where possible
- **9.30% improvement for critical services** - latency-sensitive workloads benefit most
- **Statistical significance** - improvements are very unlikely to be due to chance (p < 0.001)
- **16% fewer placements** - more efficient resource usage
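The percentages above compare the ML-optimized strategy against the random-placement baseline. As a minimal sketch, assuming they are plain relative reductions (the exact formula is not stated here), the calculation looks like this. The function name and the sample numbers are illustrative, not taken from the project.

```python
def percent_reduction(baseline: float, treatment: float) -> float:
    """Relative reduction of `treatment` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - treatment) / baseline * 100.0


# Illustrative numbers only, not the project's measurements.
control_latency_ms = 200.0
ml_latency_ms = 190.0
reduction = percent_reduction(control_latency_ms, ml_latency_ms)
print(f"{reduction:.2f}% latency reduction")
```

A positive value means the treatment is lower (better) than the baseline; a negative value means it got worse.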
## Technical Approach
### Data Pipeline
- Generated 150 synthetic services with realistic attributes
- Created 1.6M+ traffic records across 5 regions over 90 days
- Modeled cross-region latency patterns based on real AWS geography
- Stored everything in SQLite for easy SQL querying
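To illustrate why SQLite is convenient here, the sketch below builds a tiny in-memory database and runs one aggregation query. The table names, column names, and region identifiers are assumptions for illustration; the schema produced by `setup_database.py` may differ.

```python
import random
import sqlite3

# Assumed region identifiers; the project only states that it uses 5 regions.
REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1", "sa-east-1"]


def build_demo_db(n_services: int = 10, days: int = 3, seed: int = 0) -> sqlite3.Connection:
    """Create an in-memory database with hypothetical services and traffic tables."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE services ("
        "service_id INTEGER PRIMARY KEY, memory_mb INTEGER, "
        "cpu_cores REAL, latency_sensitive INTEGER)"
    )
    conn.execute(
        "CREATE TABLE traffic ("
        "service_id INTEGER, region TEXT, day INTEGER, requests INTEGER)"
    )
    for sid in range(n_services):
        conn.execute(
            "INSERT INTO services VALUES (?, ?, ?, ?)",
            (sid, rng.choice([512, 1024, 2048, 4096]),
             rng.choice([0.5, 1.0, 2.0, 4.0]), rng.randint(0, 1)),
        )
        # One synthetic traffic row per service, region, and day.
        for region in REGIONS:
            for day in range(days):
                conn.execute(
                    "INSERT INTO traffic VALUES (?, ?, ?, ?)",
                    (sid, region, day, rng.randint(100, 10_000)),
                )
    conn.commit()
    return conn


def top_services_by_traffic(conn: sqlite3.Connection, limit: int = 3):
    """Return (service_id, total_requests) rows, busiest first."""
    return conn.execute(
        "SELECT service_id, SUM(requests) AS total "
        "FROM traffic GROUP BY service_id "
        "ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = build_demo_db()
for service_id, total in top_services_by_traffic(conn):
    print(service_id, total)
```

Keeping everything in one SQLite file means the same `GROUP BY` queries can feed both feature engineering and the dashboard without a separate database server.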
### Machine Learning
**Model 1: Latency Prediction (XGBoost Regressor)**
- Predicts service latency given placement characteristics
- Input: service memory/CPU, traffic patterns, outbound latency, dependencies
- Output: expected latency in milliseconds
- Performance: RMSE=28.7ms
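For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and observed latency, so it is expressed in the same unit (milliseconds). A minimal sketch with made-up values follows; it does not reproduce the model's evaluation.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed vs. predicted latencies in milliseconds.
observed_ms = [100.0, 200.0]
predicted_ms = [110.0, 190.0]
print(rmse(observed_ms, predicted_ms))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.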
**Model 2: Placement Strategy (Random Forest Classifier)**
- Determines if a service should be single-region or multi-region
- Input: traffic volume, dependencies, resource requirements
- Output: optimal placement strategy
- Performance: 100% accuracy on test set
### A/B Testing
To validate the ML approach:
- **Control**: randomly place services across 2-4 regions
- **Treatment**: use ML models to recommend optimal placement
- **Test**: independent two-sample t-test on latency samples (t=7.02, p<0.001)
- **Conclusion**: ML strategy is statistically significantly better
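The test statistic can be sketched in plain Python. This version assumes the pooled-variance (Student's) form of the independent two-sample t-test; whether the project used that or the unequal-variance (Welch) form is not stated. The sample values are made up. In practice, `scipy.stats.ttest_ind` computes the same statistic along with the p-value.

```python
import math
import statistics


def students_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both samples have zero variance")
    return (mean_a - mean_b) / std_err, dof


# Made-up latency samples in milliseconds (control first, treatment second).
control = [10.0, 12.0, 14.0]
treatment = [7.0, 9.0, 11.0]
t_stat, dof = students_t(control, treatment)
print(f"t={t_stat:.3f}, dof={dof}")
```

With the control sample passed first, a positive t means the control group's mean latency is higher than the treatment group's.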
## How to Use the Dashboard
**Overview** - Shows how services are distributed across memory tiers and latency sensitivity, and lists the top services by traffic volume.
**A/B Test Results** - The core finding: a side-by-side comparison of random vs ML-optimized placement, with metrics and statistical test results.
**Regional Analysis** - A latency heatmap showing communication costs between regions. Higher-latency regions are avoided when possible.
## Project Structure
```
├── data_generation.py          # Generate synthetic services, traffic, latency data
├── setup_database.py           # Load CSVs into SQLite
├── train_models.py             # Train XGBoost and Random Forest models
├── ab_test_simulation.py       # Run A/B test and save results
├── app.py                      # Streamlit dashboard
├── results/
│   └── ab_test_results.json    # A/B test metrics and statistics
└── requirements.txt            # Python dependencies
```
## Technology Stack
- **Data Processing**: Python, Pandas, NumPy, SQLite
- **Machine Learning**: scikit-learn, XGBoost
- **Statistics**: SciPy (hypothesis testing)
- **Visualization**: Plotly, Streamlit
- **Deployment**: Docker, Hugging Face Spaces, GitHub Actions
## Key Insights
1. **Traffic patterns matter most** - Services with high, variable traffic benefit most from multi-region placement
2. **Latency-critical services are placement-sensitive** - A few milliseconds of additional latency can degrade user experience for these workloads
3. **Regional cost differences are significant** - Some regions are 80% more expensive than others. The ML strategy avoids them when latency permits
4. **Efficiency and performance can both improve** - ML uses fewer total placements while reducing latency
5. **Statistical rigor matters** - Raw improvements mean nothing without significance testing
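The trade-off in insight 3 can be illustrated with a simple rule: among the regions that meet a latency budget, pick the cheapest one. This is a hand-written rule for illustration only, not the project's ML models, and the region names, latencies, and costs are hypothetical.

```python
def cheapest_region_within_budget(regions: dict, latency_budget_ms: float) -> str:
    """Pick the cheapest region that meets the latency budget.

    Falls back to the lowest-latency region if none meets the budget.
    """
    if not regions:
        raise ValueError("regions must not be empty")
    eligible = {
        name: info for name, info in regions.items()
        if info["latency_ms"] <= latency_budget_ms
    }
    if not eligible:
        return min(regions, key=lambda name: regions[name]["latency_ms"])
    return min(eligible, key=lambda name: eligible[name]["cost_per_hour"])


# Hypothetical regions: "region-a" is fast but 80% more expensive than "region-b".
candidate_regions = {
    "region-a": {"latency_ms": 40.0, "cost_per_hour": 1.8},
    "region-b": {"latency_ms": 70.0, "cost_per_hour": 1.0},
    "region-c": {"latency_ms": 120.0, "cost_per_hour": 0.9},
}
print(cheapest_region_within_budget(candidate_regions, latency_budget_ms=80.0))
```

A latency-critical service with a tight budget ends up in the expensive region, while a tolerant service is steered toward cheaper ones.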
## Running Locally
```bash
# Generate data
python data_generation.py
# Setup database
python setup_database.py
# Train models
python train_models.py
# Run A/B test
python ab_test_simulation.py
# Launch dashboard
streamlit run app.py
```
## What This Demonstrates
- SQL data analysis and aggregation
- Python data manipulation and feature engineering
- Machine learning model training and evaluation
- Statistical hypothesis testing and A/B testing methodology
- End-to-end data product development (from data to dashboard)
- Production deployment with Docker and GitHub Actions
## Repository
https://github.com/aankitdas/resource-optimization-ml