aankitdas committed on
Commit
87b6610
·
1 Parent(s): f8c2bb9

added readme

  1. README.md +268 -0
README.md CHANGED
@@ -0,0 +1,268 @@
1
+ # 🚀 Resource Optimization ML Pipeline
2
+
3
+ An end-to-end machine learning solution for optimizing service placement across AWS regions, reducing latency and costs while maintaining reliability.
4
+
5
+ **Live Dashboard:** [View on Hugging Face Spaces](https://huggingface.co/spaces/aankitdas/resource-optimization-ml)
6
+
7
+ ## 📊 Project Overview
8
+
9
+ This project demonstrates a complete ML pipeline inspired by Amazon's Region Flexibility Engineering team challenges:
10
+
11
+ - **Problem:** Optimize service placement across 5 AWS regions to reduce latency and costs
12
+ - **Solution:** ML-driven placement strategy with A/B testing validation
13
+ - **Results:** 5.25% latency reduction, 4.92% cost savings, statistically significant (p < 0.001)
14
+
15
+ ## 🎯 Key Results
16
+
17
+ | Metric | Result |
18
+ |--------|--------|
19
+ | Latency Reduction | **5.25%** ✅ |
21
+ | Cost Savings | **4.92%** ✅ |
22
+ | Critical Service Improvement | **9.30%** ✅ |
23
+ | Statistical Significance | **p < 0.001** ✅ |
23
+ | Placement Efficiency | **378 vs 452 pairs** (ML-optimized vs random, -16%) |
24
+
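The placement-efficiency figure in the table can be checked with simple arithmetic. A minimal sketch, assuming 452 is the random (control) pair count and 378 is the ML-optimized (treatment) count; `percent_change` is a hypothetical helper, not a function from this repository:

```python
def percent_change(baseline, new):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

control_pairs = 452    # assumed: random placement
treatment_pairs = 378  # assumed: ML-optimized placement

change = percent_change(control_pairs, treatment_pairs)
print(f"{change:.1f}%")  # -16.4%
```

The table rounds this to -16%.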
25
+ ## 🛠️ Architecture
26
+
27
+ ### Data Pipeline
28
+ - **150+ services** with metadata (memory, CPU, latency sensitivity)
29
+ - **1.6M+ traffic records** across 5 AWS regions
30
+ - **30K+ placement records** with latency and error rates
31
+ - **Regional latency matrix** for cross-region communication costs
32
+
33
+ ### ML Models
34
+
35
+ #### Model 1: Latency Prediction (XGBoost Regression)
36
+ - Predicts service latency for a given placement
37
+ - **Features:** Memory, CPU cores, traffic patterns, outbound latency, service dependencies
38
+ - **Performance:** RMSE = 28.7 ms, MAE = 24.67 ms
39
+ - **Top Features:** Request variability, outbound latency, average traffic
40
+
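Request variability is reported elsewhere in this README as "Request Variability (CV)". A minimal sketch of such a feature, assuming CV means the coefficient of variation (standard deviation divided by mean) of a service's hourly request counts; `request_cv` and the sample values are illustrative, not taken from `train_models.py`:

```python
from statistics import mean, pstdev

def request_cv(hourly_requests):
    """Coefficient of variation: population std dev divided by the mean."""
    avg = mean(hourly_requests)
    if avg == 0:
        return 0.0  # no traffic: treat variability as zero
    return pstdev(hourly_requests) / avg

steady = [100, 100, 100, 100]  # constant load
bursty = [100, 120, 80, 100]   # same mean, more spread

print(request_cv(steady))
print(round(request_cv(bursty), 4))
```

Because CV is scale-free, it lets a low-traffic and a high-traffic service be compared on burstiness alone.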
41
+ #### Model 2: Placement Strategy (Random Forest Classifier)
42
+ - Classifies services for optimal regional distribution
43
+ - **Features:** Traffic volume, dependencies, latency sensitivity, resource requirements
44
+ - **Performance:** 100% accuracy on test set
45
+
46
+ ### A/B Testing Framework
47
+ - **Control:** Random service placement (baseline)
48
+ - **Treatment:** ML-optimized placement using model predictions
49
+ - **Statistical Test:** Independent t-test (t=7.02, p<0.001)
50
+ - **Result:** Statistically significant improvement ✅
51
+
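The two arms described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `ab_test_simulation.py`: the region names are placeholders, each service gets exactly one region, and `predicted_latency` is a toy stand-in for the trained latency model.

```python
import random
from statistics import mean

# Region names are placeholders; the README only says "5 AWS regions".
REGIONS = ["region-a", "region-b", "region-c", "region-d", "region-e"]

def predicted_latency(service_id, region):
    """Toy stand-in for the latency model: a repeatable pseudo-random score in ms."""
    rng = random.Random(f"{service_id}:{region}")
    return 50.0 + 100.0 * rng.random()

def control_placement(service_ids, rng):
    """Control arm: place each service in a uniformly random region."""
    return {s: rng.choice(REGIONS) for s in service_ids}

def treatment_placement(service_ids):
    """Treatment arm: place each service where predicted latency is lowest."""
    return {s: min(REGIONS, key=lambda r: predicted_latency(s, r)) for s in service_ids}

def average_latency(placement):
    """Mean predicted latency over all (service, region) assignments."""
    return mean(predicted_latency(s, r) for s, r in placement.items())

service_ids = list(range(150))
control = control_placement(service_ids, random.Random(0))
treatment = treatment_placement(service_ids)

print(f"control:   {average_latency(control):.1f} ms")
print(f"treatment: {average_latency(treatment):.1f} ms")
```

In this toy setup the treatment arm cannot do worse than control, because it is scored with the same function it optimizes. A realistic simulation would score both arms against latencies the model has not seen.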
52
+ ## 📁 Project Structure
53
+
54
+ ```
55
+ resource-optimization-ml/
56
+ ├── data/ # Generated datasets
57
+ │   ├── services.csv # Service metadata
58
+ │   ├── regional_latency.csv # Cross-region latency
59
+ │   ├── traffic_patterns.csv # Hourly traffic by service/region
60
+ │   └── service_placement.csv # Historical placements
61
+ │
62
+ ├── models/ # Trained ML models
63
+ │   ├── xgboost_latency_model.pkl # Latency prediction model
64
+ │   ├── random_forest_placement_model.pkl # Placement strategy model
65
+ │   ├── scaler_latency.pkl # Feature scaler
66
+ │   ├── scaler_classification.pkl # Feature scaler
67
+ │   └── feature_importance_*.csv # Feature importance analysis
68
+ │
69
+ ├── results/ # A/B test results
70
+ │   ├── ab_test_results.json # Statistical comparison
71
+ │   ├── control_placement.csv # Control group placements
72
+ │   └── treatment_placement.csv # Treatment group placements
73
+ │
74
+ ├── notebooks/ # Analysis notebooks (optional)
75
+ │
76
+ ├── data_generation.py # Generate synthetic dataset
77
+ ├── setup_database.py # Load data into SQLite
78
+ ├── explore_data.py # Data exploration and SQL queries
79
+ ├── train_models.py # Train ML models
80
+ ├── ab_test_simulation.py # Run A/B test simulation
81
+ ├── app.py # Streamlit dashboard
82
+ ├── requirements.txt # Python dependencies
83
+ ├── README.md # This file
84
+ └── .gitignore
85
+ ```
86
+
87
+ ## 🚀 Quick Start
88
+
89
+ ### Local Development
90
+
91
+ 1. **Clone the repository**
92
+ ```bash
93
+ git clone https://github.com/aankitdas/resource-optimization-ml.git
94
+ cd resource-optimization-ml
95
+ ```
96
+
97
+ 2. **Install dependencies** (using uv or pip)
98
+ ```bash
99
+ uv pip install -r requirements.txt
100
+ ```
101
+
102
+ 3. **Generate data**
103
+ ```bash
104
+ uv run python data_generation.py
105
+ ```
106
+
107
+ 4. **Setup database**
108
+ ```bash
109
+ uv run python setup_database.py
110
+ ```
111
+
112
+ 5. **Explore data**
113
+ ```bash
114
+ uv run python explore_data.py
115
+ ```
116
+
117
+ 6. **Train models**
118
+ ```bash
119
+ uv run python train_models.py
120
+ ```
121
+
122
+ 7. **Run A/B test simulation**
123
+ ```bash
124
+ uv run python ab_test_simulation.py
125
+ ```
126
+
127
+ 8. **Launch dashboard**
128
+ ```bash
129
+ uv run streamlit run app.py
130
+ ```
131
+
132
+ The dashboard will open at `http://localhost:8501`
133
+
134
+ ## 📊 Dashboard Features
135
+
136
+ ### 📈 Overview
137
+ - Service distribution by memory, CPU, and latency sensitivity
138
+ - Traffic volume analysis across regions
139
+ - Total statistics (150 services, 5 regions, 1.6M records)
140
+
141
+ ### 🎯 A/B Test Results
142
+ - Side-by-side comparison of control vs treatment strategies
143
+ - Latency reduction: 5.25%
144
+ - Cost savings: 4.92%
145
+ - Statistical significance test results (p-value, t-statistic)
146
+
147
+ ### 🗺️ Regional Analysis
148
+ - Interactive latency heatmap between all region pairs
149
+ - Regional statistics (min, max, std deviation)
150
+ - Identify high-latency corridors
151
+
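The regional statistics listed above can be computed directly from a pairwise latency table. A minimal sketch, assuming latencies are stored per region pair; the region names and millisecond values are illustrative placeholders, not data from `regional_latency.csv`:

```python
from statistics import mean, pstdev

# Illustrative latencies in ms between placeholder regions (not project data).
latency_ms = {
    ("region-a", "region-b"): 60.0,
    ("region-a", "region-c"): 80.0,
    ("region-b", "region-c"): 140.0,
    ("region-a", "region-d"): 200.0,
}

def latency_summary(matrix):
    """Min, max, mean, and population std dev over all region pairs."""
    values = list(matrix.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": pstdev(values),
    }

def high_latency_corridors(matrix, top_n=2):
    """Return the top_n region pairs sorted by latency, slowest first."""
    return sorted(matrix, key=matrix.get, reverse=True)[:top_n]

summary = latency_summary(latency_ms)
corridors = high_latency_corridors(latency_ms)
print(summary)
print(corridors)
```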
152
+ ### 🔧 Service Details
153
+ - Interactive service explorer
154
+ - Per-service placement across regions
155
+ - Instance count and latency metrics
156
+
157
+ ## 🧠 Technical Stack
158
+
159
+ | Component | Tool | Purpose |
160
+ |-----------|------|---------|
161
+ | Data Storage | SQLite | Lightweight database for local development |
162
+ | Data Processing | Pandas, NumPy | Data manipulation and feature engineering |
163
+ | ML Framework | scikit-learn, XGBoost | Model training and prediction |
164
+ | Statistics | SciPy | A/B testing and significance tests |
165
+ | Visualization | Plotly, Streamlit | Interactive dashboards |
166
+ | Deployment | Hugging Face Spaces | Live dashboard hosting |
167
+
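To illustrate the SQLite row in the table above, here is a minimal sketch of loading traffic rows and aggregating them with SQL. The table schema, column names, and sample rows are assumptions for illustration; the real `setup_database.py` and `explore_data.py` may differ.

```python
import sqlite3

# Hypothetical schema; the real traffic_patterns.csv columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic_patterns ("
    "service_id TEXT, region TEXT, hour INTEGER, requests INTEGER)"
)

# Made-up sample rows standing in for the generated dataset.
rows = [
    ("svc-001", "region-a", 0, 120),
    ("svc-001", "region-b", 0, 80),
    ("svc-002", "region-a", 0, 200),
    ("svc-002", "region-a", 1, 100),
]
conn.executemany("INSERT INTO traffic_patterns VALUES (?, ?, ?, ?)", rows)

# Total requests per region, the kind of aggregation an exploration script might run.
totals = conn.execute(
    "SELECT region, SUM(requests) FROM traffic_patterns "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('region-a', 420), ('region-b', 80)]
conn.close()
```

Using parameterized `?` placeholders with `executemany` keeps the insert safe and fast even for large row counts.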
168
+ ## 📈 Model Performance
169
+
170
+ ### XGBoost (Latency Prediction)
171
+ ```
172
+ RMSE: 28.7007 ms
173
+ MAE: 24.6690 ms
174
+ R²: -0.0674 (negative: worse than always predicting the mean latency)
175
+ ```
176
+
177
+ **Top 5 Important Features:**
178
+ 1. Request Variability (CV): 21.7%
179
+ 2. Outbound Latency: 17.6%
180
+ 3. Average Requests: 14.2%
181
+ 4. Dependencies: 13.5%
182
+ 5. Number of Instances: 11.7%
183
+
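The three regression metrics reported above can be reproduced from first principles. A minimal sketch in plain Python with made-up values (not the project's test set), showing that R² falls below zero whenever the model's squared error exceeds that of a constant mean prediction:

```python
from math import sqrt
from statistics import mean

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = sqrt(mean(e * e for e in errors))
    mae = mean(abs(e) for e in errors)
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)             # model's squared error
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # mean predictor's squared error
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Illustrative latencies in ms; the predictions are deliberately poor.
y_true = [100.0, 120.0, 140.0, 160.0]
y_pred = [150.0, 110.0, 170.0, 120.0]

rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.2f} ms, MAE={mae:.2f} ms, R2={r2:.2f}")
```

Here `ss_res` is 5100 and `ss_tot` is 2000, so R² is 1 - 5100/2000 = -1.55.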
184
+ ### Random Forest (Placement Strategy)
185
+ ```
186
+ Accuracy: 100%
187
+ Precision: 1.00
188
+ Recall: 1.00
189
+ F1-Score: 1.00
190
+ ```
191
+
192
+ **Top Features:**
193
+ 1. Traffic Volume: 54.5%
194
+ 2. Dependencies: 13.8%
195
+ 3. Latency Sensitivity: 13.7%
196
+
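For reference, the classification metrics above are defined from counts of true positives, false positives, and false negatives. A minimal sketch for the binary case with made-up labels; the project's classifier may be multi-class and use averaged scores, which this sketch does not cover:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Illustrative labels: 1 = multi-region placement, 0 = single region (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A score of 1.00 on every metric, as reported above, requires zero false positives and zero false negatives on the test set.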
197
+ ## 🧪 A/B Test Methodology
198
+
199
+ **Hypothesis:** ML-optimized placement reduces latency compared to random placement
200
+
201
+ **Sample Size:** 150 services × 5 regions = 750 potential placements
202
+
203
+ **Metrics:**
204
+ - Primary: Average latency (ms)
205
+ - Secondary: Total cost ($), redundancy score, critical service latency
206
+ - Efficiency: Number of placement pairs (fewer = more efficient)
207
+
208
+ **Test Type:** Independent samples t-test
209
+ - Null hypothesis (H₀): μ_control = μ_treatment
210
+ - Alternative hypothesis (H₁): μ_control ≠ μ_treatment
211
+ - Significance level: α = 0.05
212
+
213
+ **Result:** Reject H₀ (p < 0.001)
214
+ - The ML-optimized placement significantly reduces latency
215
+
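The test statistic behind this result can be sketched by hand. The following is a minimal illustration of the pooled-variance (Student's) independent-samples t statistic, using made-up latency samples rather than the project's data. The Technical Stack lists SciPy, whose `scipy.stats.ttest_ind` computes the same statistic together with a p-value; whether the project calls it with default settings is an assumption.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Illustrative per-placement latencies in ms (not project data).
control = [102.0, 98.0, 110.0, 105.0, 95.0]
treatment = [90.0, 94.0, 88.0, 97.0, 91.0]

t_stat, df = pooled_t_statistic(control, treatment)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 3.26, df = 8
```

For a two-sided test at α = 0.05 with 8 degrees of freedom the critical value is about 2.306, so this toy example would also reject H₀.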
216
+ ## 💡 Key Insights
217
+
218
+ 1. **Latency-critical services benefit most** from optimized placement (9.3% improvement vs 5.25% average)
219
+ 2. **Traffic patterns drive decisions** - high-traffic services benefit from multi-region placement
220
+ 3. **Regional cost differences matter** - avoiding expensive regions saves 4.92% without sacrificing latency
221
+ 4. **Placement efficiency improves** - ML uses 16% fewer placement pairs while reducing latency
222
+ 5. **Statistical rigor matters** - the improvement is unlikely to be due to chance (p < 0.001)
223
+
224
+ ## 🚀 Future Enhancements
225
+
226
+ ### Short-term
227
+ - [ ] Add notebook with exploratory data analysis
228
+ - [ ] Include feature importance visualizations
229
+ - [ ] Create prediction API endpoint
230
+
231
+ ### Medium-term
232
+ - [ ] Integrate real AWS CloudWatch metrics
233
+ - [ ] Add model retraining pipeline
234
+ - [ ] Implement automated alerting
235
+ - [ ] Support multi-cloud scenarios (GCP, Azure)
236
+
237
+ ### Long-term
238
+ - [ ] Deploy as microservice recommendation engine
239
+ - [ ] Build feedback loop for model improvement
240
+ - [ ] Create cost optimization module
241
+ - [ ] Add capacity planning features
242
+
243
+ ## 📚 Learning Resources
244
+
245
+ This project demonstrates:
246
+ - ✅ SQL data querying and aggregation
247
+ - ✅ Python data manipulation (Pandas, NumPy)
248
+ - ✅ Machine learning model training (scikit-learn, XGBoost)
249
+ - ✅ Feature engineering and preprocessing
250
+ - ✅ Statistical hypothesis testing
251
+ - ✅ A/B testing methodology
252
+ - ✅ Data visualization (Plotly, Streamlit)
253
+ - ✅ Full-stack ML deployment
254
+
255
+ ## 📝 License
256
+
257
+ This project is open source and available under the MIT License.
258
+
259
+ ## 👤 Author
260
+
261
+ Built as a portfolio project demonstrating ML engineering capabilities for cloud infrastructure optimization.
262
+
263
+ ---
264
+
265
+ **Questions or feedback?** Open an issue or reach out!
266
+
267
+ **Live Dashboard:** [Hugging Face Spaces](https://huggingface.co/spaces/aankitdas/resource-optimization-ml)
268
+ **GitHub:** [resource-optimization-ml](https://github.com/aankitdas/resource-optimization-ml)