# Running Model Optimization with Docker

This guide shows you how to run the model optimization scripts using Docker.

## Prerequisites

- Docker installed and running
- Docker Compose (usually comes with Docker Desktop)
- At least 8GB RAM available for Docker
- Data file: `content/cardio_train_extended.csv`

## Quick Start

### Option 1: Using Docker Compose (Recommended)

```bash
# Build and run optimization
docker-compose -f docker-compose.optimization.yml up --build

# Run in detached mode (background)
docker-compose -f docker-compose.optimization.yml up -d --build

# View logs
docker-compose -f docker-compose.optimization.yml logs -f

# Stop when done
docker-compose -f docker-compose.optimization.yml down
```

### Option 2: Using Docker Directly

```bash
# Build the image
docker build -f Dockerfile.optimization -t heart-optimization .

# Run optimization
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --name heart-optimization \
  heart-optimization

# Run with resource limits
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/model_assets:/app/model_assets:ro" \
  --cpus="4" \
  --memory="8g" \
  --name heart-optimization \
  heart-optimization
```

## Running Specific Scripts

### Run Model Optimization Only

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python improve_models.py
```

### Run Feature Analysis Only

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python feature_importance_analysis.py
```

### Run Comparison

```bash
docker-compose -f docker-compose.optimization.yml run --rm optimization python compare_models.py
```

## Customization

### Adjust Resource Limits

Edit `docker-compose.optimization.yml`:

```yaml
deploy:
  resources:
    limits:
      cpus: '8'      # Use more CPUs if available
      memory: 16G    # More RAM for faster processing
```

### Reduce Optimization Time

Edit `improve_models.py` before building:

```python
n_trials = 50  # Reduce from 100 to 50 for faster results
```

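Or override at runtime by patching the script source as it is loaded (this only works if the script contains the literal text `n_trials = 100`):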

```bash
docker run --rm \
  -v "$(pwd)/content:/app/content" \
  -v "$(pwd)/improve_models.py:/app/improve_models.py" \
  heart-optimization python -c "
import sys
sys.path.insert(0, '/app')
# Modify n_trials here or use environment variable
exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = 50'))
"
```

### Use Environment Variables

Create a `.env` file:

```env
N_TRIALS=50
STUDY_TIMEOUT=1800
```

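Then pass it to Compose. Note that `--env-file` only supplies values for variable substitution in the compose file; the container sees them only if the service lists them under `environment:` (or loads the file with `env_file:`):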

```bash
docker-compose -f docker-compose.optimization.yml --env-file .env up
```

## Monitoring Progress

### View Real-time Logs

```bash
# Using docker-compose
docker-compose -f docker-compose.optimization.yml logs -f

# Using docker
docker logs -f heart-optimization
```

### Check Container Status

```bash
docker ps
docker stats heart-optimization
```

## Results Location

All results are saved to your host machine in:
- `content/models/` - Optimized models and metrics
- `content/reports/` - Feature importance visualizations

These persist after the container stops.

## Troubleshooting

### Out of Memory

**Error:** `Killed` or memory errors

**Solution:**
1. Reduce `n_trials` in `improve_models.py`
2. Raise the memory limit in `docker-compose.optimization.yml`, if the host has RAM to spare
3. Close other applications

### Build Fails

**Error:** Package installation fails

**Solution:**
```bash
# Clean build
docker-compose -f docker-compose.optimization.yml build --no-cache
```

### Data Not Found

**Error:** `Data file not found`

**Solution:**
```bash
# Verify data file exists
ls -lh content/cardio_train_extended.csv

# Check volume mount
docker-compose -f docker-compose.optimization.yml config
```
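
The same check can be scripted so it runs before a long optimization starts. This is a sketch only: `check_data_file` is a hypothetical helper, and it tests nothing beyond the file existing and being non-empty.

```python
from pathlib import Path


def check_data_file(path):
    """Return a list of problems found with the data file (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    if p.stat().st_size == 0:
        problems.append(f"{p} is empty")
    return problems


if __name__ == "__main__":
    # Path taken from the Prerequisites section; run from the project root.
    for problem in check_data_file("content/cardio_train_extended.csv"):
        print("Problem:", problem)
```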

### Slow Performance

**Solutions:**
1. Increase CPU allocation in `docker-compose.optimization.yml`
2. Use fewer trials: `n_trials = 30`
3. Run on a machine with more resources

## Advanced Usage

### Interactive Shell

```bash
# Get a shell in the container
docker-compose -f docker-compose.optimization.yml run --rm optimization bash

# Then run scripts manually
python improve_models.py
```

### Run Multiple Optimizations

```bash
# Run optimization with different trial counts.
# $trials is expanded by the shell before Python sees the command string.
for trials in 30 50 100; do
  docker run --rm \
    -v "$(pwd)/content:/app/content" \
    -e N_TRIALS=$trials \
    heart-optimization \
    python -c "import sys; sys.path.insert(0, '/app'); exec(open('/app/improve_models.py').read().replace('n_trials = 100', 'n_trials = $trials'))"
done
```

### Save Container State

```bash
# Commit the container to an image. The container must still exist:
# run this while it is running, or start it without --rm.
docker commit heart-optimization heart-optimization:snapshot

# Use later
docker run --rm -v "$(pwd)/content:/app/content" heart-optimization:snapshot
```

## Performance Tips

1. **Use SSD storage** - Faster I/O for data loading
2. **Allocate more CPUs** - Parallel processing in Optuna
3. **Increase memory** - Better for large datasets
4. **Run overnight** - Let it run while you sleep
5. **Use GPU** (if available) - Requires NVIDIA Docker runtime

## GPU Support (Optional)

If you have an NVIDIA GPU:

```yaml
# Add to docker-compose.optimization.yml
runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
```

Then build with:
```bash
docker build -f Dockerfile.optimization -t heart-optimization .
```

## Example Workflow

```bash
# 1. Build image
docker-compose -f docker-compose.optimization.yml build

# 2. Run optimization (takes 1-2 hours)
docker-compose -f docker-compose.optimization.yml up

# 3. In another terminal, check progress
docker-compose -f docker-compose.optimization.yml logs -f

# 4. When done, run feature analysis
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python feature_importance_analysis.py

# 5. Compare results
docker-compose -f docker-compose.optimization.yml run --rm optimization \
  python compare_models.py

# 6. Clean up
docker-compose -f docker-compose.optimization.yml down
```

## Benefits of Using Docker

✅ **Isolation** - No conflicts with your system Python  
✅ **Reproducibility** - Same environment every time  
✅ **Resource Control** - Limit CPU/memory usage  
✅ **Easy Cleanup** - Remove container when done  
✅ **Portability** - Run on any machine with Docker  

## Next Steps

After optimization completes:
1. Check results in `content/models/model_metrics_optimized.csv`
2. Review feature importance in `content/reports/`
3. Compare with baseline using `compare_models.py`
4. Deploy optimized models to your Streamlit app
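
For step 1, the metrics file can be inspected without extra dependencies. The column names are not documented in this guide, so this sketch assumes only a comma-separated file with a header row and prints each row as-is; `load_metrics` is a hypothetical helper.

```python
import csv
from pathlib import Path


def load_metrics(path):
    """Read a CSV file into a list of dicts, one per row, keyed by header."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Path taken from step 1 above; run from the project root.
    metrics_path = "content/models/model_metrics_optimized.csv"
    for row in load_metrics(metrics_path):
        print(row)
```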

---

**Note:** The optimization process can take 1-2 hours. Make sure your laptop is plugged in and won't go to sleep!