# Policy Analysis Application - Model Pre-loading Setup

This application has been enhanced with model pre-loading: models are downloaded and initialized ahead of time, eliminating the long first-request delay during deployment.

## πŸš€ Quick Start

### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```

### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test models are working
python test_models.py

# Start the application
python app.py
```

## πŸ“¦ What's New

### Files Added:
- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly  
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation

### Files Modified:
- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter

## πŸ€– Models Used

The application uses these ML models:

| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size**: ~4GB

## ⚑ Performance Benefits

### Before (without pre-loading):
- First request: 30-60 seconds (model download, load, and inference)
- Subsequent requests: 2-5 seconds

### After (with pre-loading):
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds

## πŸ”§ Configuration

### Environment Variables:
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful in development, when fast startup matters more than first-request latency)
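
Assuming the flag is read as a plain string environment variable, the check in `app.py` likely looks something like this (a sketch, not the actual code):

```python
import os

def should_preload(default=True):
    """Interpret PRELOAD_MODELS as a boolean; unset falls back to the default."""
    raw = os.environ.get("PRELOAD_MODELS")
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")
```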

### Model Cache Location:
- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`
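
To check where models actually landed, the cache root can be resolved portably. Hugging Face libraries also honour the `HF_HOME` environment variable, which this sketch takes into account (it simplifies slightly; the libraries additionally consult `XDG_CACHE_HOME` on Linux):

```python
import os
from pathlib import Path

def hf_cache_dir():
    """Return the Hugging Face cache root, honouring HF_HOME if set."""
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home)
    # Default location on Linux/Mac/Windows: <home>/.cache/huggingface
    return Path.home() / ".cache" / "huggingface"
```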

## 🐳 Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```

This means:
- βœ… No download time during container startup
- βœ… Consistent performance across deployments
- βœ… Offline inference capability
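
To make the offline capability explicit, Hugging Face libraries support offline switches via environment variables; once the models are baked into the image, these can be set in the Dockerfile or compose file to guarantee no network calls at runtime:

```shell
# Force huggingface_hub / transformers to use only the local cache
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```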

## πŸ§ͺ Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# πŸ§ͺ Model Verification Test Suite
# βœ… All tests passed! The application is ready to deploy.
```

## πŸ“Š Resource Requirements

### Minimum:
- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores

### Recommended:
- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)

## 🚨 Troubleshooting

### Model Download Issues:
```bash
# Check connectivity
curl -I https://huggingface.co

# Check disk space
df -h

# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```

### Memory Issues:
- Reduce model batch sizes
- Use CPU-only inference: `device=-1`
- Consider model quantization

### Slow Performance:
- Verify models are cached locally
- Check if `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage

## πŸ“ˆ Monitoring

Monitor these metrics in production:
- Model loading time
- Inference latency  
- Memory usage
- Cache hit ratio
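
Model loading time and inference latency can be captured with a simple timing helper; this is a generic sketch, meant to be wired into whatever metrics backend you use (the `sink` callable is a stand-in for a real metrics client):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(metric_name, sink=print):
    """Time a block and report the elapsed seconds to a metrics sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink(f"{metric_name}: {time.perf_counter() - start:.3f}s")

# Usage (model name from the table above):
# with timed("model_load_seconds"):
#     model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
```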

## πŸ”„ Updates

To update models:
```bash
# Clear cache (note: this removes all locally cached Hugging Face models, not just this app's)
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```

## πŸ’‘ Tips for Production

1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic
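
For the health-check tip, a compose-level check can probe the app once its models are loaded. A sketch of such a check; the service name, port, and `/health` endpoint here are assumptions, so adjust them to whatever `app.py` actually exposes:

```yaml
services:
  policy-analysis:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s   # allow time for model pre-loading before failing checks
```

The generous `start_period` matters here: with several gigabytes of models to load, the container is legitimately unhealthy for a while after startup.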

## 🀝 Contributing

When adding new models:
1. Add model name to `download_models.py`
2. Add test case to `test_models.py`
3. Update documentation
4. Test thoroughly

---

For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).