# Policy Analysis Application - Model Pre-loading Setup
This application has been enhanced with model pre-loading, which significantly reduces first-request latency in deployment.
## 🚀 Quick Start
### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis
# Build and run with Docker
docker-compose up --build
```
### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Download all models (one-time setup)
python download_models.py
# Test models are working
python test_models.py
# Start the application
python app.py
```
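The repository's `download_models.py` handles the one-time download. As a rough sketch of what such a script does (the model IDs come from the table below, but the script layout here is an assumption, not the repo's actual code):

```python
# Hypothetical sketch of a model pre-download script. The model IDs are
# taken from the "Models Used" table in this README; the structure of the
# script itself is an assumption.

MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": "embedding",
    "BAAI/bge-m3": "embedding",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "zero-shot-classification",
}

def download_all():
    # Imports are deferred so the model list can be inspected without the ML deps.
    # Instantiating each model downloads and caches its weights.
    from sentence_transformers import SentenceTransformer, CrossEncoder
    from transformers import pipeline
    for name, kind in MODELS.items():
        if kind == "embedding":
            SentenceTransformer(name)
        elif kind == "cross-encoder":
            CrossEncoder(name)
        else:
            pipeline("zero-shot-classification", model=name)

# download_all() would be invoked from the script's entry point.
```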
## 📦 What's New
### Files Added:
- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation
### Files Modified:
- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter
## 🤖 Models Used
The application uses these ML models:
| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |
**Total download size**: ~4GB
## ⚡ Performance Benefits
### Before (without pre-loading):
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds
### After (with pre-loading):
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds
## 🔧 Configuration
### Environment Variables:
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)
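One common way to read such a flag (this is an illustrative sketch, not verified against the repo's `app.py`) is to treat anything other than `false`/`0` as enabled:

```python
# Hypothetical helper for reading the PRELOAD_MODELS flag; the exact
# parsing in app.py may differ.
import os

def preload_enabled() -> bool:
    # Default to true; only explicit "false"/"0" disables pre-loading.
    return os.environ.get("PRELOAD_MODELS", "true").strip().lower() not in ("false", "0")
```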
### Model Cache Location:
- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`
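If a script needs to locate the cache programmatically, a minimal cross-platform sketch (note that the `HF_HOME` environment variable, if set, overrides the default location):

```python
# Sketch: resolve the Hugging Face cache directory shown above.
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    # HF_HOME overrides the default ~/.cache/huggingface location.
    return Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
```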
## 🐳 Docker Deployment
The Dockerfile automatically downloads models during the build process:
```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```
This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability
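For local iteration, the model cache can instead be mounted as a persistent volume so rebuilds don't re-download ~4GB. A sketch of what that might look like (the service name, port, and paths are assumptions, not taken from the repo's `docker-compose.yml`):

```yaml
# Hypothetical compose override for development; names and paths assumed.
services:
  policy-analysis:
    build: .
    ports:
      - "7860:7860"   # adjust to the app's actual port
    environment:
      - PRELOAD_MODELS=true
    volumes:
      # Persist the HF cache across rebuilds
      - hf-cache:/root/.cache/huggingface
volumes:
  hf-cache:
```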
## 🧪 Testing
Verify everything is working:
```bash
# Test all models
python test_models.py
# Expected output:
# 🧪 Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
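The internals of `test_models.py` aren't shown here, but a verification loop of roughly this shape is one way such a suite can be built (the structure is an assumption; the actual script may differ):

```python
# Hypothetical shape of a model verification loop. The check callables
# would wrap model loads/inferences; this harness itself is an assumption.
def verify(checks):
    """Run each (name, callable) check; return the names of failures."""
    failures = []
    for name, check in checks:
        try:
            check()
            print(f"✅ {name}")
        except Exception as exc:
            print(f"❌ {name}: {exc}")
            failures.append(name)
    return failures
```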
## 📊 Resource Requirements
### Minimum:
- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores
### Recommended:
- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)
## 🚨 Troubleshooting
### Model Download Issues:
```bash
# Check connectivity
curl -I https://huggingface.co
# Check disk space
df -h
# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
### Memory Issues:
- Reduce model batch sizes
- Use CPU-only inference: `device=-1`
- Consider model quantization
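"Reduce model batch sizes" can be as simple as chunking inputs before encoding; a sketch, assuming the encoder is any callable that maps a list of texts to a list of vectors (e.g. a `SentenceTransformer`'s `.encode`):

```python
def encode_in_batches(encode, texts, batch_size=8):
    """Encode texts in small chunks to cap peak memory usage.

    `encode` is assumed to be a callable mapping a list of texts to a
    list of vectors; smaller batch_size trades throughput for memory.
    """
    out = []
    for i in range(0, len(texts), batch_size):
        out.extend(encode(texts[i:i + batch_size]))
    return out
```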
### Slow Performance:
- Verify models are cached locally
- Check if `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage
## 📈 Monitoring
Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio
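The first two metrics (load time and inference latency) can be captured without extra dependencies; a minimal sketch using the standard library:

```python
# Minimal timing helper for model-load and inference blocks.
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log=print):
    """Log the wall-clock duration of a block, e.g. a model load."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log(f"{label}: {time.perf_counter() - start:.3f}s")

# Usage:
#   with timed("model_load"):
#       model = load_model()
```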
## 🔄 Updates
To update models:
```bash
# Clear cache
rm -rf ~/.cache/huggingface/
# Re-download
python download_models.py
# Test
python test_models.py
```
## 💡 Tips for Production
1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic
## 🤝 Contributing
When adding new models:
1. Add model name to `download_models.py`
2. Add test case to `test_models.py`
3. Update documentation
4. Test thoroughly
---
For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).