# Policy Analysis Application - Model Pre-loading Setup
This application has been enhanced with model pre-loading, which eliminates the model-download delay on the first request after deployment.
## Quick Start
### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis
# Build and run with Docker
docker-compose up --build
```
### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Download all models (one-time setup)
python download_models.py
# Test models are working
python test_models.py
# Start the application
python app.py
```
## What's New
### Files Added:
- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation
### Files Modified:
- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter
## Models Used
The application uses these ML models:
| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Zero-shot classifier | ~1.5GB | Sentiment analysis |
**Total download size**: ~4GB
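Downloading a model once is enough to cache it for all later runs. The sketch below shows one plausible shape for `download_models.py`: a registry mapping each model ID to the loader that caches it. The structure and the `dry_run` flag are illustrative assumptions, not the repository's actual code:

```python
# Hypothetical sketch of download_models.py; the real script is authoritative.
# Maps each model ID to the kind of loader used to fetch and cache it.
MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": "sentence-transformer",
    "BAAI/bge-m3": "sentence-transformer",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "zero-shot",
}

def download_all(models=MODELS, dry_run=False):
    """Instantiate each model once so it lands in the local cache.

    Returns the list of model IDs processed; dry_run skips the actual
    downloads (useful for listing what would be fetched).
    """
    done = []
    for model_id, kind in models.items():
        if not dry_run:
            if kind == "sentence-transformer":
                from sentence_transformers import SentenceTransformer
                SentenceTransformer(model_id)
            elif kind == "cross-encoder":
                from sentence_transformers import CrossEncoder
                CrossEncoder(model_id)
            else:
                from transformers import pipeline
                pipeline("zero-shot-classification", model=model_id)
        done.append(model_id)
    return done
```

Instantiating each class is all it takes: the Hugging Face libraries write the weights to the local cache as a side effect of loading.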
## Performance Benefits
### Before (without pre-loading):
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds
### After (with pre-loading):
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds
## Configuration
### Environment Variables:
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)
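A defensive way to read this flag is to treat anything other than an explicit opt-out as "pre-load". The helper name below is illustrative, not taken from `app.py`:

```python
import os

def should_preload(env=None):
    """Return True unless PRELOAD_MODELS is explicitly set to a falsy value."""
    env = os.environ if env is None else env
    value = env.get("PRELOAD_MODELS", "true").strip().lower()
    return value not in ("false", "0", "no")
```

With this parsing, an unset variable, `true`, or even a typo like `TRUE ` all keep the default behavior; only a deliberate `false`/`0`/`no` disables pre-loading.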
### Model Cache Location:
- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`
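Both paths resolve to the same location relative to the home directory. If a script needs to find the cache programmatically, a sketch that also honors Hugging Face's standard `HF_HOME` override might look like this:

```python
import os
from pathlib import Path

def hf_cache_dir():
    # HF_HOME is the standard Hugging Face override; otherwise fall back to
    # the default ~/.cache/huggingface (Path.home() expands to %USERPROFILE%
    # on Windows, so one expression covers both platforms).
    override = os.environ.get("HF_HOME")
    return Path(override) if override else Path.home() / ".cache" / "huggingface"
```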
## Docker Deployment
The Dockerfile automatically downloads models during the build process:
```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```
This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability
## Testing
Verify everything is working:
```bash
# Test all models
python test_models.py
# Expected output:
# Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
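One typical check in a suite like `test_models.py` is that an embedder returns vectors of the expected dimensionality (384 for `all-MiniLM-L6-v2`). The helper below is a hypothetical sketch that takes the encode function as a parameter, so the check itself can be exercised without downloading any weights:

```python
def verify_embedder(encode, expected_dim):
    """Encode one probe sentence and check the embedding dimensionality."""
    vec = encode("health check")
    if len(vec) != expected_dim:
        raise RuntimeError(f"expected a {expected_dim}-d embedding, got {len(vec)}")
    return True
```

In a real test this would be driven by the actual model, e.g. `verify_embedder(SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode, 384)`.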
## Resource Requirements
### Minimum:
- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores
### Recommended:
- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)
## Troubleshooting
### Model Download Issues:
```bash
# Check connectivity
curl -I https://huggingface.co
# Check disk space
df -h
# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
### Memory Issues:
- Reduce model batch sizes
- Use CPU-only inference: `device=-1`
- Consider model quantization
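Batch-size reduction is easy to retrofit without touching model code: wrap the encoder so long inputs are processed in small chunks, trading a little throughput for a much lower peak memory footprint. The wrapper below is an illustrative sketch, not part of the app:

```python
def encode_in_batches(encode_fn, texts, batch_size=8):
    """Encode texts in fixed-size chunks to cap peak memory usage."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(encode_fn(texts[start:start + batch_size]))
    return embeddings
```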
### Slow Performance:
- Verify models are cached locally
- Check if `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage
## Monitoring
Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio
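Model loading time and inference latency are the easiest of these to capture: wrap the relevant calls in a small timing context manager and ship the result to your logger or metrics client. A minimal sketch:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink=print):
    """Time the enclosed block and report '<label>: <seconds>s' to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink(f"{label}: {time.perf_counter() - start:.3f}s")
```

Usage would look like `with timed("model_load"): model = SentenceTransformer(...)`, with `sink` swapped for a logging or metrics call in production.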
## Updates
To update models:
```bash
# Clear cache
rm -rf ~/.cache/huggingface/
# Re-download
python download_models.py
# Test
python test_models.py
```
## Tips for Production
1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic
## Contributing
When adding new models:
1. Add model name to `download_models.py`
2. Add test case to `test_models.py`
3. Update documentation
4. Test thoroughly
---
For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).