# Policy Analysis Application - Model Pre-loading Setup

This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.

## 🚀 Quick Start

### Option 1: Docker Deployment (Recommended)

```bash
# Clone the repository
git clone
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```

### Option 2: Manual Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test that models are working
python test_models.py

# Start the application
python app.py
```

## 📦 What's New

### Files Added:

- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation

### Files Modified:

- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter

## 🤖 Models Used

The application uses these ML models:

| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size**: ~4GB

## ⚡ Performance Benefits

### Before (without pre-loading):
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

### After (with pre-loading):
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds

## 🔧 Configuration
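The `PRELOAD_MODELS` toggle described in this section could be wired roughly as below. This is a hypothetical sketch, not the actual `app.py` code: the `should_preload` / `preload_models` helpers and the injectable `loader` are illustrative names, and only the model list is taken from this README.

```python
import os

# Hypothetical sketch of app.py's pre-loading toggle; the real code may differ.
# Model names are taken from the "Models Used" table in this README.
MODEL_NAMES = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
]

def should_preload(env=None):
    """PRELOAD_MODELS defaults to true; any value other than "false" enables it."""
    env = os.environ if env is None else env
    return env.get("PRELOAD_MODELS", "true").strip().lower() != "false"

def preload_models(loader, env=None):
    """Eagerly load every model so the first request pays no load cost.

    `loader` is whatever builds a model from its name (e.g. SentenceTransformer);
    it is injected here so the sketch has no download dependency.
    """
    if not should_preload(env):
        return {}
    return {name: loader(name) for name in MODEL_NAMES}
```

At startup the app would call something like `preload_models(SentenceTransformer)` once and serve requests from the returned cache.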
### Environment Variables:
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)

### Model Cache Location:
- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`

## 🐳 Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```

This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability

## 🧪 Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# 🧪 Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```

## 📊 Resource Requirements

### Minimum:
- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores

### Recommended:
- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)

## 🚨 Troubleshooting

### Model Download Issues:
```bash
# Check connectivity
curl -I https://huggingface.co

# Check disk space
df -h

# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```

### Memory Issues:
- Reduce model batch sizes
- Use CPU-only inference: `device=-1`
- Consider model quantization

### Slow Performance:
- Verify models are cached locally
- Check if `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage

## 📈 Monitoring

Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio

## 🔄 Updates

To update models:

```bash
# Clear cache
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```

## 💡 Tips for Production

1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount the model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic

## 🤝 Contributing

When adding new models:

1. Add the model name to `download_models.py`
2. Add a test case to `test_models.py`
3. Update the documentation
4. Test thoroughly

---

For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).
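The contributing workflow above suggests keeping one model registry that both `download_models.py` and `test_models.py` iterate, so adding a model is a one-line change. The sketch below is hypothetical - `MODELS`, `download_all`, and the injectable `download_fn` are illustrative names, not this repo's actual code; only the model names come from this README.

```python
# Hypothetical registry shared by download and test scripts; the actual
# scripts in this repo may be organized differently.
MODELS = {
    # name -> role, mirroring the "Models Used" table in this README
    "sentence-transformers/all-MiniLM-L6-v2": "embedding",
    "BAAI/bge-m3": "embedding",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "classification",
}

def download_all(download_fn):
    """Download every registered model, collecting failures instead of
    aborting on the first error.

    `download_fn` is injected (e.g. a wrapper around
    huggingface_hub.snapshot_download) so the sketch has no network dependency.
    Returns a list of (model_name, error_message) pairs; empty means success.
    """
    failures = []
    for name in MODELS:
        try:
            download_fn(name)
        except Exception as exc:
            failures.append((name, str(exc)))
    return failures
```

With a shared registry like this, contributing step 1 reduces to adding one entry to `MODELS`, and a `test_models.py` case can loop over the same dict to verify each model loads.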