# Policy Analysis Application - Model Pre-loading Setup

This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.

## Quick Start

### Option 1: Docker Deployment (Recommended)

```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```

### Option 2: Manual Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test that the models are working
python test_models.py

# Start the application
python app.py
```

## What's New

### Files Added:

- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation

### Files Modified:

- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter

## Models Used

The application uses these ML models:

| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size**: ~4GB

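A sketch of what `download_models.py` likely does with this list; the `download_all` helper and its injected `download` parameter are assumptions for illustration, while `snapshot_download` is the real Hugging Face Hub API:

```python
# Model IDs taken from the table above.
MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
]

def download_all(download=None):
    """Fetch each model once; returns the list of repo IDs that were fetched."""
    if download is None:
        from huggingface_hub import snapshot_download  # real HF Hub API
        download = snapshot_download
    fetched = []
    for repo_id in MODELS:
        download(repo_id)  # cached under ~/.cache/huggingface/ by default
        fetched.append(repo_id)
    return fetched
```

Injecting the download function keeps the loop testable without network access.
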
## Performance Benefits

### Before (without pre-loading):

- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

### After (with pre-loading):

- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds

## Configuration

### Environment Variables:

- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are already cached)

### Model Cache Location:

- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`

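Parsing the `PRELOAD_MODELS` flag can be sketched as follows; the `preloading_enabled` helper is hypothetical, and only the variable name comes from this document:

```python
import os

def preloading_enabled(default=True):
    """Interpret PRELOAD_MODELS loosely: '1', 'true', and 'yes' all enable it."""
    raw = os.environ.get("PRELOAD_MODELS")
    if raw is None:
        return default  # unset means the documented default: pre-load
    return raw.strip().lower() in {"1", "true", "yes"}
```
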
## Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```

This means:

- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability

## Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```

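The pass/fail style of `test_models.py` can be approximated with a small harness like the one below; `run_checks` is a hypothetical name, and the commented example check assumes the embedder model is already cached locally:

```python
def run_checks(checks):
    """Run each named check, collecting failures instead of stopping at the
    first error, so one broken model does not hide the status of the rest."""
    failures = []
    for name, check in checks.items():
        try:
            check()
            print(f"PASS {name}")
        except Exception as exc:
            failures.append(name)
            print(f"FAIL {name}: {exc}")
    return failures

# Example (requires the model to be cached; names here are illustrative):
# from sentence_transformers import SentenceTransformer
# checks = {
#     "embedder": lambda: SentenceTransformer(
#         "sentence-transformers/all-MiniLM-L6-v2").encode("smoke test"),
# }
# run_checks(checks)
```
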
## Resource Requirements

### Minimum:

- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores

### Recommended:

- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)

## Troubleshooting

### Model Download Issues:

```bash
# Check connectivity to the Hugging Face Hub
curl -I https://huggingface.co

# Check disk space
df -h

# Manually test one model download
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```

### Memory Issues:

- Reduce model batch sizes
- Force CPU-only inference by passing `device=-1` to the pipelines
- Consider model quantization

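A small helper for the CPU-only tip, assuming the app uses `transformers`-style device indices (`-1` for CPU, `0` for the first GPU); `pick_device` is a hypothetical name, and its result would be passed as `pipeline(..., device=pick_device())`:

```python
def pick_device(force_cpu=False):
    """Return a transformers-style device index: -1 = CPU, 0 = first GPU."""
    if force_cpu:
        return -1
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1  # no torch available: CPU is the only option
```
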
### Slow Performance:

- Verify models are cached locally
- Check that `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage

## Monitoring

Monitor these metrics in production:

- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio

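Model loading time and inference latency can both be captured with a minimal timing wrapper; `timed` is a hypothetical helper, not part of the app:

```python
import time

def timed(fn, *args, **kwargs):
    """Run a call and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g. _, load_seconds = timed(preload_models) to record startup cost
```
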
## Updates

To update models:

```bash
# Clear the cache (removes ALL cached Hugging Face models, not just this app's)
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```

## Tips for Production

1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount the model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic

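Tip 2 (persistent volumes) might look like this in `docker-compose.yml`; the service name `app` and the volume name `hf-cache` are assumptions, and the container path matches the default Linux cache location when running as root:

```yaml
# Hypothetical override: persist the Hugging Face cache across rebuilds
services:
  app:
    volumes:
      - hf-cache:/root/.cache/huggingface
volumes:
  hf-cache:
```
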
## Contributing

When adding new models:

1. Add the model name to `download_models.py`
2. Add a test case to `test_models.py`
3. Update the documentation
4. Test thoroughly

---

For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).