# Policy Analysis Application - Model Pre-loading Setup
This application has been enhanced with model pre-loading, which eliminates the model-download delay on the first request after deployment.
## Quick Start
### Option 1: Docker Deployment (Recommended)
```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis
# Build and run with Docker
docker-compose up --build
```
### Option 2: Manual Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Download all models (one-time setup)
python download_models.py
# Test models are working
python test_models.py
# Start the application
python app.py
```
## What's New
### Files Added:
- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation
### Files Modified:
- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter
## Models Used
The application uses these ML models:
| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Zero-shot classifier | ~1.5GB | Sentiment analysis |
**Total download size**: ~4GB
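Downloading a model once is enough to cache it for all later runs. The sketch below shows one plausible shape for `download_models.py`: a registry mapping each model ID to the loader that caches it. The structure and the `dry_run` flag are illustrative assumptions, not the repository's actual code:

```python
# Hypothetical sketch of download_models.py; the real script is authoritative.
# Maps each model ID to the kind of loader used to fetch and cache it.
MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": "sentence-transformer",
    "BAAI/bge-m3": "sentence-transformer",
    "cross-encoder/ms-marco-MiniLM-L-6-v2": "cross-encoder",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0": "zero-shot",
}

def download_all(models=MODELS, dry_run=False):
    """Instantiate each model once so it lands in the local cache.

    Returns the list of model IDs processed; dry_run skips the actual
    downloads (useful for listing what would be fetched).
    """
    done = []
    for model_id, kind in models.items():
        if not dry_run:
            if kind == "sentence-transformer":
                from sentence_transformers import SentenceTransformer
                SentenceTransformer(model_id)
            elif kind == "cross-encoder":
                from sentence_transformers import CrossEncoder
                CrossEncoder(model_id)
            else:
                from transformers import pipeline
                pipeline("zero-shot-classification", model=model_id)
        done.append(model_id)
    return done
```

Instantiating each class is all it takes: the Hugging Face libraries write the weights to the local cache as a side effect of loading.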
## Performance Benefits
### Before (without pre-loading):
- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds
### After (with pre-loading):
- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds
## Configuration
### Environment Variables:
- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are cached)
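A defensive way to read this flag is to treat anything other than an explicit opt-out as "pre-load". The helper name below is illustrative, not taken from `app.py`:

```python
import os

def should_preload(env=None):
    """Return True unless PRELOAD_MODELS is explicitly set to a falsy value."""
    env = os.environ if env is None else env
    value = env.get("PRELOAD_MODELS", "true").strip().lower()
    return value not in ("false", "0", "no")
```

With this parsing, an unset variable, `true`, or even a typo like `TRUE ` all keep the default behavior; only a deliberate `false`/`0`/`no` disables pre-loading.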
### Model Cache Location:
- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`
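Both paths resolve to the same location relative to the home directory. If a script needs to find the cache programmatically, a sketch that also honors Hugging Face's standard `HF_HOME` override might look like this:

```python
import os
from pathlib import Path

def hf_cache_dir():
    # HF_HOME is the standard Hugging Face override; otherwise fall back to
    # the default ~/.cache/huggingface (Path.home() expands to %USERPROFILE%
    # on Windows, so one expression covers both platforms).
    override = os.environ.get("HF_HOME")
    return Path(override) if override else Path.home() / ".cache" / "huggingface"
```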
## Docker Deployment
The Dockerfile automatically downloads models during the build process:
```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```
This means:
- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability
## Testing
Verify everything is working:
```bash
# Test all models
python test_models.py
# Expected output:
# Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```
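One typical check in a suite like `test_models.py` is that an embedder returns vectors of the expected dimensionality (384 for `all-MiniLM-L6-v2`). The helper below is a hypothetical sketch that takes the encode function as a parameter, so the check itself can be exercised without downloading any weights:

```python
def verify_embedder(encode, expected_dim):
    """Encode one probe sentence and check the embedding dimensionality."""
    vec = encode("health check")
    if len(vec) != expected_dim:
        raise RuntimeError(f"expected a {expected_dim}-d embedding, got {len(vec)}")
    return True
```

In a real test this would be driven by the actual model, e.g. `verify_embedder(SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode, 384)`.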
## Resource Requirements
### Minimum:
- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores
### Recommended:
- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)
## Troubleshooting
### Model Download Issues:
```bash
# Check connectivity
curl -I https://huggingface.co
# Check disk space
df -h
# Manual model test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```
### Memory Issues:
- Reduce model batch sizes
- Use CPU-only inference: `device=-1`
- Consider model quantization
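Batch-size reduction is easy to retrofit without touching model code: wrap the encoder so long inputs are processed in small chunks, trading a little throughput for a much lower peak memory footprint. The wrapper below is an illustrative sketch, not part of the app:

```python
def encode_in_batches(encode_fn, texts, batch_size=8):
    """Encode texts in fixed-size chunks to cap peak memory usage."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(encode_fn(texts[start:start + batch_size]))
    return embeddings
```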
### Slow Performance:
- Verify models are cached locally
- Check if `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage
## Monitoring
Monitor these metrics in production:
- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio
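Model loading time and inference latency are the easiest of these to capture: wrap the relevant calls in a small timing context manager and ship the result to your logger or metrics client. A minimal sketch:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink=print):
    """Time the enclosed block and report '<label>: <seconds>s' to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink(f"{label}: {time.perf_counter() - start:.3f}s")
```

Usage would look like `with timed("model_load"): model = SentenceTransformer(...)`, with `sink` swapped for a logging or metrics call in production.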
## Updates
To update models:
```bash
# Clear cache
rm -rf ~/.cache/huggingface/
# Re-download
python download_models.py
# Test
python test_models.py
```
## Tips for Production
1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic
## Contributing
When adding new models:
1. Add model name to `download_models.py`
2. Add test case to `test_models.py`
3. Update documentation
4. Test thoroughly
---
For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).