# Policy Analysis Application - Model Pre-loading Setup

This application has been enhanced with model pre-loading capabilities to significantly reduce inference time during deployment.

## Quick Start

### Option 1: Docker Deployment (Recommended)

```bash
# Clone the repository
git clone <your-repo-url>
cd policy-analysis

# Build and run with Docker
docker-compose up --build
```

### Option 2: Manual Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Download all models (one-time setup)
python download_models.py

# Test that the models are working
python test_models.py

# Start the application
python app.py
```

## What's New

### Files Added:

- **`download_models.py`** - Downloads all required ML models
- **`test_models.py`** - Verifies all models are working correctly
- **`startup.py`** - Startup script with automatic model downloading
- **`Dockerfile`** - Docker configuration with model pre-caching
- **`docker-compose.yml`** - Docker Compose setup
- **`MODEL_SETUP.md`** - Detailed setup documentation

### Files Modified:

- **`app.py`** - Added model pre-loading functionality
- **`requirements.txt`** - Added missing dependencies (numpy, requests)
- **`utils/coherence_bbscore.py`** - Fixed default embedder parameter

## Models Used

The application uses these ML models:

| Model | Type | Size | Purpose |
|-------|------|------|---------|
| `sentence-transformers/all-MiniLM-L6-v2` | Embedding | ~90MB | Text encoding |
| `BAAI/bge-m3` | Embedding | ~2.3GB | Advanced text encoding |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-Encoder | ~130MB | Document reranking |
| `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` | Classification | ~1.5GB | Sentiment analysis |

**Total download size**: ~4GB

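A sketch of what `download_models.py` likely does with this list; the `download_all` helper and its injected `download` parameter are assumptions for illustration, while `snapshot_download` is the real Hugging Face Hub API:

```python
# Model IDs taken from the table above.
MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-m3",
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "MoritzLaurer/deberta-v3-base-zeroshot-v2.0",
]

def download_all(download=None):
    """Fetch each model once; returns the list of repo IDs that were fetched."""
    if download is None:
        from huggingface_hub import snapshot_download  # real HF Hub API
        download = snapshot_download
    fetched = []
    for repo_id in MODELS:
        download(repo_id)  # cached under ~/.cache/huggingface/ by default
        fetched.append(repo_id)
    return fetched
```

Injecting the download function keeps the loop testable without network access.
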
## Performance Benefits

### Before (without pre-loading):

- First request: 30-60 seconds (model download + inference)
- Subsequent requests: 2-5 seconds

### After (with pre-loading):

- First request: 2-5 seconds
- Subsequent requests: 2-5 seconds

## Configuration

### Environment Variables:

- `PRELOAD_MODELS=true` (default) - Pre-load models on app startup
- `PRELOAD_MODELS=false` - Skip pre-loading (useful when models are already cached)

### Model Cache Location:

- **Linux/Mac**: `~/.cache/huggingface/`
- **Windows**: `%USERPROFILE%\.cache\huggingface\`

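Parsing the `PRELOAD_MODELS` flag can be sketched as follows; the `preloading_enabled` helper is hypothetical, and only the variable name comes from this document:

```python
import os

def preloading_enabled(default=True):
    """Interpret PRELOAD_MODELS loosely: '1', 'true', and 'yes' all enable it."""
    raw = os.environ.get("PRELOAD_MODELS")
    if raw is None:
        return default  # unset means the documented default: pre-load
    return raw.strip().lower() in {"1", "true", "yes"}
```
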
## Docker Deployment

The Dockerfile automatically downloads models during the build process:

```dockerfile
# Downloads models and caches them in the image
RUN python download_models.py
```

This means:

- ✅ No download time during container startup
- ✅ Consistent performance across deployments
- ✅ Offline inference capability

## Testing

Verify everything is working:

```bash
# Test all models
python test_models.py

# Expected output:
# Model Verification Test Suite
# ✅ All tests passed! The application is ready to deploy.
```

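The pass/fail style of `test_models.py` can be approximated with a small harness like the one below; `run_checks` is a hypothetical name, and the commented example check assumes the embedder model is already cached locally:

```python
def run_checks(checks):
    """Run each named check, collecting failures instead of stopping at the
    first error, so one broken model does not hide the status of the rest."""
    failures = []
    for name, check in checks.items():
        try:
            check()
            print(f"PASS {name}")
        except Exception as exc:
            failures.append(name)
            print(f"FAIL {name}: {exc}")
    return failures

# Example (requires the model to be cached; names here are illustrative):
# from sentence_transformers import SentenceTransformer
# checks = {
#     "embedder": lambda: SentenceTransformer(
#         "sentence-transformers/all-MiniLM-L6-v2").encode("smoke test"),
# }
# run_checks(checks)
```
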
## Resource Requirements

### Minimum:

- **RAM**: 8GB
- **Storage**: 6GB (models + dependencies)
- **CPU**: 2+ cores

### Recommended:

- **RAM**: 16GB
- **Storage**: 10GB
- **CPU**: 4+ cores
- **GPU**: Optional (NVIDIA with CUDA support)

## Troubleshooting

### Model Download Issues:

```bash
# Check connectivity to the Hugging Face Hub
curl -I https://huggingface.co

# Check disk space
df -h

# Manually test one model download
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```

### Memory Issues:

- Reduce model batch sizes
- Force CPU-only inference by passing `device=-1` to the pipelines
- Consider model quantization

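A small helper for the CPU-only tip, assuming the app uses `transformers`-style device indices (`-1` for CPU, `0` for the first GPU); `pick_device` is a hypothetical name, and its result would be passed as `pipeline(..., device=pick_device())`:

```python
def pick_device(force_cpu=False):
    """Return a transformers-style device index: -1 = CPU, 0 = first GPU."""
    if force_cpu:
        return -1
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1  # no torch available: CPU is the only option
```
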
### Slow Performance:

- Verify models are cached locally
- Check that `PRELOAD_MODELS=true`
- Monitor CPU/GPU usage

## Monitoring

Monitor these metrics in production:

- Model loading time
- Inference latency
- Memory usage
- Cache hit ratio

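Model loading time and inference latency can both be captured with a minimal timing wrapper; `timed` is a hypothetical helper, not part of the app:

```python
import time

def timed(fn, *args, **kwargs):
    """Run a call and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g. _, load_seconds = timed(preload_models) to record startup cost
```
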
## Updates

To update models:

```bash
# Clear the cache (removes ALL cached Hugging Face models, not just this app's)
rm -rf ~/.cache/huggingface/

# Re-download
python download_models.py

# Test
python test_models.py
```

## Tips for Production

1. **Use Docker**: Models are cached in the image
2. **Persistent Volumes**: Mount the model cache for faster rebuilds
3. **Health Checks**: Monitor model availability
4. **Resource Limits**: Set appropriate memory/CPU limits
5. **Load Balancing**: Use multiple instances for high traffic

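Tip 2 (persistent volumes) might look like this in `docker-compose.yml`; the service name `app` and the volume name `hf-cache` are assumptions, and the container path matches the default Linux cache location when running as root:

```yaml
# Hypothetical override: persist the Hugging Face cache across rebuilds
services:
  app:
    volumes:
      - hf-cache:/root/.cache/huggingface
volumes:
  hf-cache:
```
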
## Contributing

When adding new models:

1. Add the model name to `download_models.py`
2. Add a test case to `test_models.py`
3. Update the documentation
4. Test thoroughly

---

For detailed setup instructions, see [`MODEL_SETUP.md`](MODEL_SETUP.md).