# Troubleshooting Guide

This guide helps diagnose and resolve common issues with MediGuard AI.

## Table of Contents

- Startup Issues
- Service Connectivity
- Performance Issues
- API Errors
- Database Issues
- Logging and Monitoring
- Common Error Messages
- Emergency Procedures
- Getting Help
## Startup Issues

### Application Won't Start

**Symptoms:**

- Application exits immediately
- "Port already in use" errors
- Module import errors

**Solutions:**

1. Check port availability:

   ```bash
   # Check if port 8000 is in use
   netstat -tulpn | grep 8000

   # Or on Windows
   netstat -ano | findstr 8000
   ```

2. Verify the Python environment:

   ```bash
   # Activate the virtual environment
   source venv/bin/activate      # On Windows: venv\Scripts\activate

   # Check installed dependencies
   pip list
   ```

3. Check environment variables:

   ```bash
   # Verify required variables are set
   env | grep -E "(GROQ|REDIS|OPENSEARCH)"
   ```

Common startup errors and fixes:

| Error | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError` | Missing dependencies | `pip install -r requirements.txt` |
| `Permission denied` | Port requires privileges | Use a port > 1024 or run with `sudo` |
| `Address already in use` | Another process is using the port | Kill the process or use a different port |
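The port check can also be scripted, which is handy in startup wrappers. A minimal sketch (the `is_port_free` helper is illustrative, not part of MediGuard):

```python
import socket

def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        # connect_ex returns 0 when something accepted the connection
        return sock.connect_ex((host, port)) != 0
```

For example, `is_port_free(8000)` returns `False` while the API (or any other process) is listening on port 8000.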
### Docker Container Issues

**Symptoms:**

- Container fails to start
- Health check failures
- Volume mount errors

**Solutions:**

1. Check container logs:

   ```bash
   docker logs mediguard-api
   docker-compose logs api
   ```

2. Verify Docker resources:

   ```bash
   # Check Docker resource usage
   docker stats

   # Check disk space
   docker system df
   ```

3. Rebuild the container:

   ```bash
   docker-compose down
   docker-compose build --no-cache
   docker-compose up -d
   ```
## Service Connectivity

### OpenSearch Connection Issues

**Symptoms:**

- Search requests failing
- Connection timeout errors
- Authentication failures

**Diagnosis:**

```bash
# Check OpenSearch health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Test from the application
curl http://localhost:8000/health/service/opensearch
```

**Solutions:**

1. Verify OpenSearch is running:

   ```bash
   docker-compose ps opensearch
   docker-compose restart opensearch
   ```

2. Check network connectivity:

   ```bash
   # Test the connection
   telnet localhost 9200

   # Check the firewall
   sudo ufw status
   ```

3. Fix authentication:

   ```yaml
   # In docker-compose.yml (development only)
   environment:
     - DISABLE_SECURITY_PLUGIN=true
   ```
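When scripting this diagnosis, the cluster-health JSON can be reduced to a go/no-go check. A sketch (the `status` field follows OpenSearch's `_cluster/health` response; the helper names are illustrative):

```python
import json
import urllib.request

def cluster_is_healthy(health: dict) -> bool:
    """A 'green' or 'yellow' cluster status is usable; 'red' is not."""
    return health.get("status") in ("green", "yellow")

def check_opensearch(url: str = "http://localhost:9200/_cluster/health") -> bool:
    """Return True only if OpenSearch is reachable and not red."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return cluster_is_healthy(json.load(resp))
    except OSError:
        return False  # not running or unreachable
```

A `False` result from `check_opensearch()` distinguishes "service down" from "service up but degraded", which the raw `curl` output leaves you to eyeball.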
### Redis Connection Issues

**Symptoms:**

- Cache misses
- Session data loss
- Rate limiting not working

**Diagnosis:**

```bash
# Test the Redis connection
redis-cli ping

# Check from the application
curl http://localhost:8000/health/service/redis
```

**Solutions:**

1. Restart Redis:

   ```bash
   docker-compose restart redis
   ```

2. Clear corrupted data (WARNING: this removes all cached data):

   ```bash
   redis-cli FLUSHALL
   ```

3. Check memory limits:

   ```bash
   # In redis-cli
   INFO memory
   ```
### Ollama/LLM Connection Issues

**Symptoms:**

- LLM requests timing out
- "Model not found" errors
- Slow responses

**Diagnosis:**

```bash
# Check Ollama status
curl http://localhost:11434/api/tags

# Test the model
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Test"
}'
```

**Solutions:**

1. Pull the required models:

   ```bash
   docker-compose exec ollama ollama pull llama3.3
   ```

2. Check GPU availability:

   ```bash
   nvidia-smi
   ```

3. Adjust timeouts:

   ```python
   # In settings
   OLLAMA_TIMEOUT = 120  # Increase the timeout (seconds)
   ```
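Besides raising `OLLAMA_TIMEOUT`, a client-side timeout makes slow LLM calls fail fast instead of hanging the request. A sketch using `asyncio.wait_for` (the `generate` coroutine is a stand-in for the real Ollama call, not the project's client):

```python
import asyncio

async def generate(prompt: str, delay: float = 0.0) -> str:
    """Stand-in for a real LLM request; `delay` simulates model latency."""
    await asyncio.sleep(delay)
    return f"response to {prompt!r}"

async def generate_with_timeout(prompt: str, timeout: float,
                                delay: float = 0.0) -> str:
    """Cut the call off after `timeout` seconds instead of waiting forever."""
    try:
        return await asyncio.wait_for(generate(prompt, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "error: LLM request timed out"

# A fast call completes within the timeout
print(asyncio.run(generate_with_timeout("Test", timeout=1.0)))
```

In a real handler the `except` branch would typically return a 504 or trigger a retry rather than a string.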
## Performance Issues

### Slow API Responses

**Symptoms:**

- Requests taking > 5 seconds
- Timeouts in client applications
- High CPU usage

**Diagnosis:**

1. Check response times:

   ```bash
   # Use curl with timing
   curl -w "@curl-format.txt" -o /dev/null -s http://localhost:8000/health

   # Monitor with metrics
   curl http://localhost:8000/metrics | grep http_request_duration
   ```

2. Profile the application:

   ```bash
   # Use py-spy
   pip install py-spy
   py-spy top --pid <pid>
   ```

**Solutions:**

1. Enable caching:

   ```python
   # Add caching to expensive operations
   from src.services.cache.advanced_cache import cached

   @cached(ttl=300)
   async def expensive_operation(): ...
   ```

2. Optimize database queries:

   ```python
   # Use optimized queries
   from src.services.opensearch.client import make_opensearch_client

   client = make_opensearch_client()
   results = client.search_bm25_optimized(query, min_score=0.5)
   ```

3. Scale horizontally:

   ```bash
   # Run multiple instances
   docker-compose up -d --scale api=3
   ```
### Memory Leaks

**Symptoms:**

- Memory usage increasing over time
- Out of memory errors
- Container restarts

**Diagnosis:**

1. Monitor memory usage:

   ```bash
   # Check container memory
   docker stats

   # Check process memory
   ps aux | grep python
   ```

2. Find memory leaks:

   ```bash
   # Use memory-profiler
   pip install memory-profiler
   python -m memory_profiler script.py
   ```

**Solutions:**

1. Fix circular references:

   ```python
   # Use weak references
   import weakref

   class Parent:
       def __init__(self):
           self.children = weakref.WeakSet()
   ```

2. Clear caches:

   ```python
   # Periodically clear caches
   from src.services.cache.advanced_cache import CacheInvalidator

   await CacheInvalidator.invalidate_by_pattern("*")
   ```

3. Increase memory limits:

   ```yaml
   # In docker-compose.yml
   deploy:
     resources:
       limits:
         memory: 4G
   ```
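The `weakref.WeakSet` fix works because entries vanish once nothing else references them, so a long-lived registry stops pinning its members in memory. A small self-contained demonstration:

```python
import gc
import weakref

class Child:
    pass

class Parent:
    def __init__(self):
        # Weak references: this set does not keep children alive
        self.children = weakref.WeakSet()

parent = Parent()
child = Child()
parent.children.add(child)
assert len(parent.children) == 1

del child      # drop the only strong reference
gc.collect()   # make collection deterministic for the demo
assert len(parent.children) == 0  # the entry disappeared on its own
```

With a plain `set`, the second assertion would fail: the parent would hold the child forever, which is exactly the leak pattern described above.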
## API Errors

### 422 Validation Errors

**Symptoms:**

- `{"detail": [...]}` responses with validation errors
- Requests rejected with status 422

**Common causes:**

1. Missing required fields:

   ```json
   // Wrong
   {"biomarkers": {}}

   // Right
   {"biomarkers": {"Glucose": 100}}
   ```

2. Invalid data types:

   ```json
   // Wrong
   {"biomarkers": {"Glucose": "high"}}

   // Right
   {"biomarkers": {"Glucose": 150}}
   ```

3. Out-of-range values:

   ```bash
   # Check the API docs for valid ranges
   curl http://localhost:8000/docs
   ```
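Many 422s can be caught before the request is even sent. A client-side pre-check for the biomarker payload shape (a sketch; the authoritative ranges and field names come from the API schema at `/docs`, not this helper):

```python
def validate_biomarkers(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks sendable."""
    problems = []
    biomarkers = payload.get("biomarkers")
    if not isinstance(biomarkers, dict) or not biomarkers:
        problems.append("biomarkers must be a non-empty object")
        return problems
    for name, value in biomarkers.items():
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            problems.append(f"{name}: value must be a number, got {type(value).__name__}")
    return problems
```

For example, `validate_biomarkers({"biomarkers": {"Glucose": "high"}})` reports the type error locally instead of burning a round trip on a guaranteed 422.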
### 500 Internal Server Errors

**Symptoms:**

- Generic error messages
- Stack traces in logs

**Diagnosis:**

1. Check application logs:

   ```bash
   docker-compose logs -f api | grep ERROR
   ```

2. Enable debug mode:

   ```bash
   export DEBUG=true
   uvicorn src.main:app --reload
   ```

**Common causes:**

| Error | Solution |
|---|---|
| Database connection lost | Restart database services |
| External service down | Check service health endpoints |
| Memory error | Increase memory or optimize code |
| Configuration error | Verify environment variables |
### 503 Service Unavailable

**Symptoms:**

- Service temporarily unavailable
- Health check failures

**Solutions:**

1. Check service dependencies:

   ```bash
   curl http://localhost:8000/health/detailed
   ```

2. Restart affected services:

   ```bash
   docker-compose restart
   ```

3. Check rate limits:

   ```bash
   # Check rate limit headers
   curl -I http://localhost:8000/analyze/structured
   ```
## Database Issues

### OpenSearch Index Problems

**Symptoms:**

- Search returning no results
- "Index not found" errors
- Mapping errors

**Diagnosis:**

1. Check index status:

   ```bash
   curl -X GET "localhost:9200/_cat/indices?v"
   ```

2. Verify the mapping:

   ```bash
   curl -X GET "localhost:9200/medical_chunks/_mapping?pretty"
   ```

**Solutions:**

1. Recreate the index:

   ```bash
   # Delete the index, then restart the application to recreate it
   curl -X DELETE "localhost:9200/medical_chunks"
   ```

2. Fix the mapping:

   ```python
   # Update the index config
   from src.services.opensearch.index_config import MEDICAL_CHUNKS_MAPPING

   client.ensure_index(MEDICAL_CHUNKS_MAPPING)
   ```

### Data Corruption

**Symptoms:**

- Inconsistent search results
- Missing documents
- Strange query behavior

**Solutions:**

1. Verify data integrity:

   ```bash
   # Count documents
   curl -X GET "localhost:9200/medical_chunks/_count"
   ```

2. Reindex the data:

   ```python
   # Use the indexing service
   from src.services.indexing.service import IndexingService

   service = IndexingService()
   await service.reindex_all()
   ```
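When reindexing a large corpus, feeding documents to bulk requests in fixed-size batches keeps each request within OpenSearch's payload limits. A batching helper (illustrative; `reindex_all` in the project may batch differently):

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(docs: Iterable, size: int) -> Iterator[list]:
    """Yield documents in lists of at most `size` items."""
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk

# Each chunk would be the body of one bulk-index call
for chunk in batched(range(5), 2):
    print(chunk)  # [0, 1] then [2, 3] then [4]
```

Batch sizes of a few hundred to a few thousand documents are a common starting point; tune against the cluster's `http.max_content_length`.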
## Logging and Monitoring

### Enable Debug Logging

1. Set the log level:

   ```bash
   export LOG_LEVEL=DEBUG
   export LOG_TO_FILE=true
   ```

2. View logs:

   ```bash
   # Real-time logs
   tail -f data/logs/mediguard.log

   # Filter by level
   grep "ERROR" data/logs/mediguard.log
   ```

### Monitor Metrics

1. Check Prometheus metrics:

   ```bash
   curl http://localhost:8000/metrics | grep http_
   ```

2. View the Grafana dashboard:

   - Navigate to http://localhost:3000
   - Import `monitoring/grafana-dashboard.json`

### Performance Profiling

Enable profiling:

```python
# Add to main.py
from pyinstrument import Profiler

@app.middleware("http")
async def profile_requests(request: Request, call_next):
    profiler = Profiler()
    profiler.start()
    response = await call_next(request)
    profiler.stop()
    print(profiler.output_text(unicode=True, color=True))
    return response
```
## Common Error Messages

### "Service unavailable" in logs

**Meaning:** A required service (OpenSearch, Redis, etc.) is not responding.

**Fix:**

1. Check service status: `docker-compose ps`
2. Restart the service: `docker-compose restart <service>`
3. Check its logs: `docker-compose logs <service>`
"Rate limit exceeded"
Meaning: Too many requests from a client.
Fix:
- Wait and retry
- Check
Retry-Afterheader - Implement client-side rate limiting
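In practice, client-side rate limiting means honoring `Retry-After` and backing off between attempts. A retry sketch (the `send` callable stands in for the real HTTP client; all names are illustrative):

```python
import time

def request_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Call `send()` until it returns a non-429 response or attempts run out.

    `send` must return an object with `.status` and `.headers` attributes.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status != 429:
            return response
        # Honor Retry-After if the server sent one, else back off exponentially
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        sleep(delay)
    return response  # still rate-limited after max_attempts
```

Injecting `sleep` makes the helper trivially testable; production code would also cap the total wait and add jitter.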
"Invalid token" or "Authentication failed"
Meaning: Invalid API key or token.
Fix:
- Verify API key is correct
- Check token hasn't expired
- Ensure proper header format:
Authorization: Bearer <token>
"Query too large" or "Request entity too large"
Meaning: Request exceeds size limits.
Fix:
- Reduce request size
- Use pagination
- Increase limits in configuration
"Connection pool exhausted"
Meaning: Too many concurrent database connections.
Fix:
- Increase pool size
- Add connection timeout
- Implement request queuing
## Emergency Procedures

### Full System Recovery

```bash
# 1. Stop all services
docker-compose down

# 2. Clear corrupted data (WARNING: this deletes data!)
docker volume rm agentic-ragbot_opensearch_data
docker volume rm agentic-ragbot_redis_data

# 3. Restart with fresh data
docker-compose up -d

# 4. Wait for services to be ready
sleep 30

# 5. Verify health
curl http://localhost:8000/health/detailed
```
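The fixed `sleep 30` in step 4 can be replaced by polling the health endpoint until it answers, which is both faster on good days and more reliable on bad ones. A polling sketch (the `check` callable stands in for the real HTTP health probe):

```python
import time

def wait_until_healthy(check, timeout: float = 60.0, interval: float = 1.0,
                       clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll `check()` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check():
            return True   # service came up
        sleep(interval)
    return False          # gave up: escalate or inspect logs
```

In the recovery script, `check` would be a request against `/health/detailed`; the injected `clock` and `sleep` exist only to keep the helper testable.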
### Backup and Restore

```bash
# Backup OpenSearch
curl -X POST "localhost:9200/_snapshot/backup/snapshot_1"

# Backup Redis
docker-compose exec redis redis-cli BGSAVE

# Restore from backup: see DEPLOYMENT.md for detailed instructions
```
### Performance Emergency

```bash
# 1. Scale up services
docker-compose up -d --scale api=5

# 2. Clear all caches
curl -X DELETE http://localhost:8000/admin/cache/clear

# 3. Enable emergency mode (disables non-essential features)
export EMERGENCY_MODE=true
```
## Getting Help

1. **Check logs first:** Always check the application logs for error details.
2. **Search issues:** Look for similar issues on GitHub.
3. **Collect information:** Error messages, logs, system specs, and steps to reproduce.
4. **Create an issue:** Include all relevant information in the GitHub issue.

### Contact Information

- **Documentation:** Check the `/docs` directory
- **Issues:** GitHub Issues
- **Emergency:** Check DEPLOYMENT.md for emergency contacts