Monitoring Memory Usage in Production on Render
This document provides guidance on monitoring memory usage in production for the RAG application deployed on Render's free tier, which has a 512MB memory limit.
Integrated Memory Monitoring Tools
The application includes enhanced memory monitoring specifically optimized for Render deployments:
1. Memory Status Endpoint
The application exposes a dedicated endpoint for monitoring memory usage:
GET /memory/render-status
This endpoint returns detailed information about current memory usage, including:
- Current memory usage in MB
- Peak memory usage since startup
- Memory usage trends (5-minute and 1-hour)
- Current memory status (normal, warning, critical, emergency)
- Actions taken if memory thresholds were exceeded
Example response:
{
  "status": "success",
  "is_render": true,
  "memory_status": {
    "timestamp": "2023-10-25T14:32:15.123456",
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "context": "api_request",
    "status": "warning",
    "action_taken": "light_cleanup",
    "memory_limit_mb": 512.0
  },
  "memory_trends": {
    "current_mb": 342.5,
    "peak_mb": 398.2,
    "samples_count": 356,
    "trend_5min_mb": 12.5,
    "trend_1hour_mb": -24.3
  },
  "render_limit_mb": 512
}
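A monitoring script can reduce this response to a one-line summary. The following is a minimal Python sketch, assuming the response has the shape shown in the example; the helper names fetch_render_status and summarize_status are illustrative and not part of the application.

```python
import json
import urllib.request


def fetch_render_status(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON body of GET /memory/render-status."""
    url = base_url.rstrip("/") + "/memory/render-status"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_status(payload: dict) -> str:
    """Build a short human-readable summary from a response payload."""
    status = payload["memory_status"]
    trends = payload["memory_trends"]
    used = status["memory_mb"]
    limit = status["memory_limit_mb"]
    percent = 100.0 * used / limit
    return (
        f"{status['status']}: {used:.1f} MB of {limit:.0f} MB ({percent:.0f}%), "
        f"peak {status['peak_memory_mb']:.1f} MB, "
        f"5-min trend {trends['trend_5min_mb']:+.1f} MB"
    )
```

To use it, pass the base URL of your own service to fetch_render_status and hand the result to summarize_status. A positive 5-minute trend together with a high percentage is the combination worth acting on.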
2. Detailed Diagnostics
For more detailed memory diagnostics, use:
GET /memory/diagnostics
This provides a deeper look at memory allocation and usage patterns.
3. Force Memory Cleanup
If memory usage approaches critical levels, use the diagnostics endpoint to investigate, and schedule cleanup or a service restart during a maintenance window. The manual force-clean endpoints were removed in favor of safer, observable operations.
Setting Up External Monitoring
Using Uptime Robot or Similar Services
- Set up a monitor to check the /health endpoint every 5 minutes
- Set up a separate monitor to check the /memory/render-status endpoint every 15 minutes
Automated Alerting
Configure alerts based on memory thresholds:
- Warning Alert: When memory usage exceeds 400MB (78% of limit)
- Critical Alert: When memory usage exceeds 450MB (88% of limit)
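The alert thresholds can be expressed as a small classification function. This is a sketch only: the function name alert_level is hypothetical, and "exceeds" is interpreted as strictly greater than the threshold.

```python
from typing import Optional

WARNING_MB = 400.0   # about 78% of the 512 MB limit
CRITICAL_MB = 450.0  # about 88% of the 512 MB limit


def alert_level(memory_mb: float) -> Optional[str]:
    """Return the alert to raise for a memory reading, or None if no alert applies."""
    if memory_mb > CRITICAL_MB:
        return "critical"
    if memory_mb > WARNING_MB:
        return "warning"
    return None
```

Checking the critical threshold first ensures a reading above 450MB produces a single critical alert rather than a warning.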
Monitoring Logs in Render Dashboard
- Log into your Render dashboard
- Navigate to the service logs
- Filter for memory-related log messages:
  - [MEMORY CHECKPOINT]
  - [MEMORY MILESTONE]
  - Memory usage
  - WARNING: Memory usage
  - CRITICAL: Memory usage
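If you export logs from the dashboard, the same filter can be applied offline. The sketch below assumes the log messages contain the marker strings listed above verbatim; the function name memory_log_lines is illustrative.

```python
from typing import Iterable, List

# "Memory usage" also matches the "WARNING: Memory usage" and
# "CRITICAL: Memory usage" variants, since it is a substring of both.
MEMORY_MARKERS = (
    "[MEMORY CHECKPOINT]",
    "[MEMORY MILESTONE]",
    "Memory usage",
)


def memory_log_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that contain one of the memory markers."""
    return [
        line for line in lines
        if any(marker in line for marker in MEMORY_MARKERS)
    ]
```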
Memory Usage Patterns to Watch For
Warning Signs
- Steadily Increasing Memory: If memory trends show continuous growth
- High Peak After Ingestion: Memory spikes above 450MB after document ingestion
- Failure to Release Memory: Memory doesn't decrease after operations complete
Preventative Actions
- Regular Cleanup: Schedule cleanup or a service restart during low-traffic periods (the manual /memory/force-clean endpoint has been removed)
- Batch Processing: For large document sets, ingest in smaller batches
- Monitoring Before Bulk Operations: Check memory status before starting resource-intensive operations
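Batch processing and the pre-operation memory check can be combined in one loop. The following is a minimal sketch, not the application's ingestion code: ingest_batch and current_memory_mb are hypothetical callables you would supply (for example, a wrapper around your ingestion routine and a reader of the memory status endpoint), and the 400MB default simply reuses the warning threshold.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ingest_in_batches(
    documents: Sequence[T],
    ingest_batch: Callable[[Sequence[T]], None],
    current_memory_mb: Callable[[], float],
    batch_size: int = 10,
    max_memory_mb: float = 400.0,
) -> int:
    """Ingest documents batch by batch, checking memory before each batch.

    Stops early when the reading exceeds max_memory_mb and returns the
    number of documents ingested so far, so the caller can resume later.
    """
    ingested = 0
    for batch in batched(documents, batch_size):
        if current_memory_mb() > max_memory_mb:
            break
        ingest_batch(batch)
        ingested += len(batch)
    return ingested
```

Returning the count rather than raising lets a scheduled job pick up from the first unprocessed document once memory has dropped.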
Memory Optimization Features
The application includes several memory optimization features:
- Automatic Thresholds: Memory is monitored against configured thresholds (400MB, 450MB, 480MB)
- Progressive Cleanup: Different levels of cleanup based on severity
- Request Circuit Breaker: Will reject new requests if memory is critically high
- Memory Metrics Export: Memory metrics are saved to /tmp/render_metrics/ for later analysis
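To make the threshold tiers concrete, here is one way they could map onto the status values reported by the status endpoint. This is an assumption-laden sketch: the pairing of 400MB, 450MB and 480MB with warning, critical and emergency is inferred from the order in which they are listed, the rejection cutoff of 480MB is a guess, and the application may derive its status differently.

```python
# Assumed pairing of thresholds with status names, highest first.
THRESHOLDS_MB = (
    (480.0, "emergency"),
    (450.0, "critical"),
    (400.0, "warning"),
)


def memory_status(memory_mb: float) -> str:
    """Return the highest status whose threshold the reading exceeds."""
    for threshold, name in THRESHOLDS_MB:
        if memory_mb > threshold:
            return name
    return "normal"


def should_reject_request(memory_mb: float, reject_above_mb: float = 480.0) -> bool:
    """Circuit-breaker decision; the 480 MB default cutoff is an assumption."""
    return memory_mb > reject_above_mb
```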
Troubleshooting Memory Issues
If you encounter persistent memory issues:
- Review Logs: Check Render logs for memory checkpoints and milestones
- Analyze Trends: Use the /memory/render-status endpoint to identify patterns
- Check Operations Timing: High memory could correlate with specific operations
- Adjust Configuration: Consider adjusting EMBEDDING_BATCH_SIZE or other parameters in config.py
Available Environment Variables
These environment variables can be configured in Render:
- MEMORY_DEBUG=1: Enable detailed memory diagnostics
- MEMORY_LOG_INTERVAL=10: Log memory usage every 10 seconds
- ENABLE_TRACEMALLOC=1: Enable tracemalloc for detailed memory allocation tracking
- RENDER=1: Enable Render-specific optimizations (automatically set on Render)
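For reference, this is roughly how such variables could be read in Python. It is a sketch rather than the application's actual configuration code: the MemorySettings class and load_memory_settings function are hypothetical, flags are assumed to be enabled only by the value 1, and the 10-second default interval is an assumption taken from the example value above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MemorySettings:
    debug: bool
    log_interval_seconds: int
    enable_tracemalloc: bool
    on_render: bool


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Treat a variable as enabled only when it is set to "1"."""
    return env.get(name, "0") == "1"


def load_memory_settings(env: Optional[Mapping[str, str]] = None) -> MemorySettings:
    """Read the memory-related variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return MemorySettings(
        debug=_flag(env, "MEMORY_DEBUG"),
        # Assumed default of 10 seconds when the variable is unset.
        log_interval_seconds=int(env.get("MEMORY_LOG_INTERVAL", "10")),
        enable_tracemalloc=_flag(env, "ENABLE_TRACEMALLOC"),
        on_render=_flag(env, "RENDER"),
    )
```

Accepting the environment as a parameter keeps the function easy to exercise locally without changing real environment variables.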