| # API Enhancements - Logging, Profiling & Error Handling | |
| This document describes the comprehensive enhancements made to the YLFF API endpoints for robust logging, profiling, and error handling. | |
| ## Overview | |
| All API endpoints have been enhanced with: | |
| - **Comprehensive logging** with structured data | |
| - **Request/response tracking** with unique request IDs | |
| - **Error handling** with detailed error information | |
| - **Profiling integration** for performance monitoring | |
| - **Timing information** for all operations | |
| - **Structured error responses** with error types and details | |
| ## Components | |
| ### 1. Request Logging Middleware | |
| A custom middleware (`RequestLoggingMiddleware`) logs all HTTP requests and responses: | |
| - Generates unique request IDs for tracking | |
| - Logs request start (method, path, client IP, query params) | |
| - Logs response completion (status code, duration) | |
| - Adds request ID to response headers | |
| - Handles exceptions and logs errors | |
| ### 2. Enhanced Error Handling | |
| #### Exception Handlers | |
| 1. **ValidationError Handler**: Catches Pydantic validation errors | |
| - Returns 422 status code | |
| - Includes detailed validation error messages | |
| - Logs validation failures | |
| 2. **General Exception Handler**: Catches all unhandled exceptions | |
| - Returns 500 status code | |
| - Logs full exception traceback | |
| - Returns structured error response with request ID | |
| #### Error Types Handled | |
| - `FileNotFoundError` β 404 with descriptive message | |
| - `PermissionError` β 403 with descriptive message | |
| - `ValueError` β 400 with validation details | |
| - `HTTPException` β Respects FastAPI HTTP exceptions | |
| - `Exception` β 500 with structured error response | |
| ### 3. Enhanced CLI Command Execution | |
| The `run_cli_command` function now includes: | |
| - **Comprehensive logging**: Logs command start, completion, and failures | |
| - **Execution timing**: Tracks duration of all commands | |
| - **Error classification**: Identifies error types (Exit codes, KeyboardInterrupt, Exceptions) | |
| - **Traceback capture**: Captures full stack traces for debugging | |
| - **Output capture**: Captures stdout/stderr with length tracking | |
| ### 4. Background Task Enhancement | |
| All background tasks (validation, training, etc.) now include: | |
| - **Pre-execution validation**: Validates input paths and parameters | |
| - **Structured logging**: Logs job start, progress, and completion | |
| - **Error context**: Captures error type, message, and traceback | |
| - **Job metadata**: Tracks duration, timestamps, and request parameters | |
| - **Profiling integration**: Automatic profiling context for long-running tasks | |
| ### 5. Request ID Tracking | |
| Every request gets a unique request ID: | |
| - Generated automatically if not provided in `X-Request-ID` header | |
| - Included in all log entries | |
| - Added to response headers | |
| - Used for correlating logs across distributed systems | |
| ## Logging Structure | |
| ### Log Levels | |
| - **INFO**: Normal operations, request/response logging, job status | |
| - **WARNING**: Validation errors, HTTP errors, non-fatal issues | |
| - **ERROR**: Exceptions, failures, critical errors | |
| - **DEBUG**: Detailed debugging information | |
| ### Structured Logging | |
| All logs use structured data with `extra` parameter: | |
| ```python | |
| logger.info( | |
| "Message", | |
| extra={ | |
| "request_id": "req_123", | |
| "job_id": "job_456", | |
| "duration_ms": 1234.5, | |
| "status_code": 200, | |
| # ... more context | |
| } | |
| ) | |
| ``` | |
| ## Example Enhanced Endpoint | |
| ### Before | |
| ```python | |
| @app.post("/api/v1/validate/sequence") | |
| async def validate_sequence(request: ValidateSequenceRequest): | |
| job_id = str(uuid.uuid4()) | |
| jobs[job_id] = {"status": "queued"} | |
| executor.submit(run_validation) | |
| return {"job_id": job_id} | |
| ``` | |
| ### After | |
| ```python | |
| @app.post("/api/v1/validate/sequence", response_model=JobResponse) | |
| async def validate_sequence( | |
| request: ValidateSequenceRequest, | |
| background_tasks: BackgroundTasks, | |
| fastapi_request: Request | |
| ): | |
| request_id = fastapi_request.headers.get('X-Request-ID', 'unknown') | |
| job_id = str(uuid.uuid4()) | |
| logger.info( | |
| f"Received sequence validation request", | |
| extra={"request_id": request_id, "job_id": job_id, ...} | |
| ) | |
| # Validate input | |
| seq_path = Path(request.sequence_dir) | |
| if not seq_path.exists(): | |
| logger.warning(...) | |
| raise HTTPException(status_code=400, detail=...) | |
| jobs[job_id] = { | |
| "status": "queued", | |
| "request_id": request_id, | |
| "created_at": time.time(), | |
| "request_params": {...} | |
| } | |
| try: | |
| executor.submit(run_validation) | |
| logger.info("Job queued successfully", ...) | |
| return JobResponse(job_id=job_id, status="queued", ...) | |
| except Exception as e: | |
| logger.error("Failed to queue job", ...) | |
| raise HTTPException(status_code=500, detail=...) | |
| ``` | |
| ## Background Task Function Enhancement | |
| ### Before | |
| ```python | |
| def run_validation(): | |
| try: | |
| result = run_cli_command(...) | |
| jobs[job_id]["status"] = "completed" if result["success"] else "failed" | |
| except Exception as e: | |
| jobs[job_id]["status"] = "failed" | |
| ``` | |
| ### After | |
| ```python | |
| def run_validation(): | |
| logger.info(f"Starting validation job: {job_id}", ...) | |
| try: | |
| # Pre-validation | |
| if not seq_path.exists(): | |
| raise FileNotFoundError(...) | |
| # Execute with profiling | |
| with profile_context(...): | |
| result = run_cli_command(...) | |
| # Update job with metadata | |
| jobs[job_id]["duration"] = result.get("duration") | |
| jobs[job_id]["completed_at"] = time.time() | |
| if result["success"]: | |
| logger.info("Job completed successfully", ...) | |
| jobs[job_id]["status"] = "completed" | |
| else: | |
| logger.error("Job failed", ...) | |
| jobs[job_id]["status"] = "failed" | |
| except FileNotFoundError as e: | |
| logger.error("File not found", exc_info=True) | |
| jobs[job_id]["status"] = "failed" | |
| jobs[job_id]["result"] = {"error": str(e), "error_type": "FileNotFoundError"} | |
| except Exception as e: | |
| logger.error("Unexpected error", exc_info=True) | |
| jobs[job_id]["status"] = "failed" | |
| jobs[job_id]["result"] = { | |
| "error": str(e), | |
| "error_type": type(e).__name__, | |
| "traceback": traceback.format_exc() | |
| } | |
| ``` | |
| ## Error Response Format | |
| All errors return structured JSON: | |
| ```json | |
| { | |
| "error": "ErrorType", | |
| "message": "Human-readable message", | |
| "request_id": "req_123", | |
| "details": {...} // Optional additional details | |
| } | |
| ``` | |
| ### Error Types | |
| - `ValidationError`: Pydantic validation failures (422) | |
| - `FileNotFoundError`: Missing files/directories (404) | |
| - `PermissionError`: Access denied (403) | |
| - `InternalServerError`: Unexpected errors (500) | |
| ## Profiling Integration | |
| ### Automatic Profiling | |
| Endpoints automatically profile when profiler is enabled: | |
| - API endpoint execution | |
| - Background task execution | |
| - CLI command execution | |
| ### Manual Profiling | |
| Use `profile_context` for custom profiling: | |
| ```python | |
| with profile_context(stage="validation", job_id=job_id): | |
| result = run_validation() | |
| ``` | |
| ## Benefits | |
| 1. **Debugging**: Full tracebacks and context in logs | |
| 2. **Monitoring**: Request IDs enable log correlation | |
| 3. **Performance**: Timing information for all operations | |
| 4. **Reliability**: Comprehensive error handling prevents crashes | |
| 5. **Observability**: Structured logs enable better analysis | |
| 6. **User Experience**: Clear, actionable error messages | |
| ## Usage | |
| ### Viewing Logs | |
| Logs are output to stdout/stderr and can be: | |
| - Viewed in RunPod logs | |
| - Collected by log aggregation services | |
| - Filtered by request_id for debugging | |
| ### Request ID | |
| Include `X-Request-ID` header for custom request tracking: | |
| ```bash | |
| curl -H "X-Request-ID: my-custom-id" https://api.example.com/health | |
| ``` | |
| ### Error Handling | |
| All errors are logged with full context, so you can: | |
| 1. Find the request_id from the error response | |
| 2. Search logs for that request_id | |
| 3. See the full execution trace and error details | |
| ## Future Enhancements | |
| - [ ] Add rate limiting with logging | |
| - [ ] Add request/response size limits | |
| - [ ] Add metrics export (Prometheus) | |
| - [ ] Add distributed tracing support | |
| - [ ] Add structured error codes | |
| - [ ] Add retry logic with exponential backoff | |