3d_model/docs/API_ENHANCEMENTS.md

API Enhancements - Logging, Profiling & Error Handling

This document describes the comprehensive enhancements made to the YLFF API endpoints for robust logging, profiling, and error handling.

Overview

All API endpoints have been enhanced with:

  • Comprehensive logging with structured data
  • Request/response tracking with unique request IDs
  • Error handling with detailed error information
  • Profiling integration for performance monitoring
  • Timing information for all operations
  • Structured error responses with error types and details

Components

1. Request Logging Middleware

A custom middleware (RequestLoggingMiddleware) logs all HTTP requests and responses:

  • Generates unique request IDs for tracking
  • Logs request start (method, path, client IP, query params)
  • Logs response completion (status code, duration)
  • Adds request ID to response headers
  • Handles exceptions and logs errors
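
The core of the middleware reduces to a small wrapper around the request handler. A framework-agnostic sketch (in the real app this logic lives in the middleware's dispatch method; all names here are illustrative):

```python
import logging
import time
import uuid

logger = logging.getLogger("api")

def handle_request(headers, handler):
    """Illustrative request-logging wrapper (framework-agnostic sketch)."""
    # Reuse a caller-supplied ID, or generate a fresh one
    request_id = headers.get("X-Request-ID") or f"req_{uuid.uuid4().hex[:8]}"
    start = time.perf_counter()
    logger.info("request started", extra={"request_id": request_id})
    try:
        response = handler()
    except Exception:
        logger.exception("request failed", extra={"request_id": request_id})
        raise
    duration_ms = (time.perf_counter() - start) * 1000
    logger.info("request completed",
                extra={"request_id": request_id, "duration_ms": duration_ms})
    # Echo the ID back so clients can correlate their logs with ours
    response.setdefault("headers", {})["X-Request-ID"] = request_id
    return response
```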

2. Enhanced Error Handling

Exception Handlers

  1. ValidationError Handler: Catches Pydantic validation errors

    • Returns 422 status code
    • Includes detailed validation error messages
    • Logs validation failures
  2. General Exception Handler: Catches all unhandled exceptions

    • Returns 500 status code
    • Logs full exception traceback
    • Returns structured error response with request ID

Error Types Handled

  • FileNotFoundError → 404 with descriptive message
  • PermissionError → 403 with descriptive message
  • ValueError → 400 with validation details
  • HTTPException → FastAPI HTTP exceptions are passed through with their own status codes
  • Exception → 500 with structured error response
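
A minimal sketch of how such a mapping can be implemented (the helper name and lookup table are assumptions for illustration, not the actual handler code; HTTPException passthrough is omitted since it is re-raised rather than mapped):

```python
import traceback

# Hypothetical lookup mirroring the mapping above
STATUS_BY_EXCEPTION = {
    FileNotFoundError: 404,
    PermissionError: 403,
    ValueError: 400,
}

def build_error_response(exc, request_id):
    """Map an exception to (status_code, structured JSON body)."""
    status = 500
    for exc_type, code in STATUS_BY_EXCEPTION.items():
        if isinstance(exc, exc_type):
            status = code
            break
    body = {
        "error": type(exc).__name__,
        "message": str(exc),
        "request_id": request_id,
    }
    if status == 500:
        # Unexpected errors also carry a traceback for debugging
        body["details"] = {
            "traceback": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
        }
    return status, body
```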

3. Enhanced CLI Command Execution

The run_cli_command function now includes:

  • Comprehensive logging: Logs command start, completion, and failures
  • Execution timing: Tracks duration of all commands
  • Error classification: Distinguishes failure modes (non-zero exit codes, KeyboardInterrupt, unexpected exceptions)
  • Traceback capture: Captures full stack traces for debugging
  • Output capture: Captures stdout/stderr with length tracking
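
A stdlib-only sketch of such a runner (the exact fields and signature of the real run_cli_command may differ):

```python
import logging
import subprocess
import time

logger = logging.getLogger("api")

def run_cli_command(args, timeout=None):
    """Run a CLI command with timing, output capture, and error classification."""
    start = time.perf_counter()
    logger.info("command started", extra={"command": args})
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
        duration = time.perf_counter() - start
        result = {
            "success": proc.returncode == 0,
            "returncode": proc.returncode,
            "stdout_len": len(proc.stdout),
            "stderr_len": len(proc.stderr),
            "duration": duration,
        }
        level = logging.INFO if result["success"] else logging.ERROR
        logger.log(level, "command finished",
                   extra={"returncode": proc.returncode, "duration": duration})
        return result
    except KeyboardInterrupt:
        # Let interrupts propagate, but record them first
        logger.warning("command interrupted", extra={"command": args})
        raise
    except Exception as exc:
        logger.error("command raised", exc_info=True)
        return {"success": False, "error_type": type(exc).__name__,
                "duration": time.perf_counter() - start}
```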

4. Background Task Enhancement

All background tasks (validation, training, etc.) now include:

  • Pre-execution validation: Validates input paths and parameters
  • Structured logging: Logs job start, progress, and completion
  • Error context: Captures error type, message, and traceback
  • Job metadata: Tracks duration, timestamps, and request parameters
  • Profiling integration: Automatic profiling context for long-running tasks

5. Request ID Tracking

Every request gets a unique request ID:

  • Generated automatically if not provided via the X-Request-ID header
  • Included in all log entries
  • Added to response headers
  • Used for correlating logs across distributed systems

Logging Structure

Log Levels

  • INFO: Normal operations, request/response logging, job status
  • WARNING: Validation errors, HTTP errors, non-fatal issues
  • ERROR: Exceptions, failures, critical errors
  • DEBUG: Detailed debugging information

Structured Logging

All logs attach structured data via the logging extra parameter:

logger.info(
    "Message",
    extra={
        "request_id": "req_123",
        "job_id": "job_456",
        "duration_ms": 1234.5,
        "status_code": 200,
        # ... more context
    }
)
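
To make those extra fields machine-readable downstream, they can be folded into one JSON line per record. A minimal stdlib formatter sketch (not necessarily the formatter this API uses):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as a JSON line, including custom `extra` fields."""

    # Attributes present on every LogRecord; anything else came from `extra`
    STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key, value in vars(record).items():
            if key not in self.STANDARD:
                payload[key] = value
        return json.dumps(payload)
```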

Example Enhanced Endpoint

Before

@app.post("/api/v1/validate/sequence")
async def validate_sequence(request: ValidateSequenceRequest):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    executor.submit(run_validation)
    return {"job_id": job_id}

After

@app.post("/api/v1/validate/sequence", response_model=JobResponse)
async def validate_sequence(
    request: ValidateSequenceRequest,
    background_tasks: BackgroundTasks,
    fastapi_request: Request
):
    request_id = fastapi_request.headers.get('X-Request-ID', 'unknown')
    job_id = str(uuid.uuid4())

    logger.info(
        "Received sequence validation request",
        extra={"request_id": request_id, "job_id": job_id, ...}
    )

    # Validate input
    seq_path = Path(request.sequence_dir)
    if not seq_path.exists():
        logger.warning(...)
        raise HTTPException(status_code=400, detail=...)

    jobs[job_id] = {
        "status": "queued",
        "request_id": request_id,
        "created_at": time.time(),
        "request_params": {...}
    }

    try:
        executor.submit(run_validation)
        logger.info("Job queued successfully", ...)
        return JobResponse(job_id=job_id, status="queued", ...)
    except Exception as e:
        logger.error("Failed to queue job", ...)
        raise HTTPException(status_code=500, detail=...)

Background Task Function Enhancement

Before

def run_validation():
    try:
        result = run_cli_command(...)
        jobs[job_id]["status"] = "completed" if result["success"] else "failed"
    except Exception as e:
        jobs[job_id]["status"] = "failed"

After

def run_validation():
    logger.info(f"Starting validation job: {job_id}", ...)

    try:
        # Pre-validation
        if not seq_path.exists():
            raise FileNotFoundError(...)

        # Execute with profiling
        with profile_context(...):
            result = run_cli_command(...)

        # Update job with metadata
        jobs[job_id]["duration"] = result.get("duration")
        jobs[job_id]["completed_at"] = time.time()

        if result["success"]:
            logger.info("Job completed successfully", ...)
            jobs[job_id]["status"] = "completed"
        else:
            logger.error("Job failed", ...)
            jobs[job_id]["status"] = "failed"

    except FileNotFoundError as e:
        logger.error("File not found", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {"error": str(e), "error_type": "FileNotFoundError"}
    except Exception as e:
        logger.error("Unexpected error", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {
            "error": str(e),
            "error_type": type(e).__name__,
            "traceback": traceback.format_exc()
        }

Error Response Format

All errors return structured JSON:

{
    "error": "ErrorType",
    "message": "Human-readable message",
    "request_id": "req_123",
    "details": {...}  // Optional additional details
}

Error Types

  • ValidationError: Pydantic validation failures (422)
  • FileNotFoundError: Missing files/directories (404)
  • PermissionError: Access denied (403)
  • InternalServerError: Unexpected errors (500)

Profiling Integration

Automatic Profiling

Endpoints are automatically profiled when the profiler is enabled:

  • API endpoint execution
  • Background task execution
  • CLI command execution

Manual Profiling

Use profile_context for custom profiling:

with profile_context(stage="validation", job_id=job_id):
    result = run_validation()
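
If profile_context needs a stand-in (for tests, or when the profiler is disabled), a timing-only version can be built with contextlib; the real implementation may hook into an actual profiler:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("api")

@contextmanager
def profile_context(stage, **labels):
    """Log the wall-clock duration of the wrapped block, tagged with labels."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info("stage finished",
                    extra={"stage": stage, "duration_ms": duration_ms, **labels})
```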

Benefits

  1. Debugging: Full tracebacks and context in logs
  2. Monitoring: Request IDs enable log correlation
  3. Performance: Timing information for all operations
  4. Reliability: Comprehensive error handling prevents crashes
  5. Observability: Structured logs enable better analysis
  6. User Experience: Clear, actionable error messages

Usage

Viewing Logs

Logs are output to stdout/stderr and can be:

  • Viewed in RunPod logs
  • Collected by log aggregation services
  • Filtered by request_id for debugging

Request ID

Include an X-Request-ID header for custom request tracking:

curl -H "X-Request-ID: my-custom-id" https://api.example.com/health

Error Handling

All errors are logged with full context, so you can:

  1. Find the request_id from the error response
  2. Search logs for that request_id
  3. See the full execution trace and error details

Future Enhancements

  • Add rate limiting with logging
  • Add request/response size limits
  • Add metrics export (Prometheus)
  • Add distributed tracing support
  • Add structured error codes
  • Add retry logic with exponential backoff