# API Enhancements - Logging, Profiling & Error Handling
This document describes the comprehensive enhancements made to the YLFF API endpoints for robust logging, profiling, and error handling.
## Overview
All API endpoints have been enhanced with:
- Comprehensive logging with structured data
- Request/response tracking with unique request IDs
- Error handling with detailed error information
- Profiling integration for performance monitoring
- Timing information for all operations
- Structured error responses with error types and details
## Components
### 1. Request Logging Middleware

A custom middleware (`RequestLoggingMiddleware`) logs all HTTP requests and responses:
- Generates unique request IDs for tracking
- Logs request start (method, path, client IP, query params)
- Logs response completion (status code, duration)
- Adds request ID to response headers
- Handles exceptions and logs errors
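In outline, such middleware can be written as a plain ASGI wrapper. The sketch below is illustrative, not the project's actual implementation; the logger name and request-ID format are assumptions:

```python
import logging
import time
import uuid

logger = logging.getLogger("api")  # assumed logger name


class RequestLoggingMiddleware:
    """ASGI middleware: log each request/response pair and tag it with a request ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Reuse the caller's X-Request-ID if present, otherwise generate one
        headers = {k.decode(): v.decode() for k, v in scope.get("headers", [])}
        request_id = headers.get("x-request-id", f"req_{uuid.uuid4().hex[:8]}")
        start = time.perf_counter()

        logger.info(
            "Request started",
            extra={"request_id": request_id, "method": scope["method"],
                   "path": scope["path"]},
        )

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                # Echo the request ID back to the client and log the outcome
                message.setdefault("headers", []).append(
                    (b"x-request-id", request_id.encode())
                )
                duration_ms = (time.perf_counter() - start) * 1000
                logger.info(
                    "Request completed",
                    extra={"request_id": request_id,
                           "status_code": message["status"],
                           "duration_ms": round(duration_ms, 2)},
                )
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        except Exception:
            logger.error("Request failed", extra={"request_id": request_id},
                         exc_info=True)
            raise
```

Because it is plain ASGI middleware, it can be registered with `app.add_middleware(RequestLoggingMiddleware)`.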
### 2. Enhanced Error Handling
#### Exception Handlers

**ValidationError Handler**: Catches Pydantic validation errors
- Returns 422 status code
- Includes detailed validation error messages
- Logs validation failures
**General Exception Handler**: Catches all unhandled exceptions
- Returns 500 status code
- Logs full exception traceback
- Returns structured error response with request ID
#### Error Types Handled

- `FileNotFoundError` → 404 with descriptive message
- `PermissionError` → 403 with descriptive message
- `ValueError` → 400 with validation details
- `HTTPException` → respects FastAPI HTTP exceptions
- `Exception` → 500 with structured error response
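This mapping can be sketched as a small helper that the exception handlers share. `classify_error` and `STATUS_BY_EXCEPTION` are hypothetical names; `HTTPException` is omitted here because FastAPI handles it natively:

```python
# Illustrative mapping from exception type to HTTP status code
STATUS_BY_EXCEPTION = {
    FileNotFoundError: 404,
    PermissionError: 403,
    ValueError: 400,
}


def classify_error(exc: Exception, request_id: str) -> tuple[int, dict]:
    """Map an exception to (status_code, structured error body)."""
    status = 500
    for exc_type, code in STATUS_BY_EXCEPTION.items():
        if isinstance(exc, exc_type):
            status = code
            break
    body = {
        # Hide internal exception names behind a generic label for 500s
        "error": type(exc).__name__ if status != 500 else "InternalServerError",
        "message": str(exc),
        "request_id": request_id,
    }
    return status, body
```

The handlers then only need to wrap the returned body in a `JSONResponse` with the returned status code.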
### 3. Enhanced CLI Command Execution

The `run_cli_command` function now includes:
- Comprehensive logging: Logs command start, completion, and failures
- Execution timing: Tracks duration of all commands
- Error classification: Identifies error types (Exit codes, KeyboardInterrupt, Exceptions)
- Traceback capture: Captures full stack traces for debugging
- Output capture: Captures stdout/stderr with length tracking
### 4. Background Task Enhancement
All background tasks (validation, training, etc.) now include:
- Pre-execution validation: Validates input paths and parameters
- Structured logging: Logs job start, progress, and completion
- Error context: Captures error type, message, and traceback
- Job metadata: Tracks duration, timestamps, and request parameters
- Profiling integration: Automatic profiling context for long-running tasks
### 5. Request ID Tracking

Every request gets a unique request ID:

- Generated automatically if not provided in the `X-Request-ID` header
- Included in all log entries
- Added to response headers
- Used for correlating logs across distributed systems
## Logging Structure
### Log Levels
- INFO: Normal operations, request/response logging, job status
- WARNING: Validation errors, HTTP errors, non-fatal issues
- ERROR: Exceptions, failures, critical errors
- DEBUG: Detailed debugging information
### Structured Logging
All logs pass structured data through the `extra` parameter:

```python
logger.info(
    "Message",
    extra={
        "request_id": "req_123",
        "job_id": "job_456",
        "duration_ms": 1234.5,
        "status_code": 200,
        # ... more context
    }
)
```
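Note that fields passed via `extra` only appear in the output if the formatter emits them. A minimal JSON formatter that copies `extra` fields through (illustrative, not the project's actual logging configuration):

```python
import json
import logging

# Attribute names every LogRecord carries by default; anything else came from `extra`
STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including any `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key, value in record.__dict__.items():
            if key not in STANDARD_ATTRS and key != "message":
                payload[key] = value
        return json.dumps(payload)
```

Attached to a handler, this makes `request_id`, `duration_ms`, etc. searchable in the log stream.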
## Example Enhanced Endpoint
### Before

```python
@app.post("/api/v1/validate/sequence")
async def validate_sequence(request: ValidateSequenceRequest):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    executor.submit(run_validation)
    return {"job_id": job_id}
```
### After

```python
@app.post("/api/v1/validate/sequence", response_model=JobResponse)
async def validate_sequence(
    request: ValidateSequenceRequest,
    background_tasks: BackgroundTasks,
    fastapi_request: Request
):
    request_id = fastapi_request.headers.get('X-Request-ID', 'unknown')
    job_id = str(uuid.uuid4())

    logger.info(
        "Received sequence validation request",
        extra={"request_id": request_id, "job_id": job_id, ...}
    )

    # Validate input
    seq_path = Path(request.sequence_dir)
    if not seq_path.exists():
        logger.warning(...)
        raise HTTPException(status_code=400, detail=...)

    jobs[job_id] = {
        "status": "queued",
        "request_id": request_id,
        "created_at": time.time(),
        "request_params": {...}
    }

    try:
        executor.submit(run_validation)
        logger.info("Job queued successfully", ...)
        return JobResponse(job_id=job_id, status="queued", ...)
    except Exception as e:
        logger.error("Failed to queue job", ...)
        raise HTTPException(status_code=500, detail=...)
```
## Background Task Function Enhancement

### Before

```python
def run_validation():
    try:
        result = run_cli_command(...)
        jobs[job_id]["status"] = "completed" if result["success"] else "failed"
    except Exception as e:
        jobs[job_id]["status"] = "failed"
```
### After

```python
def run_validation():
    logger.info(f"Starting validation job: {job_id}", ...)
    try:
        # Pre-validation
        if not seq_path.exists():
            raise FileNotFoundError(...)

        # Execute with profiling
        with profile_context(...):
            result = run_cli_command(...)

        # Update job with metadata
        jobs[job_id]["duration"] = result.get("duration")
        jobs[job_id]["completed_at"] = time.time()

        if result["success"]:
            logger.info("Job completed successfully", ...)
            jobs[job_id]["status"] = "completed"
        else:
            logger.error("Job failed", ...)
            jobs[job_id]["status"] = "failed"
    except FileNotFoundError as e:
        logger.error("File not found", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {"error": str(e), "error_type": "FileNotFoundError"}
    except Exception as e:
        logger.error("Unexpected error", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {
            "error": str(e),
            "error_type": type(e).__name__,
            "traceback": traceback.format_exc()
        }
```
## Error Response Format

All errors return structured JSON:

```json
{
  "error": "ErrorType",
  "message": "Human-readable message",
  "request_id": "req_123",
  "details": {...}  // Optional additional details
}
```
### Error Types

- `ValidationError`: Pydantic validation failures (422)
- `FileNotFoundError`: Missing files/directories (404)
- `PermissionError`: Access denied (403)
- `InternalServerError`: Unexpected errors (500)
## Profiling Integration

### Automatic Profiling

Endpoints are profiled automatically when the profiler is enabled:
- API endpoint execution
- Background task execution
- CLI command execution
### Manual Profiling

Use `profile_context` for custom profiling:

```python
with profile_context(stage="validation", job_id=job_id):
    result = run_validation()
```
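The project's `profile_context` presumably hooks into the real profiler; a minimal stand-in that only records wall-clock timing and logs it with context looks like:

```python
import contextlib
import logging
import time

logger = logging.getLogger("api")  # assumed logger name


@contextlib.contextmanager
def profile_context(stage: str, **context):
    """Time a named stage and log its duration with any extra context."""
    start = time.perf_counter()
    logger.info("Stage started", extra={"stage": stage, **context})
    try:
        yield
    finally:
        # Log even when the stage raises, so failed runs still get timing data
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "Stage finished",
            extra={"stage": stage, "duration_ms": round(duration_ms, 2), **context},
        )
```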
## Benefits
- Debugging: Full tracebacks and context in logs
- Monitoring: Request IDs enable log correlation
- Performance: Timing information for all operations
- Reliability: Comprehensive error handling prevents crashes
- Observability: Structured logs enable better analysis
- User Experience: Clear, actionable error messages
## Usage

### Viewing Logs
Logs are output to stdout/stderr and can be:
- Viewed in RunPod logs
- Collected by log aggregation services
- Filtered by `request_id` for debugging
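For example, assuming JSON-structured log lines, a plain `grep` is enough to pull out one request (sample lines written to a temporary file for illustration):

```shell
# Write two sample structured log lines, then filter by request_id
printf '%s\n' \
  '{"level": "INFO", "request_id": "req_123", "message": "Request started"}' \
  '{"level": "INFO", "request_id": "req_456", "message": "Request started"}' \
  > /tmp/api.log
grep '"request_id": "req_123"' /tmp/api.log
```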
### Request ID

Include an `X-Request-ID` header for custom request tracking:

```shell
curl -H "X-Request-ID: my-custom-id" https://api.example.com/health
```
### Error Handling
All errors are logged with full context, so you can:

- Find the `request_id` in the error response
- Search the logs for that `request_id`
- See the full execution trace and error details
## Future Enhancements
- Add rate limiting with logging
- Add request/response size limits
- Add metrics export (Prometheus)
- Add distributed tracing support
- Add structured error codes
- Add retry logic with exponential backoff