# API Enhancements - Logging, Profiling & Error Handling
This document describes the comprehensive enhancements made to the YLFF API endpoints for robust logging, profiling, and error handling.
## Overview
All API endpoints have been enhanced with:
- **Comprehensive logging** with structured data
- **Request/response tracking** with unique request IDs
- **Error handling** with detailed error information
- **Profiling integration** for performance monitoring
- **Timing information** for all operations
- **Structured error responses** with error types and details
## Components
### 1. Request Logging Middleware
A custom middleware (`RequestLoggingMiddleware`) logs all HTTP requests and responses:
- Generates unique request IDs for tracking
- Logs request start (method, path, client IP, query params)
- Logs response completion (status code, duration)
- Adds request ID to response headers
- Handles exceptions and logs errors
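The middleware itself is project code; as a rough illustration of the behavior listed above, a minimal pure-ASGI sketch (the class name matches the text, everything else is assumed) might look like:

```python
import logging
import time
import uuid

logger = logging.getLogger("api")

class RequestLoggingMiddleware:
    """Pure-ASGI sketch: log request start/end and tag responses with X-Request-ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        headers = {k.decode(): v.decode() for k, v in scope.get("headers", [])}
        # Reuse the caller's ID if provided, otherwise generate one
        request_id = headers.get("x-request-id", f"req_{uuid.uuid4().hex[:8]}")
        start = time.perf_counter()
        logger.info(
            "Request started",
            extra={"request_id": request_id, "method": scope["method"],
                   "path": scope["path"], "client": scope.get("client")},
        )

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                # Add the request ID to the outgoing response headers
                message.setdefault("headers", []).append(
                    (b"x-request-id", request_id.encode()))
                logger.info(
                    "Request completed",
                    extra={"request_id": request_id,
                           "status_code": message["status"],
                           "duration_ms": (time.perf_counter() - start) * 1000},
                )
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        except Exception:
            logger.error("Unhandled exception",
                         extra={"request_id": request_id}, exc_info=True)
            raise
```

In FastAPI this kind of middleware is registered with `app.add_middleware(RequestLoggingMiddleware)`.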
### 2. Enhanced Error Handling
#### Exception Handlers
1. **ValidationError Handler**: Catches Pydantic validation errors
- Returns 422 status code
- Includes detailed validation error messages
- Logs validation failures
2. **General Exception Handler**: Catches all unhandled exceptions
- Returns 500 status code
- Logs full exception traceback
- Returns structured error response with request ID
#### Error Types Handled
- `FileNotFoundError` β†’ 404 with descriptive message
- `PermissionError` β†’ 403 with descriptive message
- `ValueError` β†’ 400 with validation details
- `HTTPException` β†’ Respects FastAPI HTTP exceptions
- `Exception` β†’ 500 with structured error response
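As a sketch, the exception-to-status mapping above can be captured in one helper. The function name and payload shape here are illustrative (the payload mirrors the error response format shown later in this document), not the project's actual code; `HTTPException` is not modeled because FastAPI already handles it.

```python
# Illustrative mapping from exception type to HTTP status code
STATUS_BY_ERROR = {
    "ValidationError": 422,
    "FileNotFoundError": 404,
    "PermissionError": 403,
    "ValueError": 400,
}

def error_response(exc, request_id):
    """Map an exception to (status_code, structured error body)."""
    name = type(exc).__name__
    status = STATUS_BY_ERROR.get(name, 500)
    return status, {
        # Unexpected errors are reported generically to avoid leaking internals
        "error": name if status != 500 else "InternalServerError",
        "message": str(exc),
        "request_id": request_id,
    }
```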
### 3. Enhanced CLI Command Execution
The `run_cli_command` function now includes:
- **Comprehensive logging**: Logs command start, completion, and failures
- **Execution timing**: Tracks duration of all commands
- **Error classification**: Distinguishes non-zero exit codes, `KeyboardInterrupt`, and unexpected exceptions
- **Traceback capture**: Captures full stack traces for debugging
- **Output capture**: Captures stdout/stderr with length tracking
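A condensed sketch of that shape, assuming `subprocess` under the hood (the real `run_cli_command` wraps the project CLI; the result-dict keys mirror the ones referenced elsewhere in this document, e.g. `success` and `duration`):

```python
import logging
import subprocess
import time
import traceback

logger = logging.getLogger("api")

def run_cli_command(cmd, job_id=None):
    """Run a command, capturing output, timing, and error context (sketch)."""
    start = time.perf_counter()
    logger.info("Command started", extra={"job_id": job_id, "cmd": cmd})
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        result = {
            "success": proc.returncode == 0,
            "exit_code": proc.returncode,
            "duration": time.perf_counter() - start,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            # Length tracking keeps log entries small even for huge outputs
            "stdout_len": len(proc.stdout),
            "stderr_len": len(proc.stderr),
        }
        log = logger.info if result["success"] else logger.error
        log("Command finished",
            extra={"job_id": job_id, "exit_code": proc.returncode,
                   "duration": result["duration"]})
        return result
    except KeyboardInterrupt:
        # Not a subclass of Exception, so it needs its own branch
        logger.error("Command interrupted", extra={"job_id": job_id})
        raise
    except Exception as e:
        logger.error("Command crashed", extra={"job_id": job_id}, exc_info=True)
        return {
            "success": False,
            "error": str(e),
            "error_type": type(e).__name__,
            "traceback": traceback.format_exc(),
            "duration": time.perf_counter() - start,
        }
```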
### 4. Background Task Enhancement
All background tasks (validation, training, etc.) now include:
- **Pre-execution validation**: Validates input paths and parameters
- **Structured logging**: Logs job start, progress, and completion
- **Error context**: Captures error type, message, and traceback
- **Job metadata**: Tracks duration, timestamps, and request parameters
- **Profiling integration**: Automatic profiling context for long-running tasks
### 5. Request ID Tracking
Every request gets a unique request ID:
- Generated automatically if not provided in `X-Request-ID` header
- Included in all log entries
- Added to response headers
- Used for correlating logs across distributed systems
## Logging Structure
### Log Levels
- **INFO**: Normal operations, request/response logging, job status
- **WARNING**: Validation errors, HTTP errors, non-fatal issues
- **ERROR**: Exceptions, failures, critical errors
- **DEBUG**: Detailed debugging information
### Structured Logging
All logs use structured data with `extra` parameter:
```python
logger.info(
    "Message",
    extra={
        "request_id": "req_123",
        "job_id": "job_456",
        "duration_ms": 1234.5,
        "status_code": 200,
        # ... more context
    },
)
```
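Note that fields passed via `extra` only appear in output if the formatter emits them. A small example of a JSON formatter that does this (an assumption for illustration, not necessarily what the project configures):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record, including fields passed via `extra`, as one JSON line."""

    # Attribute names every LogRecord has; anything else came from `extra`
    _STANDARD = set(
        logging.LogRecord("", 0, "", 0, "", (), None).__dict__
    ) | {"message", "asctime"}

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key, value in record.__dict__.items():
            if key not in self._STANDARD:
                payload[key] = value
        return json.dumps(payload)
```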
## Example Enhanced Endpoint
### Before
```python
@app.post("/api/v1/validate/sequence")
async def validate_sequence(request: ValidateSequenceRequest):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued"}
    executor.submit(run_validation)
    return {"job_id": job_id}
```
### After
```python
@app.post("/api/v1/validate/sequence", response_model=JobResponse)
async def validate_sequence(
    request: ValidateSequenceRequest,
    background_tasks: BackgroundTasks,
    fastapi_request: Request,
):
    request_id = fastapi_request.headers.get('X-Request-ID', 'unknown')
    job_id = str(uuid.uuid4())
    logger.info(
        "Received sequence validation request",
        extra={"request_id": request_id, "job_id": job_id, ...}
    )

    # Validate input
    seq_path = Path(request.sequence_dir)
    if not seq_path.exists():
        logger.warning(...)
        raise HTTPException(status_code=400, detail=...)

    jobs[job_id] = {
        "status": "queued",
        "request_id": request_id,
        "created_at": time.time(),
        "request_params": {...}
    }

    try:
        executor.submit(run_validation)
        logger.info("Job queued successfully", ...)
        return JobResponse(job_id=job_id, status="queued", ...)
    except Exception as e:
        logger.error("Failed to queue job", ...)
        raise HTTPException(status_code=500, detail=...)
```
## Background Task Function Enhancement
### Before
```python
def run_validation():
    try:
        result = run_cli_command(...)
        jobs[job_id]["status"] = "completed" if result["success"] else "failed"
    except Exception as e:
        jobs[job_id]["status"] = "failed"
```
### After
```python
def run_validation():
    logger.info(f"Starting validation job: {job_id}", ...)
    try:
        # Pre-validation
        if not seq_path.exists():
            raise FileNotFoundError(...)

        # Execute with profiling
        with profile_context(...):
            result = run_cli_command(...)

        # Update job with metadata
        jobs[job_id]["duration"] = result.get("duration")
        jobs[job_id]["completed_at"] = time.time()

        if result["success"]:
            logger.info("Job completed successfully", ...)
            jobs[job_id]["status"] = "completed"
        else:
            logger.error("Job failed", ...)
            jobs[job_id]["status"] = "failed"
    except FileNotFoundError as e:
        logger.error("File not found", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {"error": str(e), "error_type": "FileNotFoundError"}
    except Exception as e:
        logger.error("Unexpected error", exc_info=True)
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["result"] = {
            "error": str(e),
            "error_type": type(e).__name__,
            "traceback": traceback.format_exc()
        }
```
## Error Response Format
All errors return structured JSON (`details` is optional and carries additional context when present):
```json
{
  "error": "ErrorType",
  "message": "Human-readable message",
  "request_id": "req_123",
  "details": {...}
}
```
### Error Types
- `ValidationError`: Pydantic validation failures (422)
- `FileNotFoundError`: Missing files/directories (404)
- `PermissionError`: Access denied (403)
- `InternalServerError`: Unexpected errors (500)
## Profiling Integration
### Automatic Profiling
Endpoints automatically profile when profiler is enabled:
- API endpoint execution
- Background task execution
- CLI command execution
### Manual Profiling
Use `profile_context` for custom profiling:
```python
with profile_context(stage="validation", job_id=job_id):
    result = run_validation()
```
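`profile_context` is project-specific; as an assumption about its behavior, a minimal stand-in built on `contextlib` that times a block and logs the duration could look like:

```python
import contextlib
import logging
import time

logger = logging.getLogger("api")

@contextlib.contextmanager
def profile_context(stage, **labels):
    """Time a block and log its duration with the given labels (stand-in)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Log even if the block raised, so failed stages are still timed
        logger.info(
            "Stage finished",
            extra={"stage": stage,
                   "duration_ms": (time.perf_counter() - start) * 1000,
                   **labels},
        )
```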
## Benefits
1. **Debugging**: Full tracebacks and context in logs
2. **Monitoring**: Request IDs enable log correlation
3. **Performance**: Timing information for all operations
4. **Reliability**: Comprehensive error handling prevents crashes
5. **Observability**: Structured logs enable better analysis
6. **User Experience**: Clear, actionable error messages
## Usage
### Viewing Logs
Logs are output to stdout/stderr and can be:
- Viewed in RunPod logs
- Collected by log aggregation services
- Filtered by request_id for debugging
### Request ID
Include `X-Request-ID` header for custom request tracking:
```bash
curl -H "X-Request-ID: my-custom-id" https://api.example.com/health
```
### Error Handling
All errors are logged with full context, so you can:
1. Find the request_id from the error response
2. Search logs for that request_id
3. See the full execution trace and error details
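For example, if your structured logs land in a file (the `api.log` path here is hypothetical; adapt it to your deployment or log aggregator):

```bash
# Pull every log line for one request
grep '"request_id": "req_123"' api.log
```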
## Future Enhancements
- [ ] Add rate limiting with logging
- [ ] Add request/response size limits
- [ ] Add metrics export (Prometheus)
- [ ] Add distributed tracing support
- [ ] Add structured error codes
- [ ] Add retry logic with exponential backoff