Mirrowel committed
Commit fc62d82 · 1 Parent(s): 1b7b9f5

feat(proxy): ✨ implement detailed per-request transaction logging


The new `DetailedLogger` captures comprehensive information for each API transaction, replacing the previous `log_request_response` function.

- Creates a unique, timestamped directory for every incoming request.
- Logs initial request headers and body to a `request.json` file.
- Appends individual chunks for streaming responses to a `streaming_chunks.jsonl` file.
- Records the complete final API response (status, headers, body, duration) to a `final_response.json` file.
- Generates `metadata.json` with a summary including model, token usage, finish reason, and extracted reasoning.
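
The per-request directory naming scheme described above can be sketched with nothing but the standard library (illustrative only; the proxy's actual implementation lives in `detailed_logger.py` below):

```python
# Sketch of the timestamped, UUID-suffixed directory name each request gets.
# Only the documented root logs/detailed_logs/ is assumed; everything else is stdlib.
import uuid
from datetime import datetime
from pathlib import Path

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
request_id = str(uuid.uuid4())
log_dir = Path("logs/detailed_logs") / f"{timestamp}_{request_id}"
# log_dir.name looks like 20240101_120000_<uuid4>
```

Because the timestamp sorts lexicographically, a plain `sorted()` over the directory names yields chronological order.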

.gitignore CHANGED
@@ -124,3 +124,4 @@ test_proxy.py
  start_proxy.bat
  key_usage.json
  staged_changes.txt
+ logs/
DOCUMENTATION.md CHANGED
@@ -165,9 +165,28 @@ For streaming requests, the `chat_completions` endpoint returns a `StreamingResp
  1. It passes the chunks from the `RotatingClient`'s stream directly to the user.
  2. It aggregates the full response in the background so that it can be logged completely once the stream is finished.

- ### 3.2. `request_logger.py`
-
- This module provides the `log_request_response` function, which writes the request and response data to a timestamped JSON file in the `logs/` directory. It handles creating separate directories for `completions` and `embeddings`.
+ ### 3.2. `detailed_logger.py` - Comprehensive Transaction Logging
+
+ To facilitate robust debugging and performance analysis, the proxy includes a powerful detailed logging system, enabled by the `--enable-request-logging` command-line flag. This system is managed by the `DetailedLogger` class in `detailed_logger.py`.
+
+ Unlike simple logging, this system creates a **unique directory for every single transaction**, ensuring that all related data is isolated and easy to analyze.
+
+ #### Log Directory Structure
+
+ When logging is enabled, each request will generate a new directory inside `logs/detailed_logs/` with a name like `YYYYMMDD_HHMMSS_unique-uuid`. Inside this directory, you will find a complete record of the transaction:
+
+ - **`request.json`**: Contains the full incoming request, including HTTP headers and the JSON body.
+ - **`streaming_chunks.jsonl`**: For streaming requests, this file contains a timestamped log of every individual data chunk received from the provider. This is invaluable for debugging malformed streams or partial responses.
+ - **`final_response.json`**: Contains the complete final response from the provider, including the status code, headers, and full JSON body. For streaming requests, this body is the fully reassembled message.
+ - **`metadata.json`**: A summary file for quick analysis, containing:
+   - `request_id`: The unique identifier for the transaction.
+   - `duration_ms`: The total time taken for the request to complete.
+   - `status_code`: The final HTTP status code returned by the provider.
+   - `model`: The model used for the request.
+   - `usage`: Token usage statistics (`prompt`, `completion`, `total`).
+   - `finish_reason`: The reason the model stopped generating tokens.
+   - `reasoning_found`: A boolean indicating if a `reasoning` field was detected in the response.
+   - `reasoning_content`: The extracted content of the `reasoning` field, if found.

  ### 3.3. `build.py`
 
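As an illustration of how the `metadata.json` summaries could be consumed for quick analysis, here is a hypothetical helper (the `failed_requests` name and scan logic are mine; the directory layout and field names follow the documentation above) that flags non-2xx transactions:

```python
# Hypothetical helper: walk logs/detailed_logs/*/metadata.json and yield
# failed transactions. Field names mirror the documented metadata.json summary.
import json
from pathlib import Path

def failed_requests(root: str = "logs/detailed_logs"):
    """Yield (request_id, status_code, model) for non-2xx transactions."""
    for meta_file in sorted(Path(root).glob("*/metadata.json")):
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        status = meta.get("status_code") or 0
        if not (200 <= status < 300):
            yield meta.get("request_id"), status, meta.get("model")
```

Since each directory is named with a sortable timestamp prefix, the failures come back in chronological order.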
README.md CHANGED
@@ -30,7 +30,7 @@ This project provides a powerful solution for developers building complex applic
  - **Intelligent Key Management**: Optimizes request distribution across your pool of keys by selecting the best available one for each call.
  - **Escalating Per-Model Cooldowns**: If a key fails for a specific model, it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
  - **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
- - **Request Logging**: Optional logging of full request and response payloads for easy debugging.
+ - **Detailed Request Logging**: Enable comprehensive logging for debugging. Each request gets its own directory with full request/response details, streaming chunks, and performance metadata.
  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.

@@ -238,7 +238,7 @@ The proxy server can be configured at runtime using the following command-line a

  - `--host`: The IP address to bind the server to. Defaults to `0.0.0.0` (accessible from your local network).
  - `--port`: The port to run the server on. Defaults to `8000`.
- - `--enable-request-logging`: A flag to enable logging of full request and response payloads to the `logs/` directory. This is useful for debugging.
+ - `--enable-request-logging`: A flag to enable detailed, per-request logging. When active, the proxy creates a unique directory for each transaction in the `logs/detailed_logs/` folder, containing the full request, response, streaming chunks, and performance metadata. This is highly recommended for debugging.

  **Example:**
  ```bash

@@ -255,8 +255,8 @@ For convenience on Windows, you can use the provided `.bat` scripts in the root
  ### Troubleshooting

  - **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization: Bearer <key>` header of your request.
- - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service.
- - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory (if enabled) or the `key_usage.json` file for details on why the failures occurred.
+ - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service. If you have logging enabled (`--enable-request-logging`), inspect the `final_response.json` and `metadata.json` files in the corresponding log directory under `logs/detailed_logs/` for the specific error returned by the upstream provider.
+ - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. If you have logging enabled (`--enable-request-logging`), check the `logs/detailed_logs/` directory to find the logs for the failed requests and inspect the `final_response.json` to see the underlying error from the provider.

  ---
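To go with the troubleshooting advice above, a hypothetical snippet (the `latest_error` helper is not part of the proxy; it only assumes the documented file layout) that pulls the upstream provider error out of the newest `final_response.json`:

```python
# Hypothetical: locate the most recent detailed-log directory and extract the
# status code and error payload recorded in its final_response.json.
import json
from pathlib import Path

def latest_error(root: str = "logs/detailed_logs"):
    """Return {'status_code': ..., 'error': ...} for the newest transaction, or None."""
    dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not dirs:
        return None
    resp_file = dirs[-1] / "final_response.json"
    if not resp_file.exists():
        return None
    resp = json.loads(resp_file.read_text(encoding="utf-8"))
    return {"status_code": resp.get("status_code"),
            "error": resp.get("body", {}).get("error")}
```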
src/proxy_app/detailed_logger.py ADDED
@@ -0,0 +1,121 @@
+ import json
+ import time
+ import uuid
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Any, Dict, Optional, List
+ import logging
+
+ LOGS_DIR = Path(__file__).resolve().parent.parent.parent / "logs"
+ DETAILED_LOGS_DIR = LOGS_DIR / "detailed_logs"
+
+ class DetailedLogger:
+     """
+     Logs comprehensive details of each API transaction to a unique, timestamped directory.
+     """
+     def __init__(self):
+         """
+         Initializes the logger for a single request, creating a unique directory to store all related log files.
+         """
+         self.start_time = time.time()
+         self.request_id = str(uuid.uuid4())
+         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+         self.log_dir = DETAILED_LOGS_DIR / f"{timestamp}_{self.request_id}"
+         self.log_dir.mkdir(parents=True, exist_ok=True)
+         self.streaming = False
+
+     def _write_json(self, filename: str, data: Dict[str, Any]):
+         """Helper to write data to a JSON file in the log directory."""
+         try:
+             with open(self.log_dir / filename, "w", encoding="utf-8") as f:
+                 json.dump(data, f, indent=4, ensure_ascii=False)
+         except Exception as e:
+             logging.error(f"[{self.request_id}] Failed to write to {filename}: {e}")
+
+     def log_request(self, headers: Dict[str, Any], body: Dict[str, Any]):
+         """Logs the initial request details."""
+         self.streaming = body.get("stream", False)
+         request_data = {
+             "request_id": self.request_id,
+             "timestamp_utc": datetime.utcnow().isoformat(),
+             "headers": dict(headers),
+             "body": body
+         }
+         self._write_json("request.json", request_data)
+
+     def log_stream_chunk(self, chunk: Dict[str, Any]):
+         """Logs an individual chunk from a streaming response to a JSON Lines file."""
+         try:
+             log_entry = {
+                 "timestamp_utc": datetime.utcnow().isoformat(),
+                 "chunk": chunk
+             }
+             with open(self.log_dir / "streaming_chunks.jsonl", "a", encoding="utf-8") as f:
+                 f.write(json.dumps(log_entry, ensure_ascii=False) + "\n")
+         except Exception as e:
+             logging.error(f"[{self.request_id}] Failed to write stream chunk: {e}")
+
+     def log_final_response(self, status_code: int, headers: Optional[Dict[str, Any]], body: Dict[str, Any]):
+         """Logs the complete final response, either from a non-streaming call or after reassembling a stream."""
+         end_time = time.time()
+         duration_ms = (end_time - self.start_time) * 1000
+
+         response_data = {
+             "request_id": self.request_id,
+             "timestamp_utc": datetime.utcnow().isoformat(),
+             "status_code": status_code,
+             "duration_ms": round(duration_ms),
+             "headers": dict(headers) if headers else None,
+             "body": body
+         }
+         self._write_json("final_response.json", response_data)
+         self._log_metadata(response_data)
+
+     def _extract_reasoning(self, response_body: Dict[str, Any]) -> Optional[str]:
+         """Searches for and extracts 'reasoning' fields from the response body."""
+         if not isinstance(response_body, dict):
+             return None
+
+         if "reasoning" in response_body:
+             return response_body["reasoning"]
+
+         if "choices" in response_body and response_body["choices"]:
+             message = response_body["choices"][0].get("message", {})
+             if "reasoning" in message:
+                 return message["reasoning"]
+             if "reasoning_content" in message:
+                 return message["reasoning_content"]
+
+         return None
+
+     def _log_metadata(self, response_data: Dict[str, Any]):
+         """Logs a summary of the transaction for quick analysis."""
+         usage = response_data.get("body", {}).get("usage", {})
+         model = response_data.get("body", {}).get("model", "N/A")
+         finish_reason = "N/A"
+         if "choices" in response_data.get("body", {}) and response_data["body"]["choices"]:
+             finish_reason = response_data["body"]["choices"][0].get("finish_reason", "N/A")
+
+         metadata = {
+             "request_id": self.request_id,
+             "timestamp_utc": response_data["timestamp_utc"],
+             "duration_ms": response_data["duration_ms"],
+             "status_code": response_data["status_code"],
+             "model": model,
+             "streaming": self.streaming,
+             "usage": {
+                 "prompt_tokens": usage.get("prompt_tokens"),
+                 "completion_tokens": usage.get("completion_tokens"),
+                 "total_tokens": usage.get("total_tokens"),
+             },
+             "finish_reason": finish_reason,
+             "reasoning_found": False,
+             "reasoning_content": None
+         }
+
+         reasoning = self._extract_reasoning(response_data.get("body", {}))
+         if reasoning:
+             metadata["reasoning_found"] = True
+             metadata["reasoning_content"] = reasoning
+
+         self._write_json("metadata.json", metadata)
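
The extraction rule implemented by `_extract_reasoning` above can be exercised standalone; the following is a re-implementation for illustration, not the module itself:

```python
# Standalone re-statement of the reasoning-extraction rule: a top-level
# "reasoning" key wins, otherwise the first choice's message is checked for
# "reasoning" then "reasoning_content".
from typing import Any, Dict, Optional

def extract_reasoning(body: Dict[str, Any]) -> Optional[str]:
    if not isinstance(body, dict):
        return None
    if "reasoning" in body:
        return body["reasoning"]
    if body.get("choices"):
        message = body["choices"][0].get("message", {})
        for key in ("reasoning", "reasoning_content"):
            if key in message:
                return message[key]
    return None
```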
src/proxy_app/main.py CHANGED
@@ -52,8 +52,9 @@ args, _ = parser.parse_known_args()
  sys.path.append(str(Path(__file__).resolve().parent.parent))

  from rotator_library import RotatingClient, PROVIDER_PLUGINS
- from proxy_app.request_logger import log_request_response, log_request_to_console
+ from proxy_app.request_logger import log_request_to_console
  from proxy_app.batch_manager import EmbeddingBatcher
+ from proxy_app.detailed_logger import DetailedLogger

  # --- Logging Configuration ---
  LOG_DIR = Path(__file__).resolve().parent.parent / "logs"

@@ -206,7 +207,8 @@ async def verify_api_key(auth: str = Depends(api_key_header)):
  async def streaming_response_wrapper(
      request: Request,
      request_data: dict,
-     response_stream: AsyncGenerator[str, None]
+     response_stream: AsyncGenerator[str, None],
+     logger: Optional[DetailedLogger] = None
  ) -> AsyncGenerator[str, None]:
      """
      Wraps a streaming response to log the full response after completion

@@ -227,8 +229,10 @@ async def streaming_response_wrapper(
              try:
                  chunk_data = json.loads(content)
                  response_chunks.append(chunk_data)
+                 if logger:
+                     logger.log_stream_chunk(chunk_data)
              except json.JSONDecodeError:
-                 pass  # Ignore non-JSON chunks
+                 pass
      except Exception as e:
          logging.error(f"An error occurred during the response stream: {e}")
          # Yield a final error message to the client to ensure they are not left hanging.

@@ -242,13 +246,8 @@ async def streaming_response_wrapper(
          yield f"data: {json.dumps(error_payload)}\n\n"
          yield "data: [DONE]\n\n"
          # Also log this as a failed request
-         if ENABLE_REQUEST_LOGGING:
-             log_request_response(
-                 request_data=request_data,
-                 response_data={"error": str(e)},
-                 is_streaming=True,
-                 log_type="completion"
-             )
+         if logger:
+             logger.log_final_response(status_code=500, headers=None, body={"error": str(e)})
          return  # Stop further processing
      finally:
          if response_chunks:

@@ -341,12 +340,11 @@ async def streaming_response_wrapper(
              "usage": usage_data
          }

-         if ENABLE_REQUEST_LOGGING:
-             log_request_response(
-                 request_data=request_data,
-                 response_data=full_response,
-                 is_streaming=True,
-                 log_type="completion"
+         if logger:
+             logger.log_final_response(
+                 status_code=200,
+                 headers=None,  # Headers are not available at this stage
+                 body=full_response
              )

  @app.post("/v1/chat/completions")

@@ -359,8 +357,12 @@ async def chat_completions(
      OpenAI-compatible endpoint powered by the RotatingClient.
      Handles both streaming and non-streaming responses and logs them.
      """
+     logger = DetailedLogger() if ENABLE_REQUEST_LOGGING else None
      try:
          request_data = await request.json()
+         if logger:
+             logger.log_request(headers=request.headers, body=request_data)
+
          log_request_to_console(
              url=str(request.url),
              headers=dict(request.headers),

@@ -372,17 +374,20 @@ async def chat_completions(
          if is_streaming:
              response_generator = client.acompletion(request=request, **request_data)
              return StreamingResponse(
-                 streaming_response_wrapper(request, request_data, response_generator),
+                 streaming_response_wrapper(request, request_data, response_generator, logger),
                  media_type="text/event-stream"
              )
          else:
              response = await client.acompletion(request=request, **request_data)
-             if ENABLE_REQUEST_LOGGING:
-                 log_request_response(
-                     request_data=request_data,
-                     response_data=response.model_dump(),
-                     is_streaming=False,
-                     log_type="completion"
+             if logger:
+                 # Assuming response has status_code and headers attributes
+                 # This might need adjustment based on the actual response object
+                 response_headers = response.headers if hasattr(response, 'headers') else None
+                 status_code = response.status_code if hasattr(response, 'status_code') else 200
+                 logger.log_final_response(
+                     status_code=status_code,
+                     headers=response_headers,
+                     body=response.model_dump()
                  )
              return response

@@ -406,12 +411,8 @@ async def chat_completions(
              request_data = await request.json()
          except json.JSONDecodeError:
              request_data = {"error": "Could not parse request body"}
-         log_request_response(
-             request_data=request_data,
-             response_data={"error": str(e)},
-             is_streaming=request_data.get("stream", False),
-             log_type="completion"
-         )
+         if logger:
+             logger.log_final_response(status_code=500, headers=None, body={"error": str(e)})
          raise HTTPException(status_code=500, detail=str(e))

  @app.post("/v1/embeddings")
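
The chunk-parsing behavior in `streaming_response_wrapper` above, collecting JSON chunks while silently skipping non-JSON SSE payloads such as the `[DONE]` sentinel, can be simulated in isolation (the `collect_chunks` helper is illustrative, not proxy code):

```python
# Minimal simulation of the wrapper's aggregation step: strip the SSE "data: "
# framing, keep parseable JSON chunks, and ignore everything else.
import json

def collect_chunks(sse_lines):
    chunks = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        content = line[len("data: "):].strip()
        try:
            chunks.append(json.loads(content))
        except json.JSONDecodeError:
            pass  # e.g. the terminal "[DONE]" sentinel
    return chunks
```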
src/proxy_app/request_logger.py CHANGED
@@ -38,39 +38,3 @@ def log_request_to_console(url: str, headers: dict, client_info: tuple, request_
  log_message = f"{time_str} - {client_info[0]}:{client_info[1]} - provider: {provider}, model: {model_name} - {endpoint_url}"
  logging.info(log_message)

- def log_request_response(
-     request_data: dict,
-     response_data: dict,
-     is_streaming: bool,
-     log_type: Literal["completion", "embedding"]
- ):
-     """
-     Logs the request and response data to a file in the appropriate log directory.
-     """
-     try:
-         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-         unique_id = uuid.uuid4()
-
-         if log_type == "completion":
-             target_dir = COMPLETIONS_LOGS_DIR
-         elif log_type == "embedding":
-             target_dir = EMBEDDINGS_LOGS_DIR
-         else:
-             # Fallback to the main logs directory if log_type is invalid
-             target_dir = LOGS_DIR
-
-         filename = target_dir / f"{timestamp}_{unique_id}.json"
-
-         log_content = {
-             "request": request_data,
-             "response": response_data,
-             "is_streaming": is_streaming
-         }
-
-         with open(filename, "w", encoding="utf-8") as f:
-             json.dump(log_content, f, indent=4, ensure_ascii=False)
-
-     except Exception as e:
-         # In case of logging failure, we don't want to crash the main application
-         # Use the root logger to log the error to the file.
-         logging.error(f"Error logging request/response to file: {e}")