Mirrowel committed
Commit fc62d82 · 1 Parent(s): 1b7b9f5

feat(proxy): ✨ implement detailed per-request transaction logging


The new `DetailedLogger` captures comprehensive information for each API transaction, replacing the previous `log_request_response` function.

- Creates a unique, timestamped directory for every incoming request.
- Logs initial request headers and body to a `request.json` file.
- Appends individual chunks for streaming responses to a `streaming_chunks.jsonl` file.
- Records the complete final API response (status, headers, body, duration) to a `final_response.json` file.
- Generates `metadata.json` with a summary including model, token usage, finish reason, and extracted reasoning.
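
The per-request directory naming scheme described above can be sketched with nothing but the standard library (illustrative only; the proxy's actual implementation lives in `detailed_logger.py` below):

```python
# Sketch of the timestamped, UUID-suffixed directory name each request gets.
# Only the documented root logs/detailed_logs/ is assumed; everything else is stdlib.
import uuid
from datetime import datetime
from pathlib import Path

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
request_id = str(uuid.uuid4())
log_dir = Path("logs/detailed_logs") / f"{timestamp}_{request_id}"
# log_dir.name looks like 20240101_120000_<uuid4>
```

Because the timestamp sorts lexicographically, a plain `sorted()` over the directory names yields chronological order.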

.gitignore CHANGED
@@ -124,3 +124,4 @@ test_proxy.py
  start_proxy.bat
  key_usage.json
  staged_changes.txt
+ logs/
DOCUMENTATION.md CHANGED
@@ -165,9 +165,28 @@ For streaming requests, the `chat_completions` endpoint returns a `StreamingResp
  1. It passes the chunks from the `RotatingClient`'s stream directly to the user.
  2. It aggregates the full response in the background so that it can be logged completely once the stream is finished.

- ### 3.2. `request_logger.py`
-
- This module provides the `log_request_response` function, which writes the request and response data to a timestamped JSON file in the `logs/` directory. It handles creating separate directories for `completions` and `embeddings`.
+ ### 3.2. `detailed_logger.py` - Comprehensive Transaction Logging
+
+ To facilitate robust debugging and performance analysis, the proxy includes a powerful detailed logging system, enabled by the `--enable-request-logging` command-line flag. This system is managed by the `DetailedLogger` class in `detailed_logger.py`.
+
+ Unlike simple logging, this system creates a **unique directory for every single transaction**, ensuring that all related data is isolated and easy to analyze.
+
+ #### Log Directory Structure
+
+ When logging is enabled, each request will generate a new directory inside `logs/detailed_logs/` with a name like `YYYYMMDD_HHMMSS_unique-uuid`. Inside this directory, you will find a complete record of the transaction:
+
+ - **`request.json`**: Contains the full incoming request, including HTTP headers and the JSON body.
+ - **`streaming_chunks.jsonl`**: For streaming requests, this file contains a timestamped log of every individual data chunk received from the provider. This is invaluable for debugging malformed streams or partial responses.
+ - **`final_response.json`**: Contains the complete final response from the provider, including the status code, headers, and full JSON body. For streaming requests, this body is the fully reassembled message.
+ - **`metadata.json`**: A summary file for quick analysis, containing:
+   - `request_id`: The unique identifier for the transaction.
+   - `duration_ms`: The total time taken for the request to complete.
+   - `status_code`: The final HTTP status code returned by the provider.
+   - `model`: The model used for the request.
+   - `usage`: Token usage statistics (`prompt`, `completion`, `total`).
+   - `finish_reason`: The reason the model stopped generating tokens.
+   - `reasoning_found`: A boolean indicating if a `reasoning` field was detected in the response.
+   - `reasoning_content`: The extracted content of the `reasoning` field, if found.

  ### 3.3. `build.py`
 
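As an illustration of how the `metadata.json` summaries could be consumed for quick analysis, here is a hypothetical helper (the `failed_requests` name and scan logic are mine; the directory layout and field names follow the documentation above) that flags non-2xx transactions:

```python
# Hypothetical helper: walk logs/detailed_logs/*/metadata.json and yield
# failed transactions. Field names mirror the documented metadata.json summary.
import json
from pathlib import Path

def failed_requests(root: str = "logs/detailed_logs"):
    """Yield (request_id, status_code, model) for non-2xx transactions."""
    for meta_file in sorted(Path(root).glob("*/metadata.json")):
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        status = meta.get("status_code") or 0
        if not (200 <= status < 300):
            yield meta.get("request_id"), status, meta.get("model")
```

Since each directory is named with a sortable timestamp prefix, the failures come back in chronological order.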
README.md CHANGED
@@ -30,7 +30,7 @@ This project provides a powerful solution for developers building complex applic
  - **Intelligent Key Management**: Optimizes request distribution across your pool of keys by selecting the best available one for each call.
  - **Escalating Per-Model Cooldowns**: If a key fails for a specific model, it's placed on a temporary, escalating cooldown for that model, allowing it to be used with others.
  - **Automatic Daily Resets**: Cooldowns and usage statistics are automatically reset daily, making the system self-maintaining.
- - **Request Logging**: Optional logging of full request and response payloads for easy debugging.
+ - **Detailed Request Logging**: Enable comprehensive logging for debugging. Each request gets its own directory with full request/response details, streaming chunks, and performance metadata.
  - **Provider Agnostic**: Compatible with any provider supported by `litellm`.
  - **OpenAI-Compatible Proxy**: Offers a familiar API interface with additional endpoints for model and provider discovery.

@@ -238,7 +238,7 @@ The proxy server can be configured at runtime using the following command-line a

  - `--host`: The IP address to bind the server to. Defaults to `0.0.0.0` (accessible from your local network).
  - `--port`: The port to run the server on. Defaults to `8000`.
- - `--enable-request-logging`: A flag to enable logging of full request and response payloads to the `logs/` directory. This is useful for debugging.
+ - `--enable-request-logging`: A flag to enable detailed, per-request logging. When active, the proxy creates a unique directory for each transaction in the `logs/detailed_logs/` folder, containing the full request, response, streaming chunks, and performance metadata. This is highly recommended for debugging.

  **Example:**
  ```bash

@@ -255,8 +255,8 @@ For convenience on Windows, you can use the provided `.bat` scripts in the root
  ### Troubleshooting

  - **`401 Unauthorized`**: Ensure your `PROXY_API_KEY` is set correctly in the `.env` file and included in the `Authorization: Bearer <key>` header of your request.
- - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service.
- - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. Check the `logs/` directory (if enabled) or the `key_usage.json` file for details on why the failures occurred.
+ - **`500 Internal Server Error`**: Check the console logs of the `uvicorn` server for detailed error messages. This could indicate an issue with one of your provider API keys (e.g., it's invalid or has been revoked) or a problem with the provider's service. If you have logging enabled (`--enable-request-logging`), inspect the `final_response.json` and `metadata.json` files in the corresponding log directory under `logs/detailed_logs/` for the specific error returned by the upstream provider.
+ - **All keys on cooldown**: If you see a message that all keys are on cooldown, it means all your keys for a specific provider have recently failed. If you have logging enabled (`--enable-request-logging`), check the `logs/detailed_logs/` directory to find the logs for the failed requests and inspect the `final_response.json` to see the underlying error from the provider.

  ---
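To go with the troubleshooting advice above, a hypothetical snippet (the `latest_error` helper is not part of the proxy; it only assumes the documented file layout) that pulls the upstream provider error out of the newest `final_response.json`:

```python
# Hypothetical: locate the most recent detailed-log directory and extract the
# status code and error payload recorded in its final_response.json.
import json
from pathlib import Path

def latest_error(root: str = "logs/detailed_logs"):
    """Return {'status_code': ..., 'error': ...} for the newest transaction, or None."""
    dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not dirs:
        return None
    resp_file = dirs[-1] / "final_response.json"
    if not resp_file.exists():
        return None
    resp = json.loads(resp_file.read_text(encoding="utf-8"))
    return {"status_code": resp.get("status_code"),
            "error": resp.get("body", {}).get("error")}
```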
src/proxy_app/detailed_logger.py ADDED
@@ -0,0 +1,121 @@
+ import json
+ import time
+ import uuid
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Any, Dict, Optional, List
+ import logging
+
+ LOGS_DIR = Path(__file__).resolve().parent.parent.parent / "logs"
+ DETAILED_LOGS_DIR = LOGS_DIR / "detailed_logs"
+
+ class DetailedLogger:
+     """
+     Logs comprehensive details of each API transaction to a unique, timestamped directory.
+     """
+     def __init__(self):
+         """
+         Initializes the logger for a single request, creating a unique directory to store all related log files.
+         """
+         self.start_time = time.time()
+         self.request_id = str(uuid.uuid4())
+         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+         self.log_dir = DETAILED_LOGS_DIR / f"{timestamp}_{self.request_id}"
+         self.log_dir.mkdir(parents=True, exist_ok=True)
+         self.streaming = False
+
+     def _write_json(self, filename: str, data: Dict[str, Any]):
+         """Helper to write data to a JSON file in the log directory."""
+         try:
+             with open(self.log_dir / filename, "w", encoding="utf-8") as f:
+                 json.dump(data, f, indent=4, ensure_ascii=False)
+         except Exception as e:
+             logging.error(f"[{self.request_id}] Failed to write to {filename}: {e}")
+
+     def log_request(self, headers: Dict[str, Any], body: Dict[str, Any]):
+         """Logs the initial request details."""
+         self.streaming = body.get("stream", False)
+         request_data = {
+             "request_id": self.request_id,
+             "timestamp_utc": datetime.utcnow().isoformat(),
+             "headers": dict(headers),
+             "body": body
+         }
+         self._write_json("request.json", request_data)
+
+     def log_stream_chunk(self, chunk: Dict[str, Any]):
+         """Logs an individual chunk from a streaming response to a JSON Lines file."""
+         try:
+             log_entry = {
+                 "timestamp_utc": datetime.utcnow().isoformat(),
+                 "chunk": chunk
+             }
+             with open(self.log_dir / "streaming_chunks.jsonl", "a", encoding="utf-8") as f:
+                 f.write(json.dumps(log_entry, ensure_ascii=False) + "\n")
+         except Exception as e:
+             logging.error(f"[{self.request_id}] Failed to write stream chunk: {e}")
+
+     def log_final_response(self, status_code: int, headers: Optional[Dict[str, Any]], body: Dict[str, Any]):
+         """Logs the complete final response, either from a non-streaming call or after reassembling a stream."""
+         end_time = time.time()
+         duration_ms = (end_time - self.start_time) * 1000
+
+         response_data = {
+             "request_id": self.request_id,
+             "timestamp_utc": datetime.utcnow().isoformat(),
+             "status_code": status_code,
+             "duration_ms": round(duration_ms),
+             "headers": dict(headers) if headers else None,
+             "body": body
+         }
+         self._write_json("final_response.json", response_data)
+         self._log_metadata(response_data)
+
+     def _extract_reasoning(self, response_body: Dict[str, Any]) -> Optional[str]:
+         """Searches for and extracts 'reasoning' fields from the response body."""
+         if not isinstance(response_body, dict):
+             return None
+
+         if "reasoning" in response_body:
+             return response_body["reasoning"]
+
+         if "choices" in response_body and response_body["choices"]:
+             message = response_body["choices"][0].get("message", {})
+             if "reasoning" in message:
+                 return message["reasoning"]
+             if "reasoning_content" in message:
+                 return message["reasoning_content"]
+
+         return None
+
+     def _log_metadata(self, response_data: Dict[str, Any]):
+         """Logs a summary of the transaction for quick analysis."""
+         usage = response_data.get("body", {}).get("usage", {})
+         model = response_data.get("body", {}).get("model", "N/A")
+         finish_reason = "N/A"
+         if "choices" in response_data.get("body", {}) and response_data["body"]["choices"]:
+             finish_reason = response_data["body"]["choices"][0].get("finish_reason", "N/A")
+
+         metadata = {
+             "request_id": self.request_id,
+             "timestamp_utc": response_data["timestamp_utc"],
+             "duration_ms": response_data["duration_ms"],
+             "status_code": response_data["status_code"],
+             "model": model,
+             "streaming": self.streaming,
+             "usage": {
+                 "prompt_tokens": usage.get("prompt_tokens"),
+                 "completion_tokens": usage.get("completion_tokens"),
+                 "total_tokens": usage.get("total_tokens"),
+             },
+             "finish_reason": finish_reason,
+             "reasoning_found": False,
+             "reasoning_content": None
+         }
+
+         reasoning = self._extract_reasoning(response_data.get("body", {}))
+         if reasoning:
+             metadata["reasoning_found"] = True
+             metadata["reasoning_content"] = reasoning
+
+         self._write_json("metadata.json", metadata)
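
The extraction rule implemented by `_extract_reasoning` above can be exercised standalone; the following is a re-implementation for illustration, not the module itself:

```python
# Standalone re-statement of the reasoning-extraction rule: a top-level
# "reasoning" key wins, otherwise the first choice's message is checked for
# "reasoning" then "reasoning_content".
from typing import Any, Dict, Optional

def extract_reasoning(body: Dict[str, Any]) -> Optional[str]:
    if not isinstance(body, dict):
        return None
    if "reasoning" in body:
        return body["reasoning"]
    if body.get("choices"):
        message = body["choices"][0].get("message", {})
        for key in ("reasoning", "reasoning_content"):
            if key in message:
                return message[key]
    return None
```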
src/proxy_app/main.py CHANGED
@@ -52,8 +52,9 @@ args, _ = parser.parse_known_args()
  sys.path.append(str(Path(__file__).resolve().parent.parent))

  from rotator_library import RotatingClient, PROVIDER_PLUGINS
- from proxy_app.request_logger import log_request_response, log_request_to_console
+ from proxy_app.request_logger import log_request_to_console
  from proxy_app.batch_manager import EmbeddingBatcher
+ from proxy_app.detailed_logger import DetailedLogger

  # --- Logging Configuration ---
  LOG_DIR = Path(__file__).resolve().parent.parent / "logs"

@@ -206,7 +207,8 @@ async def verify_api_key(auth: str = Depends(api_key_header)):
  async def streaming_response_wrapper(
      request: Request,
      request_data: dict,
-     response_stream: AsyncGenerator[str, None]
+     response_stream: AsyncGenerator[str, None],
+     logger: Optional[DetailedLogger] = None
  ) -> AsyncGenerator[str, None]:
      """
      Wraps a streaming response to log the full response after completion

@@ -227,8 +229,10 @@ async def streaming_response_wrapper(
              try:
                  chunk_data = json.loads(content)
                  response_chunks.append(chunk_data)
+                 if logger:
+                     logger.log_stream_chunk(chunk_data)
              except json.JSONDecodeError:
-                 pass  # Ignore non-JSON chunks
+                 pass
      except Exception as e:
          logging.error(f"An error occurred during the response stream: {e}")
          # Yield a final error message to the client to ensure they are not left hanging.

@@ -242,13 +246,8 @@ async def streaming_response_wrapper(
          yield f"data: {json.dumps(error_payload)}\n\n"
          yield "data: [DONE]\n\n"
          # Also log this as a failed request
-         if ENABLE_REQUEST_LOGGING:
-             log_request_response(
-                 request_data=request_data,
-                 response_data={"error": str(e)},
-                 is_streaming=True,
-                 log_type="completion"
-             )
+         if logger:
+             logger.log_final_response(status_code=500, headers=None, body={"error": str(e)})
          return  # Stop further processing
      finally:
          if response_chunks:

@@ -341,12 +340,11 @@ async def streaming_response_wrapper(
              "usage": usage_data
          }

-         if ENABLE_REQUEST_LOGGING:
-             log_request_response(
-                 request_data=request_data,
-                 response_data=full_response,
-                 is_streaming=True,
-                 log_type="completion"
+         if logger:
+             logger.log_final_response(
+                 status_code=200,
+                 headers=None,  # Headers are not available at this stage
+                 body=full_response
              )

  @app.post("/v1/chat/completions")

@@ -359,8 +357,12 @@ async def chat_completions(
      OpenAI-compatible endpoint powered by the RotatingClient.
      Handles both streaming and non-streaming responses and logs them.
      """
+     logger = DetailedLogger() if ENABLE_REQUEST_LOGGING else None
      try:
          request_data = await request.json()
+         if logger:
+             logger.log_request(headers=request.headers, body=request_data)
+
          log_request_to_console(
              url=str(request.url),
              headers=dict(request.headers),

@@ -372,17 +374,20 @@ async def chat_completions(
          if is_streaming:
              response_generator = client.acompletion(request=request, **request_data)
              return StreamingResponse(
-                 streaming_response_wrapper(request, request_data, response_generator),
+                 streaming_response_wrapper(request, request_data, response_generator, logger),
                  media_type="text/event-stream"
              )
          else:
              response = await client.acompletion(request=request, **request_data)
-             if ENABLE_REQUEST_LOGGING:
-                 log_request_response(
-                     request_data=request_data,
-                     response_data=response.model_dump(),
-                     is_streaming=False,
-                     log_type="completion"
+             if logger:
+                 # Assuming response has status_code and headers attributes
+                 # This might need adjustment based on the actual response object
+                 response_headers = response.headers if hasattr(response, 'headers') else None
+                 status_code = response.status_code if hasattr(response, 'status_code') else 200
+                 logger.log_final_response(
+                     status_code=status_code,
+                     headers=response_headers,
+                     body=response.model_dump()
                  )
              return response

@@ -406,12 +411,8 @@ async def chat_completions(
              request_data = await request.json()
          except json.JSONDecodeError:
              request_data = {"error": "Could not parse request body"}
-         log_request_response(
-             request_data=request_data,
-             response_data={"error": str(e)},
-             is_streaming=request_data.get("stream", False),
-             log_type="completion"
-         )
+         if logger:
+             logger.log_final_response(status_code=500, headers=None, body={"error": str(e)})
          raise HTTPException(status_code=500, detail=str(e))

  @app.post("/v1/embeddings")
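
The chunk-parsing behavior in `streaming_response_wrapper` above, collecting JSON chunks while silently skipping non-JSON SSE payloads such as the `[DONE]` sentinel, can be simulated in isolation (the `collect_chunks` helper is illustrative, not proxy code):

```python
# Minimal simulation of the wrapper's aggregation step: strip the SSE "data: "
# framing, keep parseable JSON chunks, and ignore everything else.
import json

def collect_chunks(sse_lines):
    chunks = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        content = line[len("data: "):].strip()
        try:
            chunks.append(json.loads(content))
        except json.JSONDecodeError:
            pass  # e.g. the terminal "[DONE]" sentinel
    return chunks
```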
src/proxy_app/request_logger.py CHANGED
@@ -38,39 +38,3 @@ def log_request_to_console(url: str, headers: dict, client_info: tuple, request_
  log_message = f"{time_str} - {client_info[0]}:{client_info[1]} - provider: {provider}, model: {model_name} - {endpoint_url}"
  logging.info(log_message)

- def log_request_response(
-     request_data: dict,
-     response_data: dict,
-     is_streaming: bool,
-     log_type: Literal["completion", "embedding"]
- ):
-     """
-     Logs the request and response data to a file in the appropriate log directory.
-     """
-     try:
-         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-         unique_id = uuid.uuid4()
-
-         if log_type == "completion":
-             target_dir = COMPLETIONS_LOGS_DIR
-         elif log_type == "embedding":
-             target_dir = EMBEDDINGS_LOGS_DIR
-         else:
-             # Fallback to the main logs directory if log_type is invalid
-             target_dir = LOGS_DIR
-
-         filename = target_dir / f"{timestamp}_{unique_id}.json"
-
-         log_content = {
-             "request": request_data,
-             "response": response_data,
-             "is_streaming": is_streaming
-         }
-
-         with open(filename, "w", encoding="utf-8") as f:
-             json.dump(log_content, f, indent=4, ensure_ascii=False)
-
-     except Exception as e:
-         # In case of logging failure, we don't want to crash the main application
-         # Use the root logger to log the error to the file.
-         logging.error(f"Error logging request/response to file: {e}")