Lưu Quang Vũ committed on
Commit
7db4283
unverified
1 Parent(s): f8272eb

:sparkles: Enable real-time streaming responses and completely solve the issue with reusable sessions. (#95)

* Remove the unused auto-refresh functionality and related imports.

They are no longer needed since the underlying library issue has been resolved.

* Enhance error handling in client initialization and message sending

* Refactor link handling to extract file paths and simplify Google search links

* Fix regex pattern for Google search link matching

* Fix regex patterns for Markdown escaping, code fence and Google search link matching

* Increase timeout value in configuration files from 60 to 120 seconds to better handle heavy tasks

* Fix Image generation

* Refactor tool handling to support standard and image generation tools separately

* Fix: use "ascii" decoding for base64-encoded image data consistency

* Fix: replace `running` with `_running` for internal client status checks

* Refactor: replace direct `_running` access with `running()` method in client status checks

* Extend models with new fields for annotations, reasoning, audio, log probabilities, and token details; adjust response handling accordingly.

* Extend models with new fields (annotations, error), add `normalize_output_text` validator, rename `created` to `created_at`, and update response handling accordingly.

* Extend response models to support tool choices, image output, and improved streaming of response items. Refactor image generation handling for consistency and add compatibility with output content.

* Set default `text` value to an empty string for `ResponseOutputContent` and ensure consistent initialization in image output handling.

* feat: Add /images endpoint with dedicated router and improved image management

Add dedicated router for /images endpoint and refactor image handling logic for better modularity. Enhance temporary image management with secure naming, token verification, and cleanup functionality.

* feat: Add token-based verification for image access

* Refactor: rename image store directory to `ai_generated_images` for clarity

* fix: Update create_response to use FastAPI Request object for base_url and refactor variable handling

* fix: Correct attribute access in request_data handling within `chat.py` for tools, tool_choice, and streaming settings

* fix: Save generated images to persistent storage

* fix: Remove unused `output_image` type from `ResponseOutputContent` and update response handling for consistency

* fix: Update image URL generation in chat response to use Markdown format for compatibility

* fix: Enhance error handling for full-size image saving and add fallback to default size

* fix: Use filename as image ID to ensure consistency in generated image handling

* fix: Enhance tempfile saving by adding custom headers, content-type handling, and improved extension determination

* feat: Add support for custom Gemini models and model loading strategies

- Introduced `model_strategy` configuration for "append" (default + custom models) or "overwrite" (custom models only).
- Enhanced `/v1/models` endpoint to return models based on the configured strategy.
- Improved model loading with environment variable overrides and validation.
- Refactored model handling logic for improved modularity and error handling.
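
The two strategies above can be sketched roughly as follows. This is only an illustration: `resolve_models` and the dict-based model records are hypothetical stand-ins rather than the repository's code, and it is an assumption that, under "append", a custom model replaces a default with the same name.

```python
from typing import Literal


def resolve_models(
    defaults: list[dict],
    custom: list[dict],
    strategy: Literal["append", "overwrite"] = "append",
) -> list[dict]:
    """Combine default and custom models according to the configured strategy."""
    if strategy == "overwrite":
        # Custom models only; the defaults are ignored entirely.
        return list(custom)
    if strategy != "append":
        raise ValueError(f"Unknown model strategy: {strategy!r}")

    # "append": defaults first, then custom models.
    # Assumption: a custom entry replaces a default that shares its name.
    merged = {m["model_name"]: m for m in defaults}
    for m in custom:
        merged[m["model_name"]] = m
    return list(merged.values())
```

With "append", a client listing models would see both sets; with "overwrite", only the custom entries.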

* feat: Improve Gemini model environment variable parsing and nested field support

- Enhanced `extract_gemini_models_env` to handle nested fields within environment variables.
- Updated type hints for more flexibility in model overrides.
- Improved `_merge_models_with_env` to better support field-level updates and appending new models.

* refactor: Consolidate utility functions and clean up unused code

- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.

* fix: Handle None input in `estimate_tokens` and return 0 for empty text

* refactor: Simplify model configuration and add JSON parsing validators

- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.

* refactor: Simplify Gemini model environment variable parsing with JSON support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.

* fix: Enhance Gemini model environment variable parsing with fallback to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
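
A minimal sketch of the JSON-first, Python-literal-fallback parsing described above. The helper name `parse_env_value` is hypothetical, and the real implementation may log or handle errors differently.

```python
import ast
import json
from typing import Any


def parse_env_value(raw: str) -> Any:
    """Parse an environment variable value as JSON, falling back to a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # Not valid JSON, e.g. single-quoted strings; try a Python literal next.

    try:
        # literal_eval only accepts literals (lists, dicts, strings, numbers, ...),
        # so it does not execute arbitrary code.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Cannot parse value as JSON or Python literal: {raw!r}") from exc
```

The fallback is useful when a shell or deployment tool rewrites double quotes to single quotes, which makes the value invalid JSON but still a valid Python literal.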

* fix: Improve regex patterns in helper module

- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.

* docs: Update README files to include custom model configuration and environment variable setup

* fix: Remove unused headers from HTTP client in helper module

* fix: Update README and README.zh to clarify model configuration via environment variables; enhance error logging in config validation

* Update README and README.zh to clarify model configuration via JSON string or list structure for enhanced flexibility in automated environments

* Refactor: compress JSON content to save tokens and streamline sending multiple chunks

* Refactor: Modify the LMDB store to fix issues where no conversation is found in either the raw or cleaned history.

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Update all functions to use orjson for better performance

* Update project dependencies

* Fix IDE warnings

* Incorrect IDE warnings

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Centralized the mapping of the 'developer' role to 'system' for better Gemini compatibility.
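
As a rough illustration of the role mapping, assuming a single hypothetical helper (`normalize_role`) that every message passes through:

```python
def normalize_role(role: str) -> str:
    """Map the OpenAI-style 'developer' role to 'system'; leave other roles unchanged."""
    return "system" if role == "developer" else role
```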

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Avoid reusing an existing chat session if its idle time exceeds METADATA_TTL_MINUTES.
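
The idle-time check can be sketched like this. `METADATA_TTL_MINUTES = 15` matches the constant in `app/server/chat.py`, but the helper `is_session_reusable` and its use of an `updated_at` timestamp are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

METADATA_TTL_MINUTES = 15


def is_session_reusable(updated_at: datetime | None, now: datetime | None = None) -> bool:
    """Return True only if the session was last used within the TTL window."""
    if updated_at is None:
        # No timestamp recorded: treat the session as stale.
        return False
    now = now or datetime.now(tz=timezone.utc)
    return now - updated_at <= timedelta(minutes=METADATA_TTL_MINUTES)
```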

* Refactor: Update the LMDB store to resolve issues preventing conversation from being saved

* Refactor: Update the _prepare_messages_for_model helper to omit the system instruction when reusing a session to save tokens.

* Refactor: Modify the logic to convert a large prompt into a temporary text file attachment

- When multiple chunks are sent simultaneously, Google will immediately invalidate the access token and reject the request
- When a prompt contains a structured format like JSON, splitting it can break the format and may cause the model to misunderstand the context
- Another minor tweak as Copilot suggested
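
A sketch of the idea, under stated assumptions: `pack_prompt`, the file name, and the placeholder message are all hypothetical, and an in-memory `io.BytesIO` buffer is used here (a later item in this commit message mentions switching to BytesIO) instead of a file on disk.

```python
from __future__ import annotations

import io


def pack_prompt(prompt: str, max_chars: int) -> tuple[str, io.BytesIO | None]:
    """Send short prompts as-is; move oversized prompts into a single text attachment."""
    if len(prompt) <= max_chars:
        return prompt, None

    # Keep the prompt in one piece so structured content such as JSON is not split.
    buffer = io.BytesIO(prompt.encode("utf-8"))
    buffer.name = "message.txt"  # hypothetical file name for the attachment
    return "Please read the attached file for the full prompt.", buffer
```

Because only one request is sent, this avoids both problems listed above: no simultaneous chunks, and no broken structure.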

* Enable streaming responses and fully resolve the problem with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.

* Enable real-time streaming responses and completely solve the issue with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.
- Introducing a new feature for real-time streaming responses.
- Fully resolve the problem with reusable sessions.
- Break down similar flow logic into helper functions.
- All endpoints now support inline Markdown images.
- Switch large prompts to use BytesIO to avoid reading and writing to disk.

* Enable real-time streaming responses and completely solve the issue with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.
- Introducing a new feature for real-time streaming responses.
- Fully resolve the problem with reusable sessions.
- Break down similar flow logic into helper functions.
- All endpoints now support inline Markdown images.
- Switch large prompts to use BytesIO to avoid reading and writing to disk.
- Remove duplicate images when saving and responding.

* build: update

app/main.py CHANGED
@@ -15,7 +15,7 @@ from .server.middleware import (
 )
 from .services import GeminiClientPool, LMDBConversationStore
 
-RETENTION_CLEANUP_INTERVAL_SECONDS = 6 * 60 * 60 # 6 hours
+RETENTION_CLEANUP_INTERVAL_SECONDS = 6 * 60 * 60 # Check every 6 hours
 
 
 async def _run_retention_cleanup(stop_event: asyncio.Event) -> None:
app/models/models.py CHANGED
@@ -7,7 +7,7 @@ from pydantic import BaseModel, Field, model_validator
 
 
 class ContentItem(BaseModel):
-    """Content item model"""
+    """Individual content item (text, image, or file) within a message."""
 
     type: Literal["text", "image_url", "file", "input_audio"]
     text: Optional[str] = None
@@ -159,7 +159,7 @@ class ConversationInStore(BaseModel):
     created_at: Optional[datetime] = Field(default=None)
     updated_at: Optional[datetime] = Field(default=None)
 
-    # NOTE: Gemini Web API do not support changing models once a conversation is created.
+    # Gemini Web API does not support changing models once a conversation is created.
     model: str = Field(..., description="Model used for the conversation")
     client_id: str = Field(..., description="Identifier of the Gemini client")
     metadata: list[str | None] = Field(
app/server/chat.py CHANGED
@@ -1,18 +1,19 @@
 import base64
-import re
-import tempfile
 import uuid
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Any
 
 import orjson
 from fastapi import APIRouter, Depends, HTTPException, Request, status
 from fastapi.responses import StreamingResponse
 from gemini_webapi.client import ChatSession
 from gemini_webapi.constants import Model
-from gemini_webapi.exceptions import APIError
 from gemini_webapi.types.image import GeneratedImage, Image
 from loguru import logger
 
@@ -42,21 +43,18 @@ from ..utils import g_config
 from ..utils.helper import (
     CODE_BLOCK_HINT,
     CODE_HINT_STRIPPED,
     XML_HINT_STRIPPED,
     XML_WRAP_HINT,
     estimate_tokens,
     extract_image_dimensions,
     extract_tool_calls,
-    iter_stream_segments,
-    remove_tool_call_blocks,
     strip_code_fence,
     text_from_message,
 )
 from .middleware import get_image_store_dir, get_image_token, get_temp_dir, verify_api_key
 
-# Maximum characters Gemini Web can accept in a single request (configurable)
 MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
-CONTINUATION_HINT = "\n(More messages to come, please reply with just 'ok.')"
 METADATA_TTL_MINUTES = 15
 
 router = APIRouter()
@@ -72,6 +70,212 @@ class StructuredOutputRequirement:
     raw_format: dict[str, Any]
 
 
 def _build_structured_requirement(
     response_format: dict[str, Any] | None,
 ) -> StructuredOutputRequirement | None:
@@ -80,17 +284,23 @@ def _build_structured_requirement(
80
  return None
81
 
82
  if response_format.get("type") != "json_schema":
83
- logger.warning(f"Unsupported response_format type requested: {response_format}")
 
 
84
  return None
85
 
86
  json_schema = response_format.get("json_schema")
87
  if not isinstance(json_schema, dict):
88
- logger.warning(f"Invalid json_schema payload in response_format: {response_format}")
 
 
89
  return None
90
 
91
  schema = json_schema.get("schema")
92
  if not isinstance(schema, dict):
93
- logger.warning(f"Missing `schema` object in response_format payload: {response_format}")
 
 
94
  return None
95
 
96
  schema_name = json_schema.get("name") or "response"
@@ -136,7 +346,9 @@ def _build_tool_prompt(
136
  description = function.description or "No description provided."
137
  lines.append(f"Tool `{function.name}`: {description}")
138
  if function.parameters:
139
- schema_text = orjson.dumps(function.parameters).decode("utf-8")
 
 
140
  lines.append("Arguments JSON schema:")
141
  lines.append(schema_text)
142
  else:
@@ -155,7 +367,6 @@ def _build_tool_prompt(
155
  lines.append(
156
  f"You are required to call the tool named `{target}`. Do not call any other tool."
157
  )
158
- # `auto` or None fall back to default instructions.
159
 
160
  lines.append(
161
  "When you decide to call a tool you MUST respond with nothing except a single fenced block exactly like the template below."
@@ -221,7 +432,7 @@ def _append_xml_hint_to_last_user_message(messages: list[Message]) -> None:
221
 
222
  if isinstance(msg.content, str):
223
  if XML_HINT_STRIPPED not in msg.content:
224
- msg.content = f"{msg.content}{XML_WRAP_HINT}"
225
  return
226
 
227
  if isinstance(msg.content, list):
@@ -231,15 +442,13 @@ def _append_xml_hint_to_last_user_message(messages: list[Message]) -> None:
231
  text_value = part.text or ""
232
  if XML_HINT_STRIPPED in text_value:
233
  return
234
- part.text = f"{text_value}{XML_WRAP_HINT}"
235
  return
236
 
237
  messages_text = XML_WRAP_HINT.strip()
238
  msg.content.append(ContentItem(type="text", text=messages_text))
239
  return
240
 
241
- # No user message to annotate; nothing to do.
242
-
243
 
244
  def _conversation_has_code_hint(messages: list[Message]) -> bool:
245
  """Return True if any system message already includes the code block hint."""
@@ -272,6 +481,17 @@ def _prepare_messages_for_model(
272
  """Return a copy of messages enriched with tool instructions when needed."""
273
  prepared = [msg.model_copy(deep=True) for msg in source_messages]
274
 
 
 
 
 
 
 
 
 
 
 
 
275
  instructions: list[str] = []
276
  if inject_system_defaults:
277
  if tools:
@@ -290,7 +510,6 @@ def _prepare_messages_for_model(
290
  logger.debug("Injected default code block hint for Gemini conversation.")
291
 
292
  if not instructions:
293
- # Still need to ensure XML hint for the last user message if tools are present
294
  if tools and tool_choice != "none":
295
  _append_xml_hint_to_last_user_message(prepared)
296
  return prepared
@@ -323,7 +542,6 @@ def _response_items_to_messages(
323
  normalized_input: list[ResponseInputItem] = []
324
  for item in items:
325
  role = item.role
326
-
327
  content = item.content
328
  normalized_contents: list[ResponseInputContent] = []
329
  if isinstance(content, str):
@@ -394,7 +612,6 @@ def _instructions_to_messages(
394
  continue
395
 
396
  role = item.role
397
-
398
  content = item.content
399
  if isinstance(content, str):
400
  instruction_messages.append(Message(role=role, content=content))
@@ -432,10 +649,7 @@ def _instructions_to_messages(
432
 
433
 
434
  def _get_model_by_name(name: str) -> Model:
435
- """
436
- Retrieve a Model instance by name, considering custom models from config
437
- and the update strategy (append or overwrite).
438
- """
439
  strategy = g_config.gemini.model_strategy
440
  custom_models = {m.model_name: m for m in g_config.gemini.models if m.model_name}
441
 
@@ -449,9 +663,7 @@ def _get_model_by_name(name: str) -> Model:
449
 
450
 
451
  def _get_available_models() -> list[ModelData]:
452
- """
453
- Return a list of available models based on configuration strategy.
454
- """
455
  now = int(datetime.now(tz=timezone.utc).timestamp())
456
  strategy = g_config.gemini.model_strategy
457
  models_data = []
@@ -486,910 +698,934 @@ def _get_available_models() -> list[ModelData]:
486
  return models_data
487
 
488
 
489
- @router.get("/v1/models", response_model=ModelListResponse)
490
- async def list_models(api_key: str = Depends(verify_api_key)):
491
- models = _get_available_models()
492
- return ModelListResponse(data=models)
493
-
494
-
495
- @router.post("/v1/chat/completions")
496
- async def create_chat_completion(
497
- request: ChatCompletionRequest,
498
- api_key: str = Depends(verify_api_key),
499
- tmp_dir: Path = Depends(get_temp_dir),
500
- image_store: Path = Depends(get_image_store_dir),
501
- ):
502
- pool = GeminiClientPool()
503
- db = LMDBConversationStore()
504
-
505
- try:
506
- model = _get_model_by_name(request.model)
507
- except ValueError as exc:
508
- raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
509
-
510
- if len(request.messages) == 0:
511
- raise HTTPException(
512
- status_code=status.HTTP_400_BAD_REQUEST,
513
- detail="At least one message is required in the conversation.",
514
- )
515
 
516
- structured_requirement = _build_structured_requirement(request.response_format)
517
- if structured_requirement and request.stream:
518
- logger.debug(
519
- "Structured response requested with streaming enabled; will stream canonical JSON once ready."
520
- )
521
- if structured_requirement:
522
- logger.debug(
523
- f"Structured response requested for /v1/chat/completions (schema={structured_requirement.schema_name})."
524
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
525
 
526
- extra_instructions = [structured_requirement.instruction] if structured_requirement else None
 
527
 
528
- # Check if conversation is reusable
529
- session, client, remaining_messages = await _find_reusable_session(
530
- db, pool, model, request.messages
531
- )
532
 
533
- if session:
534
- # Optimization: When reusing a session, we don't need to resend the heavy tool definitions
535
- # or structured output instructions as they are already in the Gemini session history.
536
- messages_to_send = _prepare_messages_for_model(
537
- remaining_messages,
538
- request.tools,
539
- request.tool_choice,
540
- extra_instructions,
541
- inject_system_defaults=False,
542
- )
543
- if not messages_to_send:
544
- raise HTTPException(
545
- status_code=status.HTTP_400_BAD_REQUEST,
546
- detail="No new messages to send for the existing session.",
547
- )
548
- if len(messages_to_send) == 1:
549
- model_input, files = await GeminiClientWrapper.process_message(
550
- messages_to_send[0], tmp_dir, tagged=False
551
- )
552
- else:
553
- model_input, files = await GeminiClientWrapper.process_conversation(
554
- messages_to_send, tmp_dir
555
- )
556
- logger.debug(
557
- f"Reused session {session.metadata} - sending {len(messages_to_send)} prepared messages."
558
- )
559
- else:
560
- # Start a new session and concat messages into a single string
561
  try:
562
- client = await pool.acquire()
563
- session = client.start_chat(model=model)
564
- messages_to_send = _prepare_messages_for_model(
565
- request.messages, request.tools, request.tool_choice, extra_instructions
566
- )
567
- model_input, files = await GeminiClientWrapper.process_conversation(
568
- messages_to_send, tmp_dir
569
- )
570
- except ValueError as e:
571
- raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
572
- except RuntimeError as e:
573
- raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e))
574
  except Exception as e:
575
- logger.exception(f"Error in preparing conversation: {e}")
576
  raise
577
- logger.debug("New session started.")
578
 
579
- # Generate response
 
 
 
 
580
  try:
581
- assert session and client, "Session and client not available"
582
- client_id = client.id
583
- logger.debug(
584
- f"Client ID: {client_id}, Input length: {len(model_input)}, files count: {len(files)}"
 
 
 
 
585
  )
586
- response = await _send_with_split(session, model_input, files=files)
587
- except APIError as exc:
588
- client_id = client.id if client else "unknown"
589
- logger.warning(f"Gemini API returned invalid response for client {client_id}: {exc}")
590
- raise HTTPException(
591
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
592
- detail="Gemini temporarily returned an invalid response. Please retry.",
593
- ) from exc
594
- except HTTPException:
595
- raise
596
  except Exception as e:
597
- logger.exception(f"Unexpected error generating content from Gemini API: {e}")
598
- raise HTTPException(
599
- status_code=status.HTTP_502_BAD_GATEWAY,
600
- detail="Gemini returned an unexpected error.",
601
- ) from e
602
 
603
- # Format the response from API
604
- try:
605
- raw_output_with_think = GeminiClientWrapper.extract_output(response, include_thoughts=True)
606
- raw_output_clean = GeminiClientWrapper.extract_output(response, include_thoughts=False)
607
- except IndexError as exc:
608
- logger.exception("Gemini output parsing failed (IndexError).")
609
- raise HTTPException(
610
- status_code=status.HTTP_502_BAD_GATEWAY,
611
- detail="Gemini returned malformed response content.",
612
- ) from exc
613
- except Exception as exc:
614
- logger.exception("Gemini output parsing failed unexpectedly.")
615
- raise HTTPException(
616
- status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
617
- detail="Gemini output parsing failed unexpectedly.",
618
- ) from exc
619
 
620
- visible_output, tool_calls = extract_tool_calls(raw_output_with_think)
621
- storage_output = remove_tool_call_blocks(raw_output_clean).strip()
622
- tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
 
 
 
 
 
 
623
 
624
- if structured_requirement:
625
- cleaned_visible = strip_code_fence(visible_output or "")
626
- if not cleaned_visible:
627
- raise HTTPException(
628
- status_code=status.HTTP_502_BAD_GATEWAY,
629
- detail="LLM returned an empty response while JSON schema output was requested.",
630
- )
631
- try:
632
- structured_payload = orjson.loads(cleaned_visible)
633
- except orjson.JSONDecodeError as exc:
634
- logger.warning(
635
- f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
636
- f"{cleaned_visible}"
637
- )
638
- raise HTTPException(
639
- status_code=status.HTTP_502_BAD_GATEWAY,
640
- detail="LLM returned invalid JSON for the requested response_format.",
641
- ) from exc
642
 
643
- canonical_output = orjson.dumps(structured_payload).decode("utf-8")
644
- visible_output = canonical_output
645
- storage_output = canonical_output
646
 
647
- if tool_calls_payload:
648
- logger.debug(f"Detected tool calls: {tool_calls_payload}")
 
 
 
649
 
650
- # After formatting, persist the conversation to LMDB
651
- try:
652
- current_assistant_message = Message(
653
- role="assistant",
654
- content=storage_output or None,
655
- tool_calls=tool_calls or None,
656
- )
657
- # Sanitize the entire history including the new message to ensure consistency
658
- full_history = [*request.messages, current_assistant_message]
659
- cleaned_history = db.sanitize_assistant_messages(full_history)
660
 
661
- conv = ConversationInStore(
662
- model=model.model_name,
663
- client_id=client.id,
664
- metadata=session.metadata,
665
- messages=cleaned_history,
666
- )
667
- key = db.store(conv)
668
- logger.debug(f"Conversation saved to LMDB with key: {key}")
669
- except Exception as e:
670
- # We can still return the response even if saving fails
671
- logger.warning(f"Failed to save conversation to LMDB: {e}")
672
 
673
- # Return with streaming or standard response
674
- completion_id = f"chatcmpl-{uuid.uuid4()}"
675
- timestamp = int(datetime.now(tz=timezone.utc).timestamp())
676
- if request.stream:
677
- return _create_streaming_response(
678
- visible_output,
679
- tool_calls_payload,
680
- completion_id,
681
- timestamp,
682
- request.model,
683
- request.messages,
684
- )
685
- else:
686
- return _create_standard_response(
687
- visible_output,
688
- tool_calls_payload,
689
- completion_id,
690
- timestamp,
691
- request.model,
692
- request.messages,
693
- )
694
-
695
-
696
- @router.post("/v1/responses")
697
- async def create_response(
698
- request_data: ResponseCreateRequest,
699
- request: Request,
700
- api_key: str = Depends(verify_api_key),
701
- tmp_dir: Path = Depends(get_temp_dir),
702
- image_store: Path = Depends(get_image_store_dir),
703
- ):
704
- base_messages, normalized_input = _response_items_to_messages(request_data.input)
705
- structured_requirement = _build_structured_requirement(request_data.response_format)
706
- if structured_requirement and request_data.stream:
707
- logger.debug(
708
- "Structured response requested with streaming enabled; streaming not supported for Responses."
709
- )
710
-
711
- extra_instructions: list[str] = []
712
- if structured_requirement:
713
- extra_instructions.append(structured_requirement.instruction)
714
- logger.debug(
715
- f"Structured response requested for /v1/responses (schema={structured_requirement.schema_name})."
716
- )
717
-
718
- # Separate standard tools from image generation tools
719
- standard_tools: list[Tool] = []
720
- image_tools: list[ResponseImageTool] = []
721
-
722
- if request_data.tools:
723
- for t in request_data.tools:
724
- if isinstance(t, Tool):
725
- standard_tools.append(t)
726
- elif isinstance(t, ResponseImageTool):
727
- image_tools.append(t)
728
- # Handle dicts if Pydantic didn't convert them fully (fallback)
729
- elif isinstance(t, dict):
730
- t_type = t.get("type")
731
- if t_type == "function":
732
- standard_tools.append(Tool.model_validate(t))
733
- elif t_type == "image_generation":
734
- image_tools.append(ResponseImageTool.model_validate(t))
735
-
736
- image_instruction = _build_image_generation_instruction(
737
- image_tools,
738
- request_data.tool_choice
739
- if isinstance(request_data.tool_choice, ResponseToolChoice)
740
- else None,
741
- )
742
- if image_instruction:
743
- extra_instructions.append(image_instruction)
744
- logger.debug("Image generation support enabled for /v1/responses request.")
745
-
746
- preface_messages = _instructions_to_messages(request_data.instructions)
747
- conversation_messages = base_messages
748
- if preface_messages:
749
- conversation_messages = [*preface_messages, *base_messages]
750
- logger.debug(
751
- f"Injected {len(preface_messages)} instruction messages before sending to Gemini."
752
- )
753
-
754
- # Pass standard tools to the prompt builder
755
- # Determine tool_choice for standard tools (ignore image_generation choice here as it is handled via instruction)
756
- model_tool_choice = None
757
- if isinstance(request_data.tool_choice, str):
758
- model_tool_choice = request_data.tool_choice
759
- elif isinstance(request_data.tool_choice, ToolChoiceFunction):
760
- model_tool_choice = request_data.tool_choice
761
- # If tool_choice is ResponseToolChoice (image_generation), we don't pass it as a function tool choice.
762
-
763
- messages = _prepare_messages_for_model(
764
- conversation_messages,
765
- tools=standard_tools or None,
766
- tool_choice=model_tool_choice,
767
- extra_instructions=extra_instructions or None,
768
- )
769
-
770
- pool = GeminiClientPool()
771
- db = LMDBConversationStore()
772
-
773
- try:
774
- model = _get_model_by_name(request_data.model)
775
- except ValueError as exc:
776
- raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
777
-
778
- session, client, remaining_messages = await _find_reusable_session(db, pool, model, messages)
779
-
780
- async def _build_payload(
781
- _payload_messages: list[Message], _reuse_session: bool
782
- ) -> tuple[str, list[Path | str]]:
783
- if _reuse_session and len(_payload_messages) == 1:
784
- return await GeminiClientWrapper.process_message(
785
- _payload_messages[0], tmp_dir, tagged=False
786
- )
787
- return await GeminiClientWrapper.process_conversation(_payload_messages, tmp_dir)
788
-
789
- reuse_session = session is not None
790
- if reuse_session:
791
- messages_to_send = _prepare_messages_for_model(
792
- remaining_messages,
793
- tools=request_data.tools, # Keep for XML hint logic
794
- tool_choice=request_data.tool_choice,
795
- extra_instructions=None, # Already in session history
796
- inject_system_defaults=False,
797
- )
798
- if not messages_to_send:
799
- raise HTTPException(
800
- status_code=status.HTTP_400_BAD_REQUEST,
801
- detail="No new messages to send for the existing session.",
802
- )
803
- payload_messages = messages_to_send
804
- model_input, files = await _build_payload(payload_messages, _reuse_session=True)
805
- logger.debug(
806
- f"Reused session {session.metadata} - sending {len(payload_messages)} prepared messages."
807
- )
808
- else:
809
- try:
810
- client = await pool.acquire()
811
- session = client.start_chat(model=model)
812
- payload_messages = messages
813
- model_input, files = await _build_payload(payload_messages, _reuse_session=False)
814
- except ValueError as e:
815
- raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
816
- except RuntimeError as e:
817
- raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e))
818
- except Exception as e:
819
- logger.exception(f"Error in preparing conversation for responses API: {e}")
820
- raise
821
- logger.debug("New session started for /v1/responses request.")
822
-
823
- try:
824
- assert session and client, "Session and client not available"
825
- client_id = client.id
826
- logger.debug(
827
- f"Client ID: {client_id}, Input length: {len(model_input)}, files count: {len(files)}"
828
- )
829
- model_output = await _send_with_split(session, model_input, files=files)
830
- except APIError as exc:
831
- client_id = client.id if client else "unknown"
832
- logger.warning(f"Gemini API returned invalid response for client {client_id}: {exc}")
833
- raise HTTPException(
834
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
835
- detail="Gemini temporarily returned an invalid response. Please retry.",
836
- ) from exc
837
- except HTTPException:
838
- raise
839
- except Exception as e:
840
- logger.exception(f"Unexpected error generating content from Gemini API for responses: {e}")
841
- raise HTTPException(
842
- status_code=status.HTTP_502_BAD_GATEWAY,
843
- detail="Gemini returned an unexpected error.",
844
- ) from e
845
-
846
- try:
847
- text_with_think = GeminiClientWrapper.extract_output(model_output, include_thoughts=True)
848
- text_without_think = GeminiClientWrapper.extract_output(
849
- model_output, include_thoughts=False
850
- )
851
- except IndexError as exc:
852
- logger.exception("Gemini output parsing failed (IndexError).")
853
- raise HTTPException(
854
- status_code=status.HTTP_502_BAD_GATEWAY,
855
- detail="Gemini returned malformed response content.",
856
- ) from exc
857
- except Exception as exc:
858
- logger.exception("Gemini output parsing failed unexpectedly.")
859
- raise HTTPException(
860
- status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
861
- detail="Gemini output parsing failed unexpectedly.",
862
- ) from exc
863
-
864
- visible_text, detected_tool_calls = extract_tool_calls(text_with_think)
865
- storage_output = remove_tool_call_blocks(text_without_think).strip()
866
- assistant_text = LMDBConversationStore.remove_think_tags(visible_text.strip())
867
-
868
- if structured_requirement:
869
- cleaned_visible = strip_code_fence(assistant_text or "")
870
- if not cleaned_visible:
871
- raise HTTPException(
872
- status_code=status.HTTP_502_BAD_GATEWAY,
873
- detail="LLM returned an empty response while JSON schema output was requested.",
874
- )
875
- try:
876
- structured_payload = orjson.loads(cleaned_visible)
877
- except orjson.JSONDecodeError as exc:
878
- logger.warning(
879
- f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
880
- f"{cleaned_visible}"
881
- )
882
- raise HTTPException(
883
- status_code=status.HTTP_502_BAD_GATEWAY,
884
- detail="LLM returned invalid JSON for the requested response_format.",
885
- ) from exc
886
-
887
- canonical_output = orjson.dumps(structured_payload).decode("utf-8")
888
- assistant_text = canonical_output
889
- storage_output = canonical_output
890
- logger.debug(
891
- f"Structured response fulfilled for /v1/responses (schema={structured_requirement.schema_name})."
892
- )
893
-
894
- expects_image = (
895
- request_data.tool_choice is not None and request_data.tool_choice.type == "image_generation"
896
- )
897
- images = model_output.images or []
898
- logger.debug(
899
- f"Gemini returned {len(images)} image(s) for /v1/responses "
900
- f"(expects_image={expects_image}, instruction_applied={bool(image_instruction)})."
901
- )
902
- if expects_image and not images:
903
- summary = assistant_text.strip() if assistant_text else ""
904
- if summary:
905
- summary = re.sub(r"\s+", " ", summary)
906
- if len(summary) > 200:
907
- summary = f"{summary[:197]}..."
908
- logger.warning(
909
- "Image generation requested but Gemini produced no images. "
910
- f"client_id={client_id}, forced_tool_choice={request_data.tool_choice is not None}, "
911
- f"instruction_applied={bool(image_instruction)}, assistant_preview='{summary}'"
912
- )
913
- detail = "LLM returned no images for the requested image_generation tool."
914
- if summary:
915
- detail = f"{detail} Assistant response: {summary}"
916
- raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=detail)
917
-
918
- response_contents: list[ResponseOutputContent] = []
919
- image_call_items: list[ResponseImageGenerationCall] = []
920
- for image in images:
921
- try:
922
- image_base64, width, height, filename = await _image_to_base64(image, image_store)
923
- except Exception as exc:
924
- logger.warning(f"Failed to download generated image: {exc}")
925
- continue
926
-
927
- img_format = "png" if isinstance(image, GeneratedImage) else "jpeg"
928
-
929
- # Use static URL for compatibility
930
- image_url = (
931
- f"![{filename}]({request.base_url}images/{filename}?token={get_image_token(filename)})"
932
- )
933
-
934
- image_call_items.append(
935
- ResponseImageGenerationCall(
936
- id=filename.rsplit(".", 1)[0],
937
- status="completed",
938
- result=image_base64,
939
- output_format=img_format,
940
- size=f"{width}x{height}" if width and height else None,
941
- )
942
- )
943
- # Add as output_text content for compatibility
944
- response_contents.append(
945
- ResponseOutputContent(type="output_text", text=image_url, annotations=[])
946
- )
947
-
948
- tool_call_items: list[ResponseToolCall] = []
949
- if detected_tool_calls:
950
- tool_call_items = [
951
- ResponseToolCall(
952
- id=call.id,
953
- status="completed",
954
- function=call.function,
955
- )
956
- for call in detected_tool_calls
957
- ]
958
-
959
- if assistant_text:
960
- response_contents.append(
961
- ResponseOutputContent(type="output_text", text=assistant_text, annotations=[])
962
- )
963
-
964
- if not response_contents:
965
- response_contents.append(ResponseOutputContent(type="output_text", text="", annotations=[]))
966
-
967
- created_time = int(datetime.now(tz=timezone.utc).timestamp())
968
- response_id = f"resp_{uuid.uuid4().hex}"
969
- message_id = f"msg_{uuid.uuid4().hex}"
970
-
971
- input_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
972
- tool_arg_text = "".join(call.function.arguments or "" for call in detected_tool_calls)
973
- completion_basis = assistant_text or ""
974
- if tool_arg_text:
975
- completion_basis = (
976
- f"{completion_basis}\n{tool_arg_text}" if completion_basis else tool_arg_text
977
- )
978
- output_tokens = estimate_tokens(completion_basis)
979
- usage = ResponseUsage(
980
- input_tokens=input_tokens,
981
- output_tokens=output_tokens,
982
- total_tokens=input_tokens + output_tokens,
983
- )
984
 
985
- response_payload = ResponseCreateResponse(
986
- id=response_id,
987
- created_at=created_time,
988
- model=request_data.model,
989
- output=[
990
- ResponseOutputMessage(
991
- id=message_id,
992
- type="message",
993
- role="assistant",
994
- content=response_contents,
995
- ),
996
- *tool_call_items,
997
- *image_call_items,
998
- ],
999
- status="completed",
1000
- usage=usage,
1001
- input=normalized_input or None,
1002
- metadata=request_data.metadata or None,
1003
- tools=request_data.tools,
1004
- tool_choice=request_data.tool_choice,
1005
- )
1006
 
1007
- try:
1008
- current_assistant_message = Message(
1009
- role="assistant",
1010
- content=storage_output or None,
1011
- tool_calls=detected_tool_calls or None,
1012
- )
1013
- full_history = [*messages, current_assistant_message]
1014
- cleaned_history = db.sanitize_assistant_messages(full_history)
1015
 
1016
- conv = ConversationInStore(
1017
- model=model.model_name,
1018
- client_id=client.id,
1019
- metadata=session.metadata,
1020
- messages=cleaned_history,
1021
- )
1022
- key = db.store(conv)
1023
- logger.debug(f"Conversation saved to LMDB with key: {key}")
1024
- except Exception as exc:
1025
- logger.warning(f"Failed to save Responses conversation to LMDB: {exc}")
1026
 
1027
- if request_data.stream:
1028
- logger.debug(
1029
- f"Streaming Responses API payload (response_id={response_payload.id}, text_chunks={bool(assistant_text)})."
1030
- )
1031
- return _create_responses_streaming_response(response_payload, assistant_text or "")
1032
 
1033
- return response_payload
1034
 
1035
 
1036
- async def _find_reusable_session(
1037
  db: LMDBConversationStore,
1038
- pool: GeminiClientPool,
1039
  model: Model,
1040
- messages: list[Message],
1041
- ) -> tuple[ChatSession | None, GeminiClientWrapper | None, list[Message]]:
1042
- """Find an existing chat session that matches the *longest* prefix of
1043
- ``messages`` **whose last element is an assistant/system reply**.
1044
-
1045
- Rationale
1046
- ---------
1047
- When a reply was generated by *another* server instance, the local LMDB may
1048
- only contain an older part of the conversation. However, as long as we can
1049
- line up **any** earlier assistant/system response, we can restore the
1050
- corresponding Gemini session and replay the *remaining* turns locally
1051
- (including that missing assistant reply and the subsequent user prompts).
1052
-
1053
- The algorithm therefore walks backwards through the history **one message at
1054
- a time**, each time requiring the current tail to be assistant/system before
1055
- querying LMDB. As soon as a match is found we recreate the session and
1056
- return the untouched suffix as ``remaining_messages``.
1057
- """
1058
-
1059
- if len(messages) < 2:
1060
- return None, None, messages
1061
-
1062
- # Start with the full history and iteratively trim from the end.
1063
- search_end = len(messages)
1064
-
1065
- while search_end >= 2:
1066
- search_history = messages[:search_end]
1067
-
1068
- # Only try to match if the last stored message would be assistant/system/tool before querying LMDB.
1069
- if search_history[-1].role in {"assistant", "system", "tool"}:
1070
- try:
1071
- if conv := db.find(model.model_name, search_history):
1072
- # Check if metadata is too old
1073
- now = datetime.now()
1074
- updated_at = conv.updated_at or conv.created_at or now
1075
- age_minutes = (now - updated_at).total_seconds() / 60
1076
-
1077
- if age_minutes <= METADATA_TTL_MINUTES:
1078
- client = await pool.acquire(conv.client_id)
1079
- session = client.start_chat(metadata=conv.metadata, model=model)
1080
- remain = messages[search_end:]
1081
- logger.debug(
1082
- f"Match found at prefix length {search_end}. Client: {conv.client_id}"
1083
- )
1084
- return session, client, remain
1085
- else:
1086
- logger.debug(
1087
- f"Matched conversation is too old ({age_minutes:.1f}m), skipping reuse."
1088
- )
1089
- except Exception as e:
1090
- logger.warning(
1091
- f"Error checking LMDB for reusable session at length {search_end}: {e}"
1092
- )
1093
- break
1094
-
1095
- # Trim one message and try again.
1096
- search_end -= 1
1097
-
1098
- return None, None, messages
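Note on the removed `_find_reusable_session`: the backward walk over the history can be sketched in isolation as below. This is a minimal illustration, not the project's code: `Msg`, `find_longest_stored_prefix`, and the plain-dict `store` are hypothetical stand-ins for the `Message` model and the LMDB lookup, and the TTL check and client pool are left out.

```python
from dataclasses import dataclass

# Roles a stored conversation may end with (mirrors the role check in the diff).
REPLY_ROLES = {"assistant", "system", "tool"}


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for the project's Message model."""

    role: str
    content: str


def find_longest_stored_prefix(store, messages):
    """Return (stored_value, remaining_messages) for the longest matching prefix.

    `store` is a plain dict keyed by a tuple of messages; it stands in for
    the LMDB lookup and is an assumption of this sketch.
    """
    search_end = len(messages)
    while search_end >= 2:
        prefix = tuple(messages[:search_end])
        # Only a prefix that ends in a reply can correspond to a saved session.
        if prefix[-1].role in REPLY_ROLES and prefix in store:
            return store[prefix], messages[search_end:]
        search_end -= 1  # Trim one message from the end and try again.
    return None, messages


history = [
    Msg("user", "hi"),
    Msg("assistant", "hello"),
    Msg("user", "what is 2 + 2?"),
]
store = {tuple(history[:2]): "session-metadata"}
session, remaining = find_longest_stored_prefix(store, history)
```

Here the two-message prefix matches, so only the final user turn is left to send to the restored session.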
1099
-
1100
-
1101
- async def _send_with_split(session: ChatSession, text: str, files: list[Path | str] | None = None):
1102
  """
1103
- Send text to Gemini. If text is longer than ``MAX_CHARS_PER_REQUEST``,
1104
- it is converted into a temporary text file attachment to avoid splitting issues.
1105
  """
1106
- if len(text) <= MAX_CHARS_PER_REQUEST:
1107
- try:
1108
- return await session.send_message(text, files=files)
1109
- except Exception as e:
1110
- logger.exception(f"Error sending message to Gemini: {e}")
1111
- raise
1112
-
1113
- logger.info(
1114
- f"Message length ({len(text)}) exceeds limit ({MAX_CHARS_PER_REQUEST}). Converting text to file attachment."
1115
- )
1116
-
1117
- # Create a temporary directory to hold the message.txt file
1118
- # This ensures the filename is exactly 'message.txt' as expected by the instruction.
1119
- with tempfile.TemporaryDirectory() as tmpdirname:
1120
- temp_file_path = Path(tmpdirname) / "message.txt"
1121
- temp_file_path.write_text(text, encoding="utf-8")
1122
 
1123
  try:
1124
- # Prepare the files list
1125
- final_files = list(files) if files else []
1126
- final_files.append(temp_file_path)
1127
-
1128
- instruction = (
1129
- "The user's input exceeds the character limit and is provided in the attached file `message.txt`.\n\n"
1130
- "**System Instruction:**\n"
1131
- "1. Read the content of `message.txt`.\n"
1132
- "2. Treat that content as the **primary** user prompt for this turn.\n"
1133
- "3. Execute the instructions or answer the questions found *inside* that file immediately.\n"
1134
- )
1135
-
1136
- logger.debug(f"Sending prompt as temporary file: {temp_file_path}")
1137
-
1138
- return await session.send_message(instruction, files=final_files)
1139
-
1140
  except Exception as e:
1141
- logger.exception(f"Error sending large text as file to Gemini: {e}")
1142
- raise
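Note on the removed `_send_with_split`: the fallback from inline text to a `message.txt` attachment can be sketched without the Gemini client. The names `build_request` and `MAX_CHARS` are hypothetical, the limit is tiny on purpose, and the instruction string is shortened; only the file name and the length check follow the diff.

```python
import tempfile
from pathlib import Path

MAX_CHARS = 20  # Hypothetical limit, kept tiny for the demonstration.


def build_request(text, workdir, limit=MAX_CHARS):
    """Return (prompt, files); oversized text moves into `message.txt`."""
    if len(text) <= limit:
        return text, []
    attachment = Path(workdir) / "message.txt"
    attachment.write_text(text, encoding="utf-8")
    prompt = "The user's input is provided in the attached file `message.txt`."
    return prompt, [attachment]


with tempfile.TemporaryDirectory() as tmp:
    short_prompt, short_files = build_request("hello", tmp)
    long_prompt, long_files = build_request("x" * 50, tmp)
    # The attachment must be read (or sent) before the directory is removed.
    attached_name = long_files[0].name
    attached_text = long_files[0].read_text(encoding="utf-8")
```

Writing into a dedicated temporary directory keeps the attachment name exactly `message.txt`, which is what the instruction text refers to.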
 
1143
 
1144
 
1145
- def _create_streaming_response(
1146
- model_output: str,
1147
- tool_calls: list[dict],
1148
- completion_id: str,
1149
- created_time: int,
1150
- model: str,
1151
- messages: list[Message],
1152
- ) -> StreamingResponse:
1153
- """Create streaming response with `usage` calculation included in the final chunk."""
1154
 
1155
- # Calculate token usage
1156
- prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
1157
- tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
1158
- completion_tokens = estimate_tokens(model_output + tool_args)
1159
- total_tokens = prompt_tokens + completion_tokens
1160
- finish_reason = "tool_calls" if tool_calls else "stop"
1161
 
1162
- async def generate_stream():
1163
- # Send start event
1164
- data = {
1165
- "id": completion_id,
1166
- "object": "chat.completion.chunk",
1167
- "created": created_time,
1168
- "model": model,
1169
- "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
1170
- }
1171
- yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1172
 
1173
- # Stream output text in chunks for efficiency
1174
- for chunk in iter_stream_segments(model_output):
1175
  data = {
1176
  "id": completion_id,
1177
  "object": "chat.completion.chunk",
1178
  "created": created_time,
1179
- "model": model,
1180
- "choices": [{"index": 0, "delta": {"content": chunk}, "finish_reason": None}],
 
 
1181
  }
1182
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1183
 
1184
- if tool_calls:
1185
- tool_calls_delta = [{**call, "index": idx} for idx, call in enumerate(tool_calls)]
 
 
 
1186
  data = {
1187
  "id": completion_id,
1188
  "object": "chat.completion.chunk",
1189
  "created": created_time,
1190
- "model": model,
1191
  "choices": [
1192
- {
1193
- "index": 0,
1194
- "delta": {"tool_calls": tool_calls_delta},
1195
- "finish_reason": None,
1196
- }
1197
  ],
1198
  }
1199
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1200
 
1201
- # Send end event
 
1202
  data = {
1203
  "id": completion_id,
1204
  "object": "chat.completion.chunk",
1205
  "created": created_time,
1206
- "model": model,
1207
- "choices": [{"index": 0, "delta": {}, "finish_reason": finish_reason}],
1208
- "usage": {
1209
- "prompt_tokens": prompt_tokens,
1210
- "completion_tokens": completion_tokens,
1211
- "total_tokens": total_tokens,
1212
- },
1213
  }
1214
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1215
  yield "data: [DONE]\n\n"
1216
 
1217
  return StreamingResponse(generate_stream(), media_type="text/event-stream")
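Note on the chat-completion stream: the framing used by the removed `_create_streaming_response` (role event, content events, closing event, `[DONE]`) can be reproduced with the standard library alone. `sse_chunks` is a hypothetical name, `json` stands in for `orjson`, and the `created` and `usage` fields are omitted for brevity.

```python
import json


def sse_chunks(completion_id, model, segments, finish_reason="stop"):
    """Yield SSE-framed chat.completion.chunk events for the given text segments."""

    def frame(delta, reason=None):
        payload = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": reason}],
        }
        return f"data: {json.dumps(payload)}\n\n"

    yield frame({"role": "assistant"})  # Opening event announces the role.
    for segment in segments:
        yield frame({"content": segment})  # One event per text segment.
    yield frame({}, finish_reason)  # Closing event carries finish_reason.
    yield "data: [DONE]\n\n"  # Stream terminator.


events = list(sse_chunks("chatcmpl-demo", "demo-model", ["Hel", "lo"]))
```

A generator like this could be handed to `StreamingResponse(..., media_type="text/event-stream")` in the same way the diff does.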
1218
 
1219
 
1220
- def _create_responses_streaming_response(
1221
- response_payload: ResponseCreateResponse,
1222
- assistant_text: str | None,
1223
  ) -> StreamingResponse:
1224
- """Create streaming response for Responses API using event types defined by OpenAI."""
1225
-
1226
- response_dict = response_payload.model_dump(mode="json")
1227
- response_id = response_payload.id
1228
- created_time = response_payload.created_at
1229
- model = response_payload.model
1230
-
1231
- logger.debug(
1232
- f"Preparing streaming envelope for /v1/responses (response_id={response_id}, model={model})."
1233
- )
1234
-
1235
  base_event = {
1236
  "id": response_id,
1237
  "object": "response",
1238
  "created_at": created_time,
1239
- "model": model,
1240
- }
1241
-
1242
- created_snapshot: dict[str, Any] = {
1243
- "id": response_id,
1244
- "object": "response",
1245
- "created_at": created_time,
1246
- "model": model,
1247
- "status": "in_progress",
1248
  }
1249
- if response_dict.get("metadata") is not None:
1250
- created_snapshot["metadata"] = response_dict["metadata"]
1251
- if response_dict.get("input") is not None:
1252
- created_snapshot["input"] = response_dict["input"]
1253
- if response_dict.get("tools") is not None:
1254
- created_snapshot["tools"] = response_dict["tools"]
1255
- if response_dict.get("tool_choice") is not None:
1256
- created_snapshot["tool_choice"] = response_dict["tool_choice"]
1257
 
1258
  async def generate_stream():
1259
- # Emit creation event
1260
- data = {
1261
- **base_event,
1262
- "type": "response.created",
1263
- "response": created_snapshot,
1264
- }
1265
- yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1266
 
1267
- # Stream output items (Message/Text, Tool Calls, Images)
1268
- for i, item in enumerate(response_payload.output):
1269
- item_json = item.model_dump(mode="json", exclude_none=True)
 
1270
 
1271
- added_event = {
1272
- **base_event,
1273
- "type": "response.output_item.added",
1274
- "output_index": i,
1275
- "item": item_json,
1276
- }
1277
- yield f"data: {orjson.dumps(added_event).decode('utf-8')}\n\n"
1278
-
1279
- # 2. Stream content if it's a message (text)
1280
- if item.type == "message":
1281
- content_text = ""
1282
- # Aggregate text content to stream
1283
- for c in item.content:
1284
- if c.type == "output_text" and c.text:
1285
- content_text += c.text
1286
-
1287
- if content_text:
1288
- for chunk in iter_stream_segments(content_text):
1289
- delta_event = {
1290
- **base_event,
1291
- "type": "response.output_text.delta",
1292
- "output_index": i,
1293
- "delta": chunk,
1294
- }
1295
- yield f"data: {orjson.dumps(delta_event).decode('utf-8')}\n\n"
1296
 
1297
- # Text done
1298
- done_event = {
1299
- **base_event,
1300
- "type": "response.output_text.done",
1301
- "output_index": i,
1302
- }
1303
- yield f"data: {orjson.dumps(done_event).decode('utf-8')}\n\n"
1304
-
1305
- # 3. Emit output_item.done for all types
1306
- # This confirms the item is fully transferred.
1307
- item_done_event = {
1308
- **base_event,
1309
- "type": "response.output_item.done",
1310
- "output_index": i,
1311
- "item": item_json,
1312
- }
1313
- yield f"data: {orjson.dumps(item_done_event).decode('utf-8')}\n\n"
 
1314
 
1315
- # Emit completed event with full payload
1316
- completed_event = {
1317
- **base_event,
1318
- "type": "response.completed",
1319
- "response": response_dict,
1320
- }
1321
- yield f"data: {orjson.dumps(completed_event).decode('utf-8')}\n\n"
1322
  yield "data: [DONE]\n\n"
1323
 
1324
  return StreamingResponse(generate_stream(), media_type="text/event-stream")
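Note on the Responses stream: the event order emitted by the removed `_create_responses_streaming_response` is easier to see with the payloads stripped down. The sketch below assumes a single message item and uses the hypothetical name `responses_events`; real events also carry the `item` and `response` snapshots shown in the diff.

```python
import json


def responses_events(response_id, text_segments):
    """Yield SSE frames in the order used by the removed helper."""

    def frame(event_type, **fields):
        payload = {"id": response_id, "type": event_type, **fields}
        return f"data: {json.dumps(payload)}\n\n"

    yield frame("response.created")
    yield frame("response.output_item.added", output_index=0)
    for segment in text_segments:
        yield frame("response.output_text.delta", output_index=0, delta=segment)
    yield frame("response.output_text.done", output_index=0)
    yield frame("response.output_item.done", output_index=0)
    yield frame("response.completed")
    yield "data: [DONE]\n\n"


# Collect only the event types, skipping the terminator line.
event_types = [
    json.loads(line[len("data: "):])["type"]
    for line in responses_events("resp_demo", ["Hi", "!"])
    if not line.startswith("data: [DONE]")
]
```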
1325
 
1326
 
1327
- def _create_standard_response(
1328
- model_output: str,
1329
- tool_calls: list[dict],
1330
- completion_id: str,
1331
- created_time: int,
1332
- model: str,
1333
- messages: list[Message],
1334
- ) -> dict:
1335
- """Create standard response"""
1336
- # Calculate token usage
1337
- prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
1338
- tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
1339
- completion_tokens = estimate_tokens(model_output + tool_args)
1340
- total_tokens = prompt_tokens + completion_tokens
1341
- finish_reason = "tool_calls" if tool_calls else "stop"
1342
 
1343
- message_payload: dict = {"role": "assistant", "content": model_output or None}
1344
- if tool_calls:
1345
- message_payload["tool_calls"] = tool_calls
1346
 
1347
- result = {
1348
- "id": completion_id,
1349
- "object": "chat.completion",
1350
- "created": created_time,
1351
- "model": model,
1352
- "choices": [
1353
- {
1354
- "index": 0,
1355
- "message": message_payload,
1356
- "finish_reason": finish_reason,
1357
- }
1358
- ],
1359
- "usage": {
1360
- "prompt_tokens": prompt_tokens,
1361
- "completion_tokens": completion_tokens,
1362
- "total_tokens": total_tokens,
1363
- },
1364
- }
1365
 
1366
- logger.debug(f"Response created with {total_tokens} total tokens")
1367
- return result
1368
 
1369
 
1370
- async def _image_to_base64(image: Image, temp_dir: Path) -> tuple[str, int | None, int | None, str]:
1371
- """Persist an image provided by gemini_webapi and return base64 plus dimensions and filename."""
1372
- if isinstance(image, GeneratedImage):
1373
  try:
1374
- saved_path = await image.save(path=str(temp_dir), full_size=True)
 
 
 
1375
  except Exception as e:
1376
- logger.warning(
1377
- f"Failed to download full-size GeneratedImage, retrying with default size: {e}"
1378
  )
1379
- saved_path = await image.save(path=str(temp_dir), full_size=False)
1380
  else:
1381
- saved_path = await image.save(path=str(temp_dir))
1382
 
1383
- if not saved_path:
1384
- raise ValueError("Failed to save generated image")
1385
 
1386
- # Rename file to a random UUID to ensure uniqueness and unpredictability
1387
- original_path = Path(saved_path)
1388
- random_name = f"img_{uuid.uuid4().hex}{original_path.suffix}"
1389
- new_path = temp_dir / random_name
1390
- original_path.rename(new_path)
1391
 
1392
- data = new_path.read_bytes()
1393
- width, height = extract_image_dimensions(data)
1394
- filename = random_name
1395
- return base64.b64encode(data).decode("ascii"), width, height, filename
 
1
  import base64
2
+ import hashlib
3
+ import io
4
+ import reprlib
5
  import uuid
6
  from dataclasses import dataclass
7
  from datetime import datetime, timezone
8
  from pathlib import Path
9
+ from typing import Any, AsyncGenerator
10
 
11
  import orjson
12
  from fastapi import APIRouter, Depends, HTTPException, Request, status
13
  from fastapi.responses import StreamingResponse
14
+ from gemini_webapi import ModelOutput
15
  from gemini_webapi.client import ChatSession
16
  from gemini_webapi.constants import Model
 
17
  from gemini_webapi.types.image import GeneratedImage, Image
18
  from loguru import logger
19
 
 
43
  from ..utils.helper import (
44
  CODE_BLOCK_HINT,
45
  CODE_HINT_STRIPPED,
46
+ CONTROL_TOKEN_RE,
47
  XML_HINT_STRIPPED,
48
  XML_WRAP_HINT,
49
  estimate_tokens,
50
  extract_image_dimensions,
51
  extract_tool_calls,
 
 
52
  strip_code_fence,
53
  text_from_message,
54
  )
55
  from .middleware import get_image_store_dir, get_image_token, get_temp_dir, verify_api_key
56
 
 
57
  MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
 
58
  METADATA_TTL_MINUTES = 15
59
 
60
  router = APIRouter()
 
70
  raw_format: dict[str, Any]
71
 
72
 
73
+ # --- Helper Functions ---
74
+
75
+
76
+ async def _image_to_base64(
77
+ image: Image, temp_dir: Path
78
+ ) -> tuple[str, int | None, int | None, str, str]:
79
+ """Persist an image provided by gemini_webapi and return base64 plus dimensions, filename, and hash."""
80
+ if isinstance(image, GeneratedImage):
81
+ try:
82
+ saved_path = await image.save(path=str(temp_dir), full_size=True)
83
+ except Exception as e:
84
+ logger.warning(
85
+ f"Failed to download full-size GeneratedImage, retrying with default size: {e}"
86
+ )
87
+ saved_path = await image.save(path=str(temp_dir), full_size=False)
88
+ else:
89
+ saved_path = await image.save(path=str(temp_dir))
90
+
91
+ if not saved_path:
92
+ raise ValueError("Failed to save generated image")
93
+
94
+ original_path = Path(saved_path)
95
+ random_name = f"img_{uuid.uuid4().hex}{original_path.suffix}"
96
+ new_path = temp_dir / random_name
97
+ original_path.rename(new_path)
98
+
99
+ data = new_path.read_bytes()
100
+ width, height = extract_image_dimensions(data)
101
+ filename = random_name
102
+ file_hash = hashlib.sha256(data).hexdigest()
103
+ return base64.b64encode(data).decode("ascii"), width, height, filename, file_hash
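Note on the new `_image_to_base64`: the rename, hash, and encode steps can be exercised on raw bytes without `gemini_webapi`. `persist_image` is a hypothetical helper that takes bytes directly instead of an `Image` object and skips the dimension extraction.

```python
import base64
import hashlib
import tempfile
import uuid
from pathlib import Path


def persist_image(data, suffix, store_dir):
    """Write bytes under a random name; return (base64, filename, sha256)."""
    filename = f"img_{uuid.uuid4().hex}{suffix}"  # Unpredictable file name.
    (Path(store_dir) / filename).write_bytes(data)
    encoded = base64.b64encode(data).decode("ascii")
    return encoded, filename, hashlib.sha256(data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    encoded, filename, digest = persist_image(b"fake-image-bytes", ".png", tmp)
    stored_ok = (Path(tmp) / filename).read_bytes() == b"fake-image-bytes"
```

Because the digest depends only on the bytes, two downloads of the same image produce the same hash even though their file names differ.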
104
+
105
+
106
+ def _calculate_usage(
107
+ messages: list[Message],
108
+ assistant_text: str | None,
109
+ tool_calls: list[Any] | None,
110
+ ) -> tuple[int, int, int]:
111
+ """Calculate prompt, completion and total tokens consistently."""
112
+ prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
113
+ tool_args_text = ""
114
+ if tool_calls:
115
+ for call in tool_calls:
116
+ if hasattr(call, "function"):
117
+ tool_args_text += call.function.arguments or ""
118
+ elif isinstance(call, dict):
119
+ tool_args_text += call.get("function", {}).get("arguments", "")
120
+
121
+ completion_basis = assistant_text or ""
122
+ if tool_args_text:
123
+ completion_basis = (
124
+ f"{completion_basis}\n{tool_args_text}" if completion_basis else tool_args_text
125
+ )
126
+
127
+ completion_tokens = estimate_tokens(completion_basis)
128
+ return prompt_tokens, completion_tokens, prompt_tokens + completion_tokens
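Note on `_calculate_usage`: the way tool-call arguments are folded into the completion count is shown below with a stand-in estimator. The real `estimate_tokens` lives in `..utils.helper` and is not part of this diff, so the four-characters-per-token rule here is purely an assumption, and the sketch handles dict-shaped tool calls only.

```python
def estimate_tokens(text):
    """Hypothetical estimator: roughly one token per four characters."""
    return len(text) // 4 if text else 0


def calculate_usage(prompt_texts, assistant_text, tool_calls):
    """Return (prompt, completion, total) token estimates."""
    prompt_tokens = sum(estimate_tokens(t) for t in prompt_texts)
    tool_args = "".join(
        call.get("function", {}).get("arguments", "") for call in tool_calls or []
    )
    basis = assistant_text or ""
    if tool_args:
        # Visible text and tool arguments are counted together.
        basis = f"{basis}\n{tool_args}" if basis else tool_args
    completion_tokens = estimate_tokens(basis)
    return prompt_tokens, completion_tokens, prompt_tokens + completion_tokens


usage = calculate_usage(
    ["What is the weather?"],
    "",
    [{"function": {"arguments": '{"city":"Hanoi"}'}}],
)
```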
129
+
130
+
131
+ def _create_responses_standard_payload(
132
+ response_id: str,
133
+ created_time: int,
134
+ model_name: str,
135
+ detected_tool_calls: list[Any] | None,
136
+ image_call_items: list[ResponseImageGenerationCall],
137
+ response_contents: list[ResponseOutputContent],
138
+ usage: ResponseUsage,
139
+ request: ResponseCreateRequest,
140
+ normalized_input: Any,
141
+ ) -> ResponseCreateResponse:
142
+ """Unified factory for building ResponseCreateResponse objects."""
143
+ message_id = f"msg_{uuid.uuid4().hex}"
144
+ tool_call_items: list[ResponseToolCall] = []
145
+ if detected_tool_calls:
146
+ tool_call_items = [
147
+ ResponseToolCall(
148
+ id=call.id if hasattr(call, "id") else call["id"],
149
+ status="completed",
150
+ function=call.function if hasattr(call, "function") else call["function"],
151
+ )
152
+ for call in detected_tool_calls
153
+ ]
154
+
155
+ return ResponseCreateResponse(
156
+ id=response_id,
157
+ created_at=created_time,
158
+ model=model_name,
159
+ output=[
160
+ ResponseOutputMessage(
161
+ id=message_id,
162
+ type="message",
163
+ role="assistant",
164
+ content=response_contents,
165
+ ),
166
+ *tool_call_items,
167
+ *image_call_items,
168
+ ],
169
+ status="completed",
170
+ usage=usage,
171
+ input=normalized_input or None,
172
+ metadata=request.metadata or None,
173
+ tools=request.tools,
174
+ tool_choice=request.tool_choice,
175
+ )
176
+
177
+
178
+ def _create_chat_completion_standard_payload(
179
+ completion_id: str,
180
+ created_time: int,
181
+ model_name: str,
182
+ visible_output: str | None,
183
+ tool_calls_payload: list[dict] | None,
184
+ finish_reason: str,
185
+ usage: dict,
186
+ ) -> dict:
187
+ """Unified factory for building Chat Completion response dictionaries."""
188
+ return {
189
+ "id": completion_id,
190
+ "object": "chat.completion",
191
+ "created": created_time,
192
+ "model": model_name,
193
+ "choices": [
194
+ {
195
+ "index": 0,
196
+ "message": {
197
+ "role": "assistant",
198
+ "content": visible_output or None,
199
+ "tool_calls": tool_calls_payload or None,
200
+ },
201
+ "finish_reason": finish_reason,
202
+ }
203
+ ],
204
+ "usage": usage,
205
+ }
206
+
207
+
208
+ def _process_llm_output(
209
+ raw_output_with_think: str,
210
+ raw_output_clean: str,
211
+ structured_requirement: StructuredOutputRequirement | None,
212
+ ) -> tuple[str, str, list[Any]]:
213
+ """
214
+ Common post-processing logic for Gemini output.
215
+ Returns: (visible_text, storage_output, tool_calls)
216
+ """
217
+ visible_with_think, tool_calls = extract_tool_calls(raw_output_with_think)
218
+ if tool_calls:
219
+ logger.debug(f"Detected {len(tool_calls)} tool call(s) in model output.")
220
+
221
+ visible_output = visible_with_think.strip()
222
+
223
+ storage_output, _ = extract_tool_calls(raw_output_clean)
224
+ storage_output = storage_output.strip()
225
+
226
+ if structured_requirement:
227
+ cleaned_for_json = LMDBConversationStore.remove_think_tags(visible_output)
228
+ json_text = strip_code_fence(cleaned_for_json or "")
229
+ if json_text:
230
+ try:
231
+ structured_payload = orjson.loads(json_text)
232
+ canonical_output = orjson.dumps(structured_payload).decode("utf-8")
233
+ visible_output = canonical_output
234
+ storage_output = canonical_output
235
+ logger.debug(
236
+ f"Structured response fulfilled (schema={structured_requirement.schema_name})."
237
+ )
238
+ except orjson.JSONDecodeError:
239
+ logger.warning(
240
+ f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name})."
241
+ )
242
+
243
+ return visible_output, storage_output, tool_calls
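Note on `_process_llm_output`: the structured-output branch unwraps a code fence, parses the JSON, and re-serializes it, keeping the original text when parsing fails. The sketch below uses `json` in place of `orjson`, and its `strip_code_fence` is a hypothetical re-implementation, since the real helper is defined outside this diff.

```python
import json
import re

FENCE = "`" * 3  # Built indirectly so this example nests safely in Markdown.
FENCE_RE = re.compile(rf"^{FENCE}[\w-]*\n(.*?)\n?{FENCE}$", re.DOTALL)


def strip_code_fence(text):
    """Hypothetical helper: unwrap a single surrounding Markdown code fence."""
    stripped = text.strip()
    match = FENCE_RE.match(stripped)
    return match.group(1) if match else stripped


def canonicalize_json(visible_text):
    """Return compact JSON when the text parses, else the text unchanged."""
    candidate = strip_code_fence(visible_text)
    try:
        return json.dumps(json.loads(candidate), separators=(",", ":"))
    except json.JSONDecodeError:
        return visible_text  # Mirror the diff: warn and keep the raw output.


fenced = FENCE + 'json\n{"answer": 42, "ok": true}\n' + FENCE
canonical = canonicalize_json(fenced)
fallback = canonicalize_json("not json at all")
```

Falling back instead of raising matches the new behaviour in this commit, where a decode failure is logged rather than turned into an HTTP 502.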
244
+
245
+
246
+ def _persist_conversation(
247
+ db: LMDBConversationStore,
248
+ model_name: str,
249
+ client_id: str,
250
+ metadata: list[str | None],
251
+ messages: list[Message],
252
+ storage_output: str | None,
253
+ tool_calls: list[Any] | None,
254
+ ) -> str | None:
255
+ """Unified logic to save conversation history to LMDB."""
256
+ try:
257
+ current_assistant_message = Message(
258
+ role="assistant",
259
+ content=storage_output or None,
260
+ tool_calls=tool_calls or None,
261
+ )
262
+ full_history = [*messages, current_assistant_message]
263
+ cleaned_history = db.sanitize_assistant_messages(full_history)
264
+
265
+ conv = ConversationInStore(
266
+ model=model_name,
267
+ client_id=client_id,
268
+ metadata=metadata,
269
+ messages=cleaned_history,
270
+ )
271
+ key = db.store(conv)
272
+ logger.debug(f"Conversation saved to LMDB with key: {key[:12]}")
273
+ return key
274
+ except Exception as e:
275
+ logger.warning(f"Failed to save {len(messages) + 1} messages to LMDB: {e}")
276
+ return None
277
+
278
+
279
  def _build_structured_requirement(
280
  response_format: dict[str, Any] | None,
281
  ) -> StructuredOutputRequirement | None:
 
284
  return None
285
 
286
  if response_format.get("type") != "json_schema":
287
+ logger.warning(
288
+ f"Unsupported response_format type requested: {reprlib.repr(response_format)}"
289
+ )
290
  return None
291
 
292
  json_schema = response_format.get("json_schema")
293
  if not isinstance(json_schema, dict):
294
+ logger.warning(
295
+ f"Invalid json_schema payload in response_format: {reprlib.repr(response_format)}"
296
+ )
297
  return None
298
 
299
  schema = json_schema.get("schema")
300
  if not isinstance(schema, dict):
301
+ logger.warning(
302
+ f"Missing `schema` object in response_format payload: {reprlib.repr(response_format)}"
303
+ )
304
  return None
305
 
306
  schema_name = json_schema.get("name") or "response"
 
346
  description = function.description or "No description provided."
347
  lines.append(f"Tool `{function.name}`: {description}")
348
  if function.parameters:
349
+ schema_text = orjson.dumps(function.parameters, option=orjson.OPT_SORT_KEYS).decode(
350
+ "utf-8"
351
+ )
352
  lines.append("Arguments JSON schema:")
353
  lines.append(schema_text)
354
  else:
 
367
  lines.append(
368
  f"You are required to call the tool named `{target}`. Do not call any other tool."
369
  )
 
370
 
371
  lines.append(
372
  "When you decide to call a tool you MUST respond with nothing except a single fenced block exactly like the template below."
 
432
 
433
  if isinstance(msg.content, str):
434
  if XML_HINT_STRIPPED not in msg.content:
435
+ msg.content = f"{msg.content}\n{XML_WRAP_HINT}"
436
  return
437
 
438
  if isinstance(msg.content, list):
 
442
  text_value = part.text or ""
443
  if XML_HINT_STRIPPED in text_value:
444
  return
445
+ part.text = f"{text_value}\n{XML_WRAP_HINT}"
446
  return
447
 
448
  messages_text = XML_WRAP_HINT.strip()
449
  msg.content.append(ContentItem(type="text", text=messages_text))
450
  return
451
 
 
 
452
 
453
  def _conversation_has_code_hint(messages: list[Message]) -> bool:
454
  """Return True if any system message already includes the code block hint."""
 
481
  """Return a copy of messages enriched with tool instructions when needed."""
482
  prepared = [msg.model_copy(deep=True) for msg in source_messages]
483
 
484
+ # Resolve tool names for 'tool' messages by looking back at previous assistant tool calls
485
+ tool_id_to_name = {}
486
+ for msg in prepared:
487
+ if msg.role == "assistant" and msg.tool_calls:
488
+ for tc in msg.tool_calls:
489
+ tool_id_to_name[tc.id] = tc.function.name
490
+
491
+ for msg in prepared:
492
+ if msg.role == "tool" and not msg.name and msg.tool_call_id:
493
+ msg.name = tool_id_to_name.get(msg.tool_call_id)
494
+
495
  instructions: list[str] = []
496
  if inject_system_defaults:
497
  if tools:
 
510
  logger.debug("Injected default code block hint for Gemini conversation.")
511
 
512
  if not instructions:
 
513
  if tools and tool_choice != "none":
514
  _append_xml_hint_to_last_user_message(prepared)
515
  return prepared
 
542
  normalized_input: list[ResponseInputItem] = []
543
  for item in items:
544
  role = item.role
 
545
  content = item.content
546
  normalized_contents: list[ResponseInputContent] = []
547
  if isinstance(content, str):
 
612
  continue
613
 
614
  role = item.role
 
615
  content = item.content
616
  if isinstance(content, str):
617
  instruction_messages.append(Message(role=role, content=content))
 
649
 
650
 
651
  def _get_model_by_name(name: str) -> Model:
652
+ """Retrieve a Model instance by name."""
 
 
 
653
  strategy = g_config.gemini.model_strategy
654
  custom_models = {m.model_name: m for m in g_config.gemini.models if m.model_name}
655
 
 
663
 
664
 
665
  def _get_available_models() -> list[ModelData]:
666
+ """Return a list of available models based on configuration strategy."""
 
 
667
  now = int(datetime.now(tz=timezone.utc).timestamp())
668
  strategy = g_config.gemini.model_strategy
669
  models_data = []
 
698
  return models_data
699
 
700
 
+ async def _find_reusable_session(
+     db: LMDBConversationStore,
+     pool: GeminiClientPool,
+     model: Model,
+     messages: list[Message],
+ ) -> tuple[ChatSession | None, GeminiClientWrapper | None, list[Message]]:
+     """Find an existing chat session matching the longest suitable history prefix."""
+     if len(messages) < 2:
+         return None, None, messages

+     search_end = len(messages)
+     while search_end >= 2:
+         search_history = messages[:search_end]
+         if search_history[-1].role in {"assistant", "system", "tool"}:
+             try:
+                 if conv := db.find(model.model_name, search_history):
+                     now = datetime.now()
+                     updated_at = conv.updated_at or conv.created_at or now
+                     age_minutes = (now - updated_at).total_seconds() / 60
+                     if age_minutes <= METADATA_TTL_MINUTES:
+                         client = await pool.acquire(conv.client_id)
+                         session = client.start_chat(metadata=conv.metadata, model=model)
+                         remain = messages[search_end:]
+                         logger.debug(
+                             f"Match found at prefix length {search_end}/{len(messages)}. Client: {conv.client_id}"
+                         )
+                         return session, client, remain
+                     else:
+                         logger.debug(
+                             f"Matched conversation at length {search_end} is too old ({age_minutes:.1f}m), skipping reuse."
+                         )
+                 else:
+                     # No stored conversation matches this prefix; fall through and try a shorter one.
+                     pass
+             except Exception as e:
+                 logger.warning(
+                     f"Error checking LMDB for reusable session at length {search_end}: {e}"
+                 )
+                 break
+         search_end -= 1

+     logger.debug(f"No reusable session found for {len(messages)} messages.")
+     return None, None, messages
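
The prefix search in `_find_reusable_session` can be tried in isolation. The following is only a minimal sketch, assuming an in-memory dict in place of the LMDB store; `Msg` and `find_longest_stored_prefix` are hypothetical names that do not appear in the commit, and the TTL check and client pool are left out.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    """Hypothetical stand-in for the project's Message model."""
    role: str
    content: str


def find_longest_stored_prefix(store, messages):
    """Return (stored value, remaining messages) for the longest stored prefix."""
    for end in range(len(messages), 1, -1):
        prefix = tuple(messages[:end])
        # Only prefixes that end on a non-user turn are candidates for a saved state.
        if prefix[-1].role not in {"assistant", "system", "tool"}:
            continue
        if prefix in store:
            return store[prefix], list(messages[end:])
    return None, list(messages)


history = [
    Msg("user", "hi"),
    Msg("assistant", "hello"),
    Msg("user", "how are you?"),
]
# The dict stands in for the conversation store: one saved two-message history.
store = {tuple(history[:2]): "session-1"}
found, remaining = find_longest_stored_prefix(store, history)
```

Starting from the full history and shrinking means the first hit is the longest one, so only the messages after it need to be sent to the reused session.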
744
 
 
 
 
 
745
 
+ async def _send_with_split(
+     session: ChatSession,
+     text: str,
+     files: list[Path | str | io.BytesIO] | None = None,
+     stream: bool = False,
+ ) -> AsyncGenerator[ModelOutput, None] | ModelOutput:
+     """Send text to Gemini, converting it to a file attachment if it exceeds the length limit."""
+     if len(text) <= MAX_CHARS_PER_REQUEST:
          try:
+             if stream:
+                 return session.send_message_stream(text, files=files)
+             return await session.send_message(text, files=files)
          except Exception as e:
+             logger.exception(f"Error sending message to Gemini: {e}")
              raise

+     logger.info(
+         f"Message length ({len(text)}) exceeds limit ({MAX_CHARS_PER_REQUEST}). Converting text to file attachment."
+     )
+     file_obj = io.BytesIO(text.encode("utf-8"))
+     file_obj.name = "message.txt"
      try:
+         final_files = list(files) if files else []
+         final_files.append(file_obj)
+         instruction = (
+             "The user's input exceeds the character limit and is provided in the attached file `message.txt`.\n\n"
+             "**System Instruction:**\n"
+             "1. Read the content of `message.txt`.\n"
+             "2. Treat that content as the **primary** user prompt for this turn.\n"
+             "3. Execute the instructions or answer the questions found *inside* that file immediately.\n"
          )
+         if stream:
+             return session.send_message_stream(instruction, files=final_files)
+         return await session.send_message(instruction, files=final_files)
      except Exception as e:
+         logger.exception(f"Error sending large text as file to Gemini: {e}")
+         raise
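
The oversize-text fallback above can be exercised without a Gemini session. This is a rough sketch under stated assumptions: `MAX_CHARS` is a made-up limit (the real `MAX_CHARS_PER_REQUEST` value is not shown in this diff), `to_payload` is a hypothetical helper, and the instruction string is shortened.

```python
import io

MAX_CHARS = 20  # hypothetical limit, chosen only to keep the demo short


def to_payload(text, files=None, limit=MAX_CHARS):
    """Return (prompt, files); oversize text moves into an in-memory attachment."""
    current_files = list(files or [])
    if len(text) <= limit:
        return text, current_files
    attachment = io.BytesIO(text.encode("utf-8"))
    # Naming the buffer lets a consumer treat it like an uploaded file.
    attachment.name = "message.txt"
    current_files.append(attachment)
    return "Read the attached `message.txt` and treat it as the user prompt.", current_files


short_prompt, short_files = to_payload("hi")
long_prompt, long_files = to_payload("x" * 50)
```

Copying `files` before appending keeps the caller's list untouched, which matches the `list(files) if files else []` step in the diff.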
 
 
 
783
 
784
 
+ class StreamingOutputFilter:
+     """
+     Enhanced streaming filter that suppresses:
+     1. XML tool call blocks: ```xml ... ```
+     2. ChatML tool blocks: <|im_start|>tool\n...<|im_end|>
+     3. ChatML role headers: <|im_start|>role\n (only suppresses the header, keeps content)
+     4. Control tokens: <|im_start|>, <|im_end|>
+     5. System instructions/hints: XML_WRAP_HINT, CODE_BLOCK_HINT, etc.
+     """

+     def __init__(self):
+         self.buffer = ""
+         self.in_xml_tool = False
+         self.in_tagged_block = False
+         self.in_role_header = False
+         self.current_role = ""
+
+         self.XML_START = "```xml"
+         self.XML_END = "```"
+         self.TAG_START = "<|im_start|>"
+         self.TAG_END = "<|im_end|>"
+         self.SYSTEM_HINTS = [
+             XML_WRAP_HINT,
+             XML_HINT_STRIPPED,
+             CODE_BLOCK_HINT,
+             CODE_HINT_STRIPPED,
+         ]

+     def process(self, chunk: str) -> str:
+         self.buffer += chunk
+         to_yield = ""
+
+         while self.buffer:
+             if self.in_xml_tool:
+                 end_idx = self.buffer.find(self.XML_END)
+                 if end_idx != -1:
+                     self.buffer = self.buffer[end_idx + len(self.XML_END) :]
+                     self.in_xml_tool = False
+                 else:
+                     break
+             elif self.in_role_header:
+                 nl_idx = self.buffer.find("\n")
+                 if nl_idx != -1:
+                     role_text = self.buffer[:nl_idx].strip().lower()
+                     self.current_role = role_text
+                     self.buffer = self.buffer[nl_idx + 1 :]
+                     self.in_role_header = False
+                     self.in_tagged_block = True
+                 else:
+                     break
+             elif self.in_tagged_block:
+                 end_idx = self.buffer.find(self.TAG_END)
+                 if end_idx != -1:
+                     content = self.buffer[:end_idx]
+                     if self.current_role != "tool":
+                         to_yield += content
+                     self.buffer = self.buffer[end_idx + len(self.TAG_END) :]
+                     self.in_tagged_block = False
+                     self.current_role = ""
+                 else:
+                     if self.current_role == "tool":
+                         break
+                     else:
+                         # Hold back enough characters to cover a partial end tag.
+                         yield_len = len(self.buffer) - (len(self.TAG_END) - 1)
+                         if yield_len > 0:
+                             to_yield += self.buffer[:yield_len]
+                             self.buffer = self.buffer[yield_len:]
+                         break
+             else:
+                 # Outside any special block. Look for the earliest start marker.
+                 earliest_idx = -1
+                 match_type = ""
+
+                 xml_idx = self.buffer.find(self.XML_START)
+                 if xml_idx != -1:
+                     earliest_idx = xml_idx
+                     match_type = "xml"
+
+                 tag_s_idx = self.buffer.find(self.TAG_START)
+                 if tag_s_idx != -1:
+                     if earliest_idx == -1 or tag_s_idx < earliest_idx:
+                         earliest_idx = tag_s_idx
+                         match_type = "tag_start"
+
+                 tag_e_idx = self.buffer.find(self.TAG_END)
+                 if tag_e_idx != -1:
+                     if earliest_idx == -1 or tag_e_idx < earliest_idx:
+                         earliest_idx = tag_e_idx
+                         match_type = "tag_end"
+
+                 if earliest_idx != -1:
+                     # Yield text before the match
+                     to_yield += self.buffer[:earliest_idx]
+                     self.buffer = self.buffer[earliest_idx:]
+
+                     if match_type == "xml":
+                         self.in_xml_tool = True
+                         self.buffer = self.buffer[len(self.XML_START) :]
+                     elif match_type == "tag_start":
+                         self.in_role_header = True
+                         self.buffer = self.buffer[len(self.TAG_START) :]
+                     elif match_type == "tag_end":
+                         # Orphaned end tag, just skip it
+                         self.buffer = self.buffer[len(self.TAG_END) :]
+                     continue
+                 else:
+                     # No full marker found: keep any trailing partial marker in the buffer.
+                     prefixes = [self.XML_START, self.TAG_START, self.TAG_END]
+                     max_keep = 0
+                     for p in prefixes:
+                         for i in range(len(p) - 1, 0, -1):
+                             if self.buffer.endswith(p[:i]):
+                                 max_keep = max(max_keep, i)
+                                 break

+                     yield_len = len(self.buffer) - max_keep
+                     if yield_len > 0:
+                         to_yield += self.buffer[:yield_len]
+                         self.buffer = self.buffer[yield_len:]
+                     break

+         # Final pass: filter out system hints from the text to be yielded
+         for hint in self.SYSTEM_HINTS:
+             if hint in to_yield:
+                 to_yield = to_yield.replace(hint, "")

+         return to_yield

+     def flush(self) -> str:
+         # If the stream ends inside an XML tool block or a ChatML tool block,
+         # the output is usually malformed, so the buffered text is dropped.
+         if self.in_xml_tool or (self.in_tagged_block and self.current_role == "tool"):
+             return ""

+         final_text = self.buffer
+         self.buffer = ""

+         # Filter out any orphaned/partial control tokens or hints
+         final_text = CONTROL_TOKEN_RE.sub("", final_text)
+         for hint in self.SYSTEM_HINTS:
+             final_text = final_text.replace(hint, "")

+         return final_text.strip()
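
The prefix check in `process` is what stops a marker that is split across two chunks from leaking to the client. A stand-alone sketch of that one step follows; `split_safe` is a hypothetical helper name, and the marker list simply mirrors the constants set in `__init__`.

```python
def split_safe(buffer: str, markers: list[str]) -> tuple[str, str]:
    """Split buffer into (emit_now, hold_back) so no partial marker is emitted early."""
    keep = 0
    for marker in markers:
        # Try the longest proper prefix of the marker first.
        for i in range(len(marker) - 1, 0, -1):
            if buffer.endswith(marker[:i]):
                keep = max(keep, i)
                break
    cut = len(buffer) - keep
    return buffer[:cut], buffer[cut:]


MARKERS = ["```xml", "<|im_start|>", "<|im_end|>"]
emit, held = split_safe("Hello <|im_", MARKERS)
```

Here `emit` is the text that is safe to send immediately, while `held` stays buffered until the next chunk shows whether it really starts a control token.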
928
 
 
 
 
 
 
929
 
930
+ # --- Response Builders & Streaming ---
931
 
932
 
933
+ def _create_real_streaming_response(
934
+ generator: AsyncGenerator[ModelOutput, None],
935
+ completion_id: str,
936
+ created_time: int,
937
+ model_name: str,
938
+ messages: list[Message],
939
  db: LMDBConversationStore,
 
940
  model: Model,
941
+ client_wrapper: GeminiClientWrapper,
942
+ session: ChatSession,
943
+ base_url: str,
944
+ structured_requirement: StructuredOutputRequirement | None = None,
945
+ ) -> StreamingResponse:
946
  """
947
+ Create a real-time streaming response.
948
+ Reconciles manual delta accumulation with the model's final authoritative state.
949
  """
950
 
951
+ async def generate_stream():
952
+ full_thoughts, full_text = "", ""
953
+ has_started = False
954
+ last_chunk_was_thought = False
955
+ all_outputs: list[ModelOutput] = []
956
+ suppressor = StreamingOutputFilter()
957
  try:
958
+ async for chunk in generator:
959
+ all_outputs.append(chunk)
960
+ if not has_started:
961
+ data = {
962
+ "id": completion_id,
963
+ "object": "chat.completion.chunk",
964
+ "created": created_time,
965
+ "model": model_name,
966
+ "choices": [
967
+ {"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}
968
+ ],
969
+ }
970
+ yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
971
+ has_started = True
972
+
973
+ if t_delta := chunk.thoughts_delta:
974
+ if not last_chunk_was_thought and not full_thoughts:
975
+ yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '<think>'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
976
+ full_thoughts += t_delta
977
+ data = {
978
+ "id": completion_id,
979
+ "object": "chat.completion.chunk",
980
+ "created": created_time,
981
+ "model": model_name,
982
+ "choices": [
983
+ {"index": 0, "delta": {"content": t_delta}, "finish_reason": None}
984
+ ],
985
+ }
986
+ yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
987
+ last_chunk_was_thought = True
988
+
989
+ if text_delta := chunk.text_delta:
990
+ if last_chunk_was_thought:
991
+ yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '</think>\n'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
992
+ last_chunk_was_thought = False
993
+ full_text += text_delta
994
+ if visible_delta := suppressor.process(text_delta):
995
+ data = {
996
+ "id": completion_id,
997
+ "object": "chat.completion.chunk",
998
+ "created": created_time,
999
+ "model": model_name,
1000
+ "choices": [
1001
+ {
1002
+ "index": 0,
1003
+ "delta": {"content": visible_delta},
1004
+ "finish_reason": None,
1005
+ }
1006
+ ],
1007
+ }
1008
+ yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1009
  except Exception as e:
1010
+ logger.exception(f"Error during OpenAI streaming: {e}")
1011
+ yield f"data: {orjson.dumps({'error': {'message': 'Streaming error occurred.', 'type': 'server_error', 'param': None, 'code': None}}).decode('utf-8')}\n\n"
1012
+ return
1013
 
1014
+ if all_outputs:
1015
+ final_chunk = all_outputs[-1]
1016
+ if final_chunk.text:
1017
+ full_text = final_chunk.text
1018
+ if final_chunk.thoughts:
1019
+ full_thoughts = final_chunk.thoughts
1020
 
1021
+ if last_chunk_was_thought:
1022
+ yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '</think>\n'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
1023
 
1024
+ if remaining_text := suppressor.flush():
1025
+ data = {
1026
+ "id": completion_id,
1027
+ "object": "chat.completion.chunk",
1028
+ "created": created_time,
1029
+ "model": model_name,
1030
+ "choices": [
1031
+ {"index": 0, "delta": {"content": remaining_text}, "finish_reason": None}
1032
+ ],
1033
+ }
1034
+ yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1035
 
1036
+ raw_output_with_think = f"<think>{full_thoughts}</think>\n" if full_thoughts else ""
1037
+ raw_output_with_think += full_text
1038
+ assistant_text, storage_output, tool_calls = _process_llm_output(
1039
+ raw_output_with_think, full_text, structured_requirement
1040
+ )
1041
+
1042
+ images = []
1043
+ seen_urls = set()
1044
+ for out in all_outputs:
1045
+ if out.images:
1046
+ for img in out.images:
1047
+ # Use the image URL as a stable identifier across chunks
1048
+ if img.url not in seen_urls:
1049
+ images.append(img)
1050
+ seen_urls.add(img.url)
1051
+
1052
+ image_markdown = ""
1053
+ seen_hashes = set()
1054
+ for image in images:
1055
+ try:
1056
+ image_store = get_image_store_dir()
1057
+ _, _, _, filename, file_hash = await _image_to_base64(image, image_store)
1058
+ if file_hash in seen_hashes:
1059
+ # Duplicate content, delete the file and skip
1060
+ (image_store / filename).unlink(missing_ok=True)
1061
+ continue
1062
+ seen_hashes.add(file_hash)
1063
 
1064
+ img_url = (
1065
+ f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
1066
+ )
1067
+ image_markdown += f"\n\n{img_url}"
1068
+ except Exception as exc:
1069
+ logger.warning(f"Failed to process image in OpenAI stream: {exc}")
1070
+
1071
+ if image_markdown:
1072
+ assistant_text += image_markdown
1073
+ storage_output += image_markdown
1074
+ # Send the image Markdown as a final text chunk before usage
1075
  data = {
1076
  "id": completion_id,
1077
  "object": "chat.completion.chunk",
1078
  "created": created_time,
1079
+ "model": model_name,
1080
+ "choices": [
1081
+ {"index": 0, "delta": {"content": image_markdown}, "finish_reason": None}
1082
+ ],
1083
  }
1084
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1085
 
1086
+ tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
1087
+ if tool_calls_payload:
1088
+ tool_calls_delta = [
1089
+ {**call, "index": idx} for idx, call in enumerate(tool_calls_payload)
1090
+ ]
1091
  data = {
1092
  "id": completion_id,
1093
  "object": "chat.completion.chunk",
1094
  "created": created_time,
1095
+ "model": model_name,
1096
  "choices": [
1097
+ {"index": 0, "delta": {"tool_calls": tool_calls_delta}, "finish_reason": None}
 
 
 
 
1098
  ],
1099
  }
1100
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1101
 
1102
+ p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, tool_calls)
1103
+ usage = {"prompt_tokens": p_tok, "completion_tokens": c_tok, "total_tokens": t_tok}
1104
  data = {
1105
  "id": completion_id,
1106
  "object": "chat.completion.chunk",
1107
  "created": created_time,
1108
+ "model": model_name,
1109
+ "choices": [
1110
+ {"index": 0, "delta": {}, "finish_reason": "tool_calls" if tool_calls else "stop"}
1111
+ ],
1112
+ "usage": usage,
 
 
1113
  }
1114
+ _persist_conversation(
1115
+ db,
1116
+ model.model_name,
1117
+ client_wrapper.id,
1118
+ session.metadata,
1119
+ messages, # Prepared messages ('msgs' at the call site), including server-injected instructions
1120
+ storage_output,
1121
+ tool_calls,
1122
+ )
1123
  yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
1124
  yield "data: [DONE]\n\n"
1125
 
1126
  return StreamingResponse(generate_stream(), media_type="text/event-stream")
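
Every event emitted by `generate_stream` above has the same envelope and differs only in `delta` and `finish_reason`. A small helper could build these lines in one place; the sketch below is an assumption about how that might look, not code from the commit. `sse_chunk` is a hypothetical name, and the standard-library `json` module stands in for `orjson` so the snippet runs without extra dependencies.

```python
import json


def sse_chunk(completion_id, created, model, delta, finish_reason=None):
    """Format one chat.completion.chunk payload as a server-sent event line."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    # Each event is a "data:" line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


first = sse_chunk("chatcmpl-demo", 0, "demo-model", {"role": "assistant"})
text = sse_chunk("chatcmpl-demo", 0, "demo-model", {"content": "hi"})
last = sse_chunk("chatcmpl-demo", 0, "demo-model", {}, finish_reason="stop")
```

With such a helper, the long inline `orjson.dumps({...})` expressions for the `<think>` and `</think>` markers would shrink to one call each.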
1127
 
1128
 
1129
+ def _create_responses_real_streaming_response(
1130
+ generator: AsyncGenerator[ModelOutput, None],
1131
+ response_id: str,
1132
+ created_time: int,
1133
+ model_name: str,
1134
+ messages: list[Message],
1135
+ db: LMDBConversationStore,
1136
+ model: Model,
1137
+ client_wrapper: GeminiClientWrapper,
1138
+ session: ChatSession,
1139
+ request: ResponseCreateRequest,
1140
+ image_store: Path,
1141
+ base_url: str,
1142
+ structured_requirement: StructuredOutputRequirement | None = None,
1143
  ) -> StreamingResponse:
1144
+ """
1145
+ Create a real-time streaming response for the Responses API.
1146
+ Ensures final accumulated text and thoughts are synchronized.
1147
+ """
1148
  base_event = {
1149
  "id": response_id,
1150
  "object": "response",
1151
  "created_at": created_time,
1152
+ "model": model_name,
1153
  }
1154
 
1155
  async def generate_stream():
1156
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.created', 'response': {'id': response_id, 'object': 'response', 'created_at': created_time, 'model': model_name, 'status': 'in_progress', 'metadata': request.metadata, 'input': None, 'tools': request.tools, 'tool_choice': request.tool_choice}}).decode('utf-8')}\n\n"
1157
+ message_id = f"msg_{uuid.uuid4().hex}"
1158
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': 0, 'item': {'id': message_id, 'type': 'message', 'role': 'assistant', 'content': []}}).decode('utf-8')}\n\n"
 
 
 
 
1159
 
1160
+ full_thoughts, full_text = "", ""
1161
+ last_chunk_was_thought = False
1162
+ all_outputs: list[ModelOutput] = []
1163
+ suppressor = StreamingOutputFilter()
1164
 
1165
+ try:
1166
+ async for chunk in generator:
1167
+ all_outputs.append(chunk)
1168
+ if t_delta := chunk.thoughts_delta:
1169
+ if not last_chunk_was_thought and not full_thoughts:
1170
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '<think>'}).decode('utf-8')}\n\n"
1171
+ full_thoughts += t_delta
1172
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': t_delta}).decode('utf-8')}\n\n"
1173
+ last_chunk_was_thought = True
1174
+ if text_delta := chunk.text_delta:
1175
+ if last_chunk_was_thought:
1176
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '</think>\n'}).decode('utf-8')}\n\n"
1177
+ last_chunk_was_thought = False
1178
+ full_text += text_delta
1179
+ if visible_delta := suppressor.process(text_delta):
1180
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': visible_delta}).decode('utf-8')}\n\n"
1181
+ except Exception as e:
1182
+ logger.exception(f"Error during Responses API streaming: {e}")
1183
+ yield f"data: {orjson.dumps({**base_event, 'type': 'error', 'error': {'message': 'Streaming error.'}}).decode('utf-8')}\n\n"
1184
+ return
 
 
 
 
 
1185
 
1186
+ if all_outputs:
1187
+ final_chunk = all_outputs[-1]
1188
+ if final_chunk.text:
1189
+ full_text = final_chunk.text
1190
+ if final_chunk.thoughts:
1191
+ full_thoughts = final_chunk.thoughts
1192
+
1193
+ if last_chunk_was_thought:
1194
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '</think>\n'}).decode('utf-8')}\n\n"
1195
+ if remaining_text := suppressor.flush():
1196
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': remaining_text}).decode('utf-8')}\n\n"
1197
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.done', 'output_index': 0}).decode('utf-8')}\n\n"
1198
+
1199
+ raw_output_with_think = f"<think>{full_thoughts}</think>\n" if full_thoughts else ""
1200
+ raw_output_with_think += full_text
1201
+ assistant_text, storage_output, detected_tool_calls = _process_llm_output(
1202
+ raw_output_with_think, full_text, structured_requirement
1203
+ )
1204
 
1205
+ images = []
1206
+ seen_urls = set()
1207
+ for out in all_outputs:
1208
+ if out.images:
1209
+ for img in out.images:
1210
+ if img.url not in seen_urls:
1211
+ images.append(img)
1212
+ seen_urls.add(img.url)
1213
+
1214
+ response_contents, image_call_items = [], []
1215
+ seen_hashes = set()
1216
+ for image in images:
1217
+ try:
1218
+ image_base64, width, height, filename, file_hash = await _image_to_base64(
1219
+ image, image_store
1220
+ )
1221
+ if file_hash in seen_hashes:
1222
+ (image_store / filename).unlink(missing_ok=True)
1223
+ continue
1224
+ seen_hashes.add(file_hash)
1225
+
1226
+ img_format = "png" if isinstance(image, GeneratedImage) else "jpeg"
1227
+ image_url = (
1228
+ f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
1229
+ )
1230
+ image_call_items.append(
1231
+ ResponseImageGenerationCall(
1232
+ id=filename.rsplit(".", 1)[0],
1233
+ result=image_base64,
1234
+ output_format=img_format,
1235
+ size=f"{width}x{height}" if width and height else None,
1236
+ )
1237
+ )
1238
+ response_contents.append(ResponseOutputContent(type="output_text", text=image_url))
1239
+ except Exception as exc:
1240
+ logger.warning(f"Failed to process image in stream: {exc}")
1241
+
1242
+ if assistant_text:
1243
+ response_contents.append(ResponseOutputContent(type="output_text", text=assistant_text))
1244
+ if not response_contents:
1245
+ response_contents.append(ResponseOutputContent(type="output_text", text=""))
1246
+
1247
+ # Aggregate images for storage
1248
+ image_markdown = ""
1249
+ for img_call in image_call_items:
1250
+ fname = f"{img_call.id}.{img_call.output_format}"
1251
+ img_url = f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})"
1252
+ image_markdown += f"\n\n{img_url}"
1253
+
1254
+ if image_markdown:
1255
+ storage_output += image_markdown
1256
+
1257
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': 0, 'item': {'id': message_id, 'type': 'message', 'role': 'assistant', 'content': [c.model_dump(mode='json') for c in response_contents]}}).decode('utf-8')}\n\n"
1258
+
1259
+ current_idx = 1
1260
+ for call in detected_tool_calls:
1261
+ tc_item = ResponseToolCall(id=call.id, status="completed", function=call.function)
1262
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': current_idx, 'item': tc_item.model_dump(mode='json')}).decode('utf-8')}\n\n"
1263
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': current_idx, 'item': tc_item.model_dump(mode='json')}).decode('utf-8')}\n\n"
1264
+ current_idx += 1
1265
+ for img_call in image_call_items:
1266
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': current_idx, 'item': img_call.model_dump(mode='json')}).decode('utf-8')}\n\n"
1267
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': current_idx, 'item': img_call.model_dump(mode='json')}).decode('utf-8')}\n\n"
1268
+ current_idx += 1
1269
+
1270
+ p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, detected_tool_calls)
1271
+ usage = ResponseUsage(input_tokens=p_tok, output_tokens=c_tok, total_tokens=t_tok)
1272
+ payload = _create_responses_standard_payload(
1273
+ response_id,
1274
+ created_time,
1275
+ model_name,
1276
+ detected_tool_calls,
1277
+ image_call_items,
1278
+ response_contents,
1279
+ usage,
1280
+ request,
1281
+ None,
1282
+ )
1283
+ _persist_conversation(
1284
+ db,
1285
+ model.model_name,
1286
+ client_wrapper.id,
1287
+ session.metadata,
1288
+ messages,
1289
+ storage_output,
1290
+ detected_tool_calls,
1291
+ )
1292
+ yield f"data: {orjson.dumps({**base_event, 'type': 'response.completed', 'response': payload.model_dump(mode='json')}).decode('utf-8')}\n\n"
1293
  yield "data: [DONE]\n\n"
1294
 
1295
  return StreamingResponse(generate_stream(), media_type="text/event-stream")
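
Both streaming builders de-duplicate images twice: first by URL across chunks, then by the hash of the saved file. The second pass can be sketched on raw bytes as shown below. This is an illustration only: SHA-256 is an assumption (the diff does not show how `_image_to_base64` computes `file_hash`), and `dedupe_by_content` is a hypothetical helper.

```python
import hashlib


def dedupe_by_content(blobs):
    """Keep the first occurrence of each distinct byte payload, preserving order."""
    seen = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            # Same content reached via a different URL: skip it.
            continue
        seen.add(digest)
        unique.append(blob)
    return unique


images = dedupe_by_content([b"img-a", b"img-b", b"img-a"])
```

Hashing the content catches the case where the same image is served under two different URLs, which the URL pass alone would miss.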
1296
 
1297
 
1298
+ # --- Main Router Endpoints ---
1299
 
 
 
 
1300
 
1301
+ @router.get("/v1/models", response_model=ModelListResponse)
1302
+ async def list_models(api_key: str = Depends(verify_api_key)):
1303
+ models = _get_available_models()
1304
+ return ModelListResponse(data=models)
1305
 
 
 
1306
 
1307
+ @router.post("/v1/chat/completions")
1308
+ async def create_chat_completion(
1309
+ request: ChatCompletionRequest,
1310
+ raw_request: Request,
1311
+ api_key: str = Depends(verify_api_key),
1312
+ tmp_dir: Path = Depends(get_temp_dir),
1313
+ image_store: Path = Depends(get_image_store_dir),
1314
+ ):
1315
+ base_url = str(raw_request.base_url)
1316
+ pool, db = GeminiClientPool(), LMDBConversationStore()
1317
+ try:
1318
+ model = _get_model_by_name(request.model)
1319
+ except ValueError as exc:
1320
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
1321
+ if not request.messages:
1322
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Messages required.")
1323
 
1324
+ structured_requirement = _build_structured_requirement(request.response_format)
1325
+ extra_instr = [structured_requirement.instruction] if structured_requirement else None
1326
+
1327
+ # Prepare messages first so that server-injected system instructions are part of the history
1328
+ msgs = _prepare_messages_for_model(
1329
+ request.messages, request.tools, request.tool_choice, extra_instr
1330
+ )
1331
+
1332
+ session, client, remain = await _find_reusable_session(db, pool, model, msgs)
1333
+
1334
+ if session:
1335
+ if not remain:
1336
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No new messages.")
1337
+
1338
+ # For reused sessions, we only need to process the remaining messages.
1339
+ # We don't re-inject system defaults to avoid duplicating instructions already in history.
1340
+ input_msgs = _prepare_messages_for_model(
1341
+ remain, request.tools, request.tool_choice, extra_instr, False
1342
+ )
1343
+ if len(input_msgs) == 1:
1344
+ m_input, files = await GeminiClientWrapper.process_message(
1345
+ input_msgs[0], tmp_dir, tagged=False
1346
+ )
1347
+ else:
1348
+ m_input, files = await GeminiClientWrapper.process_conversation(input_msgs, tmp_dir)
1349
+
1350
+ logger.debug(
1351
+ f"Reused session {reprlib.repr(session.metadata)} - sending {len(input_msgs)} prepared messages."
1352
+ )
1353
+ else:
1354
  try:
1355
+ client = await pool.acquire()
1356
+ session = client.start_chat(model=model)
1357
+ # Use the already prepared 'msgs' for a fresh session
1358
+ m_input, files = await GeminiClientWrapper.process_conversation(msgs, tmp_dir)
1359
  except Exception as e:
1360
+ logger.exception("Error in preparing conversation")
1361
+ raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e))
1362
+
1363
+ completion_id = f"chatcmpl-{uuid.uuid4()}"
1364
+ created_time = int(datetime.now(tz=timezone.utc).timestamp())
1365
+
1366
+ try:
1367
+ assert session and client
1368
+ logger.debug(
1369
+ f"Client ID: {client.id}, Input length: {len(m_input)}, files count: {len(files)}"
1370
+ )
1371
+ resp_or_stream = await _send_with_split(
1372
+ session, m_input, files=files, stream=request.stream
1373
+ )
1374
+ except Exception as e:
1375
+ logger.exception("Gemini API error")
1376
+ raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=str(e))
1377
+
1378
+ if request.stream:
1379
+ return _create_real_streaming_response(
1380
+ resp_or_stream,
1381
+ completion_id,
1382
+ created_time,
1383
+ request.model,
1384
+ msgs, # Use prepared 'msgs'
1385
+ db,
1386
+ model,
1387
+ client,
1388
+ session,
1389
+ base_url,
1390
+ structured_requirement,
1391
+ )
1392
+
1393
+ try:
1394
+ raw_with_t = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=True)
1395
+ raw_clean = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=False)
1396
+ except Exception as exc:
1397
+ logger.exception("Gemini output parsing failed.")
1398
+ raise HTTPException(
1399
+ status_code=status.HTTP_502_BAD_GATEWAY, detail="Malformed response."
1400
+ ) from exc
1401
+
1402
+ visible_output, storage_output, tool_calls = _process_llm_output(
1403
+ raw_with_t, raw_clean, structured_requirement
1404
+ )
1405
+
1406
+ # Process images for OpenAI non-streaming flow
1407
+ images = resp_or_stream.images or []
1408
+ image_markdown = ""
1409
+ seen_hashes = set()
1410
+ for image in images:
1411
+ try:
1412
+ _, _, _, filename, file_hash = await _image_to_base64(image, image_store)
1413
+ if file_hash in seen_hashes:
1414
+ (image_store / filename).unlink(missing_ok=True)
1415
+ continue
1416
+ seen_hashes.add(file_hash)
1417
+
1418
+ img_url = (
1419
+ f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
1420
  )
1421
+ image_markdown += f"\n\n{img_url}"
1422
+ except Exception as exc:
1423
+ logger.warning(f"Failed to process image in OpenAI response: {exc}")
1424
+
1425
+ if image_markdown:
1426
+ visible_output += image_markdown
1427
+ storage_output += image_markdown
1428
+
1429
+ tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
1430
+ if tool_calls_payload:
1431
+ logger.debug(f"Detected tool calls: {reprlib.repr(tool_calls_payload)}")
1432
+
1433
+ p_tok, c_tok, t_tok = _calculate_usage(request.messages, visible_output, tool_calls)
1434
+ usage = {"prompt_tokens": p_tok, "completion_tokens": c_tok, "total_tokens": t_tok}
1435
+ payload = _create_chat_completion_standard_payload(
1436
+ completion_id,
1437
+ created_time,
1438
+ request.model,
1439
+ visible_output,
1440
+ tool_calls_payload,
1441
+ "tool_calls" if tool_calls else "stop",
1442
+ usage,
1443
+ )
1444
+ _persist_conversation(
1445
+ db,
1446
+ model.model_name,
1447
+ client.id,
1448
+ session.metadata,
1449
+ msgs, # Use prepared messages 'msgs'
1450
+ storage_output,
1451
+ tool_calls,
1452
+ )
1453
+ return payload
1454
+
1455
+
1456
+ @router.post("/v1/responses")
1457
+ async def create_response(
1458
+ request: ResponseCreateRequest,
1459
+ raw_request: Request,
1460
+ api_key: str = Depends(verify_api_key),
1461
+ tmp_dir: Path = Depends(get_temp_dir),
1462
+ image_store: Path = Depends(get_image_store_dir),
1463
+ ):
1464
+ base_url = str(raw_request.base_url)
1465
+ base_messages, norm_input = _response_items_to_messages(request.input)
1466
+ struct_req = _build_structured_requirement(request.response_format)
1467
+ extra_instr = [struct_req.instruction] if struct_req else []
1468
+
1469
+ standard_tools, image_tools = [], []
1470
+ if request.tools:
1471
+ for t in request.tools:
1472
+ if isinstance(t, Tool):
1473
+ standard_tools.append(t)
1474
+ elif isinstance(t, ResponseImageTool):
1475
+ image_tools.append(t)
1476
+ elif isinstance(t, dict):
1477
+ if t.get("type") == "function":
1478
+ standard_tools.append(Tool.model_validate(t))
1479
+ elif t.get("type") == "image_generation":
1480
+ image_tools.append(ResponseImageTool.model_validate(t))
1481
+
1482
+ img_instr = _build_image_generation_instruction(
1483
+ image_tools,
1484
+ request.tool_choice if isinstance(request.tool_choice, ResponseToolChoice) else None,
1485
+ )
1486
+ if img_instr:
1487
+ extra_instr.append(img_instr)
1488
+ preface = _instructions_to_messages(request.instructions)
1489
+ conv_messages = [*preface, *base_messages] if preface else base_messages
1490
+ model_tool_choice = (
1491
+ request.tool_choice if isinstance(request.tool_choice, (str, ToolChoiceFunction)) else None
1492
+ )
1493
+
1494
+ messages = _prepare_messages_for_model(
1495
+ conv_messages, standard_tools or None, model_tool_choice, extra_instr or None
1496
+ )
1497
+ pool, db = GeminiClientPool(), LMDBConversationStore()
1498
+ try:
1499
+ model = _get_model_by_name(request.model)
1500
+ except ValueError as exc:
1501
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
1502
+
1503
+ session, client, remain = await _find_reusable_session(db, pool, model, messages)
1504
+ if session:
1505
+ msgs = _prepare_messages_for_model(remain, request.tools, request.tool_choice, None, False)
1506
+ if not msgs:
1507
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No new messages.")
1508
+ m_input, files = (
1509
+ await GeminiClientWrapper.process_message(msgs[0], tmp_dir, tagged=False)
1510
+ if len(msgs) == 1
1511
+ else await GeminiClientWrapper.process_conversation(msgs, tmp_dir)
1512
+ )
1513
+ logger.debug(
1514
+ f"Reused session {reprlib.repr(session.metadata)} - sending {len(msgs)} prepared messages."
1515
+ )
1516
  else:
1517
+ try:
1518
+ client = await pool.acquire()
1519
+ session = client.start_chat(model=model)
1520
+ m_input, files = await GeminiClientWrapper.process_conversation(messages, tmp_dir)
1521
+ except Exception as e:
1522
+ logger.exception("Error in preparing conversation")
1523
+ raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e)) from e
1524
 
1525
+ response_id = f"resp_{uuid.uuid4().hex}"
1526
+ created_time = int(datetime.now(tz=timezone.utc).timestamp())
1527
 
1528
+ try:
1529
+ assert session and client
1530
+ logger.debug(
1531
+ f"Client ID: {client.id}, Input length: {len(m_input)}, files count: {len(files)}"
1532
+ )
1533
+ resp_or_stream = await _send_with_split(
1534
+ session, m_input, files=files, stream=request.stream
1535
+ )
1536
+ except Exception as e:
1537
+ logger.exception("Gemini API error")
1538
+ raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=str(e)) from e
1539
 
1540
+ if request.stream:
1541
+ return _create_responses_real_streaming_response(
1542
+ resp_or_stream,
1543
+ response_id,
1544
+ created_time,
1545
+ request.model,
1546
+ messages,
1547
+ db,
1548
+ model,
1549
+ client,
1550
+ session,
1551
+ request,
1552
+ image_store,
1553
+ base_url,
1554
+ struct_req,
1555
+ )
1556
+
1557
+ try:
1558
+ raw_t = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=True)
1559
+ raw_c = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=False)
1560
+ except Exception as exc:
1561
+ logger.exception("Gemini parsing failed")
1562
+ raise HTTPException(
1563
+ status_code=status.HTTP_502_BAD_GATEWAY, detail="Malformed response."
1564
+ ) from exc
1565
+
1566
+ assistant_text, storage_output, tool_calls = _process_llm_output(raw_t, raw_c, struct_req)
1567
+ images = resp_or_stream.images or []
1568
+ if (
1569
+ isinstance(request.tool_choice, ResponseToolChoice) and request.tool_choice.type == "image_generation"
1570
+ ) and not images:
1571
+ raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail="No images returned.")
1572
+
1573
+ contents, img_calls = [], []
1574
+ seen_hashes = set()
1575
+ for img in images:
1576
+ try:
1577
+ b64, w, h, fname, fhash = await _image_to_base64(img, image_store)
1578
+ if fhash in seen_hashes:
1579
+ (image_store / fname).unlink(missing_ok=True)
1580
+ continue
1581
+ seen_hashes.add(fhash)
1582
+
1583
+ contents.append(
1584
+ ResponseOutputContent(
1585
+ type="output_text",
1586
+ text=f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})",
1587
+ )
1588
+ )
1589
+ img_calls.append(
1590
+ ResponseImageGenerationCall(
1591
+ id=fname.rsplit(".", 1)[0],
1592
+ result=b64,
1593
+ output_format="png" if isinstance(img, GeneratedImage) else "jpeg",
1594
+ size=f"{w}x{h}" if w and h else None,
1595
+ )
1596
+ )
1597
+ except Exception as e:
1598
+ logger.warning(f"Image error: {e}")
1599
+
1600
+ if assistant_text:
1601
+ contents.append(ResponseOutputContent(type="output_text", text=assistant_text))
1602
+ if not contents:
1603
+ contents.append(ResponseOutputContent(type="output_text", text=""))
1604
+
1605
+ # Aggregate images for storage
1606
+ image_markdown = ""
1607
+ for img_call in img_calls:
1608
+ fname = f"{img_call.id}.{img_call.output_format}"
1609
+ img_url = f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})"
1610
+ image_markdown += f"\n\n{img_url}"
1611
+
1612
+ if image_markdown:
1613
+ storage_output += image_markdown
1614
+
1615
+ p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, tool_calls)
1616
+ usage = ResponseUsage(input_tokens=p_tok, output_tokens=c_tok, total_tokens=t_tok)
1617
+ payload = _create_responses_standard_payload(
1618
+ response_id,
1619
+ created_time,
1620
+ request.model,
1621
+ tool_calls,
1622
+ img_calls,
1623
+ contents,
1624
+ usage,
1625
+ request,
1626
+ norm_input,
1627
+ )
1628
+ _persist_conversation(
1629
+ db, model.model_name, client.id, session.metadata, messages, storage_output, tool_calls
1630
+ )
1631
+ return payload
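Reviewer note: both non-streaming handlers above drop repeated images by tracking a `seen_hashes` set and unlinking the duplicate file that was just written. The following is a minimal sketch of that pattern only, assuming SHA-256 over the raw bytes; `save_unique_images` and the `img_{index}.png` naming are hypothetical stand-ins, not the project's `_image_to_base64` helper, whose hashing and naming may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def save_unique_images(blobs: list[bytes], store: Path) -> list[Path]:
    """Write each blob to `store`, keeping only the first file per content hash."""
    seen_hashes: set[str] = set()
    kept: list[Path] = []
    for index, blob in enumerate(blobs):
        # Hypothetical naming scheme, used only for this illustration.
        path = store / f"img_{index}.png"
        path.write_bytes(blob)
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen_hashes:
            # Same content was already kept: remove the file just written.
            path.unlink(missing_ok=True)
            continue
        seen_hashes.add(digest)
        kept.append(path)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    kept = save_unique_images([b"aaa", b"bbb", b"aaa"], store)
    kept_names = [p.name for p in kept]
    remaining = sorted(p.name for p in store.iterdir())
```

Writing first and unlinking afterwards mirrors the order in the diff, where the hash only becomes available once the image has been saved.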
app/services/client.py CHANGED
@@ -78,24 +78,20 @@ class GeminiClientWrapper(GeminiClient):
78
  message: Message, tempdir: Path | None = None, tagged: bool = True
79
  ) -> tuple[str, list[Path | str]]:
80
  """
81
- Process a single message and return model input.
 
82
  """
83
  files: list[Path | str] = []
84
  text_fragments: list[str] = []
85
 
86
  if isinstance(message.content, str):
87
- # Pure text content
88
- if message.content:
89
- text_fragments.append(message.content)
90
  elif isinstance(message.content, list):
91
- # Mixed content
92
- # TODO: Use Pydantic to enforce the value checking
93
  for item in message.content:
94
  if item.type == "text":
95
- # Append multiple text fragments
96
- if item.text:
97
- text_fragments.append(item.text)
98
-
99
  elif item.type == "image_url":
100
  if not item.image_url:
101
  raise ValueError("Image URL cannot be empty")
@@ -103,7 +99,6 @@ class GeminiClientWrapper(GeminiClient):
103
  files.append(await save_url_to_tempfile(url, tempdir))
104
  else:
105
  raise ValueError("Image URL must contain 'url' key")
106
-
107
  elif item.type == "file":
108
  if not item.file:
109
  raise ValueError("File cannot be empty")
@@ -114,18 +109,28 @@ class GeminiClientWrapper(GeminiClient):
114
  files.append(await save_url_to_tempfile(url, tempdir))
115
  else:
116
  raise ValueError("File must contain 'file_data' or 'url' key")
 
 
117
  elif message.content is not None:
118
  raise ValueError("Unsupported message content type.")
119
 
 
 
 
 
 
 
 
120
  if message.tool_calls:
121
  tool_blocks: list[str] = []
122
  for call in message.tool_calls:
123
  args_text = call.function.arguments.strip()
124
  try:
125
  parsed_args = orjson.loads(args_text)
126
- args_text = orjson.dumps(parsed_args).decode("utf-8")
 
 
127
  except orjson.JSONDecodeError:
128
- # Leave args_text as is if it is not valid JSON
129
  pass
130
  tool_blocks.append(
131
  f'<tool_call name="{call.function.name}">{args_text}</tool_call>'
@@ -135,10 +140,9 @@ class GeminiClientWrapper(GeminiClient):
135
  tool_section = "```xml\n" + "".join(tool_blocks) + "\n```"
136
  text_fragments.append(tool_section)
137
 
138
- model_input = "\n".join(fragment for fragment in text_fragments if fragment)
139
 
140
- # Add role tag if needed
141
- if model_input:
142
  if tagged:
143
  model_input = add_tag(message.role, model_input)
144
 
@@ -148,48 +152,29 @@ class GeminiClientWrapper(GeminiClient):
148
  async def process_conversation(
149
  messages: list[Message], tempdir: Path | None = None
150
  ) -> tuple[str, list[Path | str]]:
151
- """
152
- Process the entire conversation and return a formatted string and list of
153
- files. The last message is assumed to be the assistant's response.
154
- """
155
- # Determine once whether we need to wrap messages with role tags: only required
156
- # if the history already contains assistant/system messages. When every message
157
- # so far is from the user, we can skip tagging entirely.
158
  need_tag = any(m.role != "user" for m in messages)
159
-
160
  conversation: list[str] = []
161
  files: list[Path | str] = []
162
-
163
  for msg in messages:
164
  input_part, files_part = await GeminiClientWrapper.process_message(
165
  msg, tempdir, tagged=need_tag
166
  )
167
  conversation.append(input_part)
168
  files.extend(files_part)
169
-
170
- # Append an opening assistant tag only when we used tags above so that Gemini
171
- # knows where to start its reply.
172
  if need_tag:
173
  conversation.append(add_tag("assistant", "", unclose=True))
174
-
175
  return "\n".join(conversation), files
176
 
177
  @staticmethod
178
  def extract_output(response: ModelOutput, include_thoughts: bool = True) -> str:
179
- """
180
- Extract and format the output text from the Gemini response.
181
- """
182
  text = ""
183
-
184
  if include_thoughts and response.thoughts:
185
  text += f"<think>{response.thoughts}</think>\n"
186
-
187
  if response.text:
188
  text += response.text
189
  else:
190
  text += str(response)
191
 
192
- # Fix some escaped characters
193
  def _unescape_html(text_content: str) -> str:
194
  parts: list[str] = []
195
  last_index = 0
 
78
  message: Message, tempdir: Path | None = None, tagged: bool = True
79
  ) -> tuple[str, list[Path | str]]:
80
  """
81
+ Process a single Message object into a format suitable for the Gemini API.
82
+ Extracts text fragments, handles images and files, and appends tool call blocks if present.
83
  """
84
  files: list[Path | str] = []
85
  text_fragments: list[str] = []
86
 
87
  if isinstance(message.content, str):
88
+ if message.content or message.role == "tool":
89
+ text_fragments.append(message.content or "{}")
 
90
  elif isinstance(message.content, list):
 
 
91
  for item in message.content:
92
  if item.type == "text":
93
+ if item.text or message.role == "tool":
94
+ text_fragments.append(item.text or "{}")
 
 
95
  elif item.type == "image_url":
96
  if not item.image_url:
97
  raise ValueError("Image URL cannot be empty")
 
99
  files.append(await save_url_to_tempfile(url, tempdir))
100
  else:
101
  raise ValueError("Image URL must contain 'url' key")
 
102
  elif item.type == "file":
103
  if not item.file:
104
  raise ValueError("File cannot be empty")
 
109
  files.append(await save_url_to_tempfile(url, tempdir))
110
  else:
111
  raise ValueError("File must contain 'file_data' or 'url' key")
112
+ elif message.content is None and message.role == "tool":
113
+ text_fragments.append("{}")
114
  elif message.content is not None:
115
  raise ValueError("Unsupported message content type.")
116
 
117
+ if message.role == "tool":
118
+ tool_name = message.name or "unknown"
119
+ combined_content = "\n".join(text_fragments).strip() or "{}"
120
+ text_fragments = [
121
+ f'<tool_response name="{tool_name}">{combined_content}</tool_response>'
122
+ ]
123
+
124
  if message.tool_calls:
125
  tool_blocks: list[str] = []
126
  for call in message.tool_calls:
127
  args_text = call.function.arguments.strip()
128
  try:
129
  parsed_args = orjson.loads(args_text)
130
+ args_text = orjson.dumps(parsed_args, option=orjson.OPT_SORT_KEYS).decode(
131
+ "utf-8"
132
+ )
133
  except orjson.JSONDecodeError:
 
134
  pass
135
  tool_blocks.append(
136
  f'<tool_call name="{call.function.name}">{args_text}</tool_call>'
 
140
  tool_section = "```xml\n" + "".join(tool_blocks) + "\n```"
141
  text_fragments.append(tool_section)
142
 
143
+ model_input = "\n".join(fragment for fragment in text_fragments if fragment is not None)
144
 
145
+ if model_input or message.role == "tool":
 
146
  if tagged:
147
  model_input = add_tag(message.role, model_input)
148
 
 
152
  async def process_conversation(
153
  messages: list[Message], tempdir: Path | None = None
154
  ) -> tuple[str, list[Path | str]]:
 
 
 
 
 
 
 
155
  need_tag = any(m.role != "user" for m in messages)
 
156
  conversation: list[str] = []
157
  files: list[Path | str] = []
 
158
  for msg in messages:
159
  input_part, files_part = await GeminiClientWrapper.process_message(
160
  msg, tempdir, tagged=need_tag
161
  )
162
  conversation.append(input_part)
163
  files.extend(files_part)
 
 
 
164
  if need_tag:
165
  conversation.append(add_tag("assistant", "", unclose=True))
 
166
  return "\n".join(conversation), files
167
 
168
  @staticmethod
169
  def extract_output(response: ModelOutput, include_thoughts: bool = True) -> str:
 
 
 
170
  text = ""
 
171
  if include_thoughts and response.thoughts:
172
  text += f"<think>{response.thoughts}</think>\n"
 
173
  if response.text:
174
  text += response.text
175
  else:
176
  text += str(response)
177
 
 
178
  def _unescape_html(text_content: str) -> str:
179
  parts: list[str] = []
180
  last_index = 0
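Reviewer note: the `process_message` changes above wrap tool-role content in a `<tool_response>` element (falling back to `"{}"` for empty content and `"unknown"` for a missing name) and re-serialize tool call arguments with sorted keys. A rough sketch of both steps follows. It uses the standard-library `json` module in place of `orjson`, and the function names `wrap_tool_response` and `canonical_args` are hypothetical, introduced only for illustration.

```python
from __future__ import annotations

import json


def wrap_tool_response(name: str | None, fragments: list[str]) -> str:
    """Join text fragments and wrap them the way the diff does for role == 'tool'."""
    tool_name = name or "unknown"
    combined = "\n".join(fragments).strip() or "{}"
    return f'<tool_response name="{tool_name}">{combined}</tool_response>'


def canonical_args(args_text: str) -> str:
    """Re-serialize JSON arguments with sorted keys; leave invalid JSON untouched."""
    args_text = args_text.strip()
    try:
        parsed = json.loads(args_text)
    except json.JSONDecodeError:
        return args_text
    # Compact separators approximate orjson's whitespace-free output.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


empty_response = wrap_tool_response(None, [])
sorted_args = canonical_args('{"b": 1, "a": 2}')
untouched = canonical_args("not json")
```

Sorting keys means two calls with the same arguments in a different order produce the same text, which matters when the rendered conversation is later compared or hashed.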
app/services/lmdb.py CHANGED
@@ -11,45 +11,82 @@ from loguru import logger
11
 
12
  from ..models import ContentItem, ConversationInStore, Message
13
  from ..utils import g_config
14
- from ..utils.helper import extract_tool_calls, remove_tool_call_blocks
 
 
 
 
15
  from ..utils.singleton import Singleton
16
 
17
 
18
  def _hash_message(message: Message) -> str:
19
- """Generate a consistent hash for a single message focusing ONLY on logic/content, ignoring technical IDs."""
 
 
 
 
20
  core_data = {
21
  "role": message.role,
22
  "name": message.name,
 
23
  }
24
 
25
- # Normalize content: strip, handle empty/None, and list-of-text items
26
  content = message.content
27
  if not content:
28
  core_data["content"] = None
29
  elif isinstance(content, str):
30
- # Normalize line endings and strip whitespace
31
- normalized = content.replace("\r\n", "\n").strip()
32
  core_data["content"] = normalized if normalized else None
33
  elif isinstance(content, list):
34
  text_parts = []
35
  for item in content:
 
36
  if isinstance(item, ContentItem) and item.type == "text":
37
- text_parts.append(item.text or "")
38
  elif isinstance(item, dict) and item.get("type") == "text":
39
- text_parts.append(item.get("text") or "")
40
  else:
41
- # If it contains non-text (images/files), keep the full list for hashing
42
- text_parts = None
43
- break
44
-
45
- if text_parts is not None:
46
- # Normalize each part but keep them as a list to preserve boundaries and avoid collisions
47
- normalized_parts = [p.replace("\r\n", "\n") for p in text_parts]
48
- core_data["content"] = normalized_parts if normalized_parts else None
49
- else:
50
- core_data["content"] = message.model_dump(mode="json")["content"]
 
51
 
52
- # Normalize tool_calls: Focus ONLY on function name and arguments
53
  if message.tool_calls:
54
  calls_data = []
55
  for tc in message.tool_calls:
@@ -66,14 +103,14 @@ def _hash_message(message: Message) -> str:
66
  "arguments": canon_args,
67
  }
68
  )
69
- # Sort calls to be order-independent
70
  calls_data.sort(key=lambda x: (x["name"], x["arguments"]))
71
  core_data["tool_calls"] = calls_data
72
  else:
73
  core_data["tool_calls"] = None
74
 
75
  message_bytes = orjson.dumps(core_data, option=orjson.OPT_SORT_KEYS)
76
- return hashlib.sha256(message_bytes).hexdigest()
 
77
 
78
 
79
  def _hash_conversation(client_id: str, model: str, messages: List[Message]) -> str:
@@ -123,16 +160,14 @@ class LMDBConversationStore(metaclass=Singleton):
123
  self._init_environment()
124
 
125
  def _ensure_db_path(self) -> None:
126
- """Ensure database directory exists."""
127
  self.db_path.parent.mkdir(parents=True, exist_ok=True)
128
 
129
  def _init_environment(self) -> None:
130
- """Initialize LMDB environment."""
131
  try:
132
  self._env = lmdb.open(
133
  str(self.db_path),
134
  map_size=self.max_db_size,
135
- max_dbs=3, # main, metadata, and index databases
136
  writemap=True,
137
  readahead=False,
138
  meminit=False,
@@ -144,7 +179,6 @@ class LMDBConversationStore(metaclass=Singleton):
144
 
145
  @contextmanager
146
  def _get_transaction(self, write: bool = False):
147
- """Get LMDB transaction context manager."""
148
  if not self._env:
149
  raise RuntimeError("LMDB environment not initialized")
150
 
@@ -178,11 +212,15 @@ class LMDBConversationStore(metaclass=Singleton):
178
  if not conv:
179
  raise ValueError("Messages list cannot be empty")
180
 
 
 
 
 
 
181
  # Generate hash for the message list
182
  message_hash = _hash_conversation(conv.client_id, conv.model, conv.messages)
183
  storage_key = custom_key or message_hash
184
 
185
- # Prepare data for storage
186
  now = datetime.now()
187
  if conv.created_at is None:
188
  conv.created_at = now
@@ -192,20 +230,18 @@ class LMDBConversationStore(metaclass=Singleton):
192
 
193
  try:
194
  with self._get_transaction(write=True) as txn:
195
- # Store main data
196
  txn.put(storage_key.encode("utf-8"), value, overwrite=True)
197
 
198
- # Store hash -> key mapping for reverse lookup
199
  txn.put(
200
  f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"),
201
  storage_key.encode("utf-8"),
202
  )
203
 
204
- logger.debug(f"Stored {len(conv.messages)} messages with key: {storage_key}")
205
  return storage_key
206
 
207
  except Exception as e:
208
- logger.error(f"Failed to store conversation: {e}")
209
  raise
210
 
211
  def get(self, key: str) -> Optional[ConversationInStore]:
@@ -227,39 +263,35 @@ class LMDBConversationStore(metaclass=Singleton):
227
  storage_data = orjson.loads(data) # type: ignore
228
  conv = ConversationInStore.model_validate(storage_data)
229
 
230
- logger.debug(f"Retrieved {len(conv.messages)} messages for key: {key}")
231
  return conv
232
 
233
  except Exception as e:
234
- logger.error(f"Failed to retrieve messages for key {key}: {e}")
235
  return None
236
 
237
  def find(self, model: str, messages: List[Message]) -> Optional[ConversationInStore]:
238
  """
239
  Search conversation data by message list.
240
-
241
- Args:
242
- model: Model name of the conversations
243
- messages: List of messages to search for
244
-
245
- Returns:
246
- Conversation or None if not found
247
  """
248
  if not messages:
249
  return None
250
 
251
  # --- Find with raw messages ---
252
  if conv := self._find_by_message_list(model, messages):
253
- logger.debug("Found conversation with raw message history.")
254
  return conv
255
 
256
  # --- Find with cleaned messages ---
257
  cleaned_messages = self.sanitize_assistant_messages(messages)
258
- if conv := self._find_by_message_list(model, cleaned_messages):
259
- logger.debug("Found conversation with cleaned message history.")
260
- return conv
 
 
 
261
 
262
- logger.debug("No conversation found for either raw or cleaned history.")
263
  return None
264
 
265
  def _find_by_message_list(
@@ -330,11 +362,11 @@ class LMDBConversationStore(metaclass=Singleton):
330
  if message_hash and key != message_hash:
331
  txn.delete(f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"))
332
 
333
- logger.debug(f"Deleted messages with key: {key}")
334
  return conv
335
 
336
  except Exception as e:
337
- logger.error(f"Failed to delete key {key}: {e}")
338
  return None
339
 
340
  def keys(self, prefix: str = "", limit: Optional[int] = None) -> List[str]:
@@ -478,6 +510,8 @@ class LMDBConversationStore(metaclass=Singleton):
478
  """
479
  Remove all <think>...</think> tags and strip whitespace.
480
  """
 
 
481
  # Remove all think blocks anywhere in the text
482
  cleaned_content = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
483
  return cleaned_content.strip()
@@ -485,12 +519,8 @@ class LMDBConversationStore(metaclass=Singleton):
485
  @staticmethod
486
  def sanitize_assistant_messages(messages: list[Message]) -> list[Message]:
487
  """
488
- Create a new list of messages with assistant content cleaned of <think> tags
489
- and system hints/tool call blocks. This is used for both storing and
490
- searching chat history to ensure consistency.
491
-
492
- If a message has no tool_calls but contains tool call XML blocks in its
493
- content, they will be extracted and moved to the tool_calls field.
494
  """
495
  cleaned_messages = []
496
  for msg in messages:
@@ -503,12 +533,12 @@ class LMDBConversationStore(metaclass=Singleton):
503
  else:
504
  text = remove_tool_call_blocks(text).strip()
505
 
506
- normalized_content = text.strip()
507
 
508
  if normalized_content != msg.content or tool_calls != msg.tool_calls:
509
  cleaned_msg = msg.model_copy(
510
  update={
511
- "content": normalized_content or None,
512
  "tool_calls": tool_calls or None,
513
  }
514
  )
 
11
 
12
  from ..models import ContentItem, ConversationInStore, Message
13
  from ..utils import g_config
14
+ from ..utils.helper import (
15
+ extract_tool_calls,
16
+ remove_tool_call_blocks,
17
+ strip_system_hints,
18
+ )
19
  from ..utils.singleton import Singleton
20
 
21
 
22
  def _hash_message(message: Message) -> str:
23
+ """
24
+ Generate a stable, canonical hash for a single message.
25
+ Strips system hints, thoughts, and tool call blocks to ensure
26
+ identical logical content produces the same hash regardless of format.
27
+ """
28
  core_data = {
29
  "role": message.role,
30
  "name": message.name,
31
+ "tool_call_id": message.tool_call_id,
32
  }
33
 
 
34
  content = message.content
35
  if not content:
36
  core_data["content"] = None
37
  elif isinstance(content, str):
38
+ normalized = content.replace("\r\n", "\n")
39
+
40
+ normalized = LMDBConversationStore.remove_think_tags(normalized)
41
+ normalized = strip_system_hints(normalized)
42
+
43
+ if message.tool_calls:
44
+ normalized = remove_tool_call_blocks(normalized)
45
+ else:
46
+ temp_text, _extracted = extract_tool_calls(normalized)
47
+ normalized = temp_text
48
+
49
+ normalized = normalized.strip()
50
  core_data["content"] = normalized if normalized else None
51
  elif isinstance(content, list):
52
  text_parts = []
53
  for item in content:
54
+ text_val = ""
55
  if isinstance(item, ContentItem) and item.type == "text":
56
+ text_val = item.text or ""
57
  elif isinstance(item, dict) and item.get("type") == "text":
58
+ text_val = item.get("text") or ""
59
+
60
+ if text_val:
61
+ text_val = text_val.replace("\r\n", "\n")
62
+ text_val = LMDBConversationStore.remove_think_tags(text_val)
63
+ text_val = strip_system_hints(text_val)
64
+ text_val = remove_tool_call_blocks(text_val).strip()
65
+ if text_val:
66
+ text_parts.append(text_val)
67
+ elif isinstance(item, ContentItem) and item.type in ("image_url", "file"):
68
+ # For non-text items, include their unique markers to distinguish them
69
+ if item.type == "image_url":
70
+ text_parts.append(
71
+ f"[image_url:{item.image_url.get('url') if item.image_url else ''}]"
72
+ )
73
+ elif item.type == "file":
74
+ text_parts.append(
75
+ f"[file:{item.file.get('url') or item.file.get('filename') if item.file else ''}]"
76
+ )
77
  else:
78
+ # Fallback for other dict-based content parts
79
+ part_type = item.get("type") if isinstance(item, dict) else None
80
+ if part_type == "image_url":
81
+ url = item.get("image_url", {}).get("url")
82
+ text_parts.append(f"[image_url:{url}]")
83
+ elif part_type == "file":
84
+ url = item.get("file", {}).get("url") or item.get("file", {}).get("filename")
85
+ text_parts.append(f"[file:{url}]")
86
+
87
+ combined_text = "\n".join(text_parts).replace("\r\n", "\n").strip()
88
+ core_data["content"] = combined_text if combined_text else None
89
 
 
90
  if message.tool_calls:
91
  calls_data = []
92
  for tc in message.tool_calls:
 
103
  "arguments": canon_args,
104
  }
105
  )
 
106
  calls_data.sort(key=lambda x: (x["name"], x["arguments"]))
107
  core_data["tool_calls"] = calls_data
108
  else:
109
  core_data["tool_calls"] = None
110
 
111
  message_bytes = orjson.dumps(core_data, option=orjson.OPT_SORT_KEYS)
112
+ digest = hashlib.sha256(message_bytes).hexdigest()
113
+ return digest
114
 
115
 
116
  def _hash_conversation(client_id: str, model: str, messages: List[Message]) -> str:
 
160
  self._init_environment()
161
 
162
  def _ensure_db_path(self) -> None:
 
163
  self.db_path.parent.mkdir(parents=True, exist_ok=True)
164
 
165
  def _init_environment(self) -> None:
 
166
  try:
167
  self._env = lmdb.open(
168
  str(self.db_path),
169
  map_size=self.max_db_size,
170
+ max_dbs=3,
171
  writemap=True,
172
  readahead=False,
173
  meminit=False,
 
179
 
180
  @contextmanager
181
  def _get_transaction(self, write: bool = False):
 
182
  if not self._env:
183
  raise RuntimeError("LMDB environment not initialized")
184
 
 
212
  if not conv:
213
  raise ValueError("Messages list cannot be empty")
214
 
215
+ # Sanitize messages before computing hash and storing to ensure consistency
216
+ # with the search (find) logic, which also sanitizes its prefix.
217
+ sanitized_messages = self.sanitize_assistant_messages(conv.messages)
218
+ conv.messages = sanitized_messages
219
+
220
  # Generate hash for the message list
221
  message_hash = _hash_conversation(conv.client_id, conv.model, conv.messages)
222
  storage_key = custom_key or message_hash
223
 
 
224
  now = datetime.now()
225
  if conv.created_at is None:
226
  conv.created_at = now
 
230
 
231
  try:
232
  with self._get_transaction(write=True) as txn:
 
233
  txn.put(storage_key.encode("utf-8"), value, overwrite=True)
234
 
 
235
  txn.put(
236
  f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"),
237
  storage_key.encode("utf-8"),
238
  )
239
 
240
+ logger.debug(f"Stored {len(conv.messages)} messages with key: {storage_key[:12]}")
241
  return storage_key
242
 
243
  except Exception as e:
244
+ logger.error(f"Failed to store messages with key {storage_key[:12]}: {e}")
245
  raise
246
 
247
  def get(self, key: str) -> Optional[ConversationInStore]:
 
263
  storage_data = orjson.loads(data) # type: ignore
264
  conv = ConversationInStore.model_validate(storage_data)
265
 
266
+ logger.debug(f"Retrieved {len(conv.messages)} messages with key: {key[:12]}")
267
  return conv
268
 
269
  except Exception as e:
270
+ logger.error(f"Failed to retrieve messages with key {key[:12]}: {e}")
271
  return None
272
 
273
  def find(self, model: str, messages: List[Message]) -> Optional[ConversationInStore]:
274
  """
275
  Search conversation data by message list.
 
 
 
 
 
 
 
276
  """
277
  if not messages:
278
  return None
279
 
280
  # --- Find with raw messages ---
281
  if conv := self._find_by_message_list(model, messages):
282
+ logger.debug(f"Session found for '{model}' with {len(messages)} raw messages.")
283
  return conv
284
 
285
  # --- Find with cleaned messages ---
286
  cleaned_messages = self.sanitize_assistant_messages(messages)
287
+ if cleaned_messages != messages:
288
+ if conv := self._find_by_message_list(model, cleaned_messages):
289
+ logger.debug(
290
+ f"Session found for '{model}' with {len(cleaned_messages)} cleaned messages."
291
+ )
292
+ return conv
293
 
294
+ logger.debug(f"No session found for '{model}' with {len(messages)} messages.")
295
  return None
296
 
297
  def _find_by_message_list(
 
362
  if message_hash and key != message_hash:
363
  txn.delete(f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"))
364
 
365
+ logger.debug(f"Deleted messages with key: {key[:12]}")
366
  return conv
367
 
368
  except Exception as e:
369
+ logger.error(f"Failed to delete messages with key {key[:12]}: {e}")
370
  return None
371
 
372
  def keys(self, prefix: str = "", limit: Optional[int] = None) -> List[str]:
 
510
  """
511
  Remove all <think>...</think> tags and strip whitespace.
512
  """
513
+ if not text:
514
+ return text
515
  # Remove all think blocks anywhere in the text
516
  cleaned_content = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
517
  return cleaned_content.strip()
 
519
  @staticmethod
520
  def sanitize_assistant_messages(messages: list[Message]) -> list[Message]:
521
  """
522
+ Produce a canonical history where assistant messages are cleaned of
523
+ internal markers and tool call blocks are moved to metadata.
 
 
 
 
524
  """
525
  cleaned_messages = []
526
  for msg in messages:
 
533
  else:
534
  text = remove_tool_call_blocks(text).strip()
535
 
536
+ normalized_content = text.strip() or None
537
 
538
  if normalized_content != msg.content or tool_calls != msg.tool_calls:
539
  cleaned_msg = msg.model_copy(
540
  update={
541
+ "content": normalized_content,
542
  "tool_calls": tool_calls or None,
543
  }
544
  )
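Reviewer note: the new `_hash_message` normalizes content before hashing so that the same logical message hashes identically whether or not it still carries `<think>` blocks or CRLF line endings. The sketch below illustrates only that idea under simplifying assumptions: it uses `json` rather than `orjson`, handles string content only, and omits `strip_system_hints` and tool call extraction, whose behavior is not shown in this diff. `hash_message` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
import json
import re


def remove_think_tags(text: str) -> str:
    """Drop every <think>...</think> block and strip surrounding whitespace."""
    if not text:
        return text
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def hash_message(role: str, content: str | None) -> str:
    """Hash a message after normalizing line endings and removing think blocks."""
    normalized = remove_think_tags((content or "").replace("\r\n", "\n")).strip()
    core = {"role": role, "content": normalized or None}
    payload = json.dumps(core, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


raw = hash_message("assistant", "<think>plan</think>\r\nHello")
clean = hash_message("assistant", "Hello")
other_role = hash_message("user", "Hello")
```

Because `store` now sanitizes messages before hashing and `find` hashes through the same normalization, a history replayed by the client with or without reasoning text should resolve to the same key, which appears to be the intent behind the reusable-session fix.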
app/services/pool.py CHANGED
@@ -31,7 +31,7 @@ class GeminiClientPool(metaclass=Singleton):
31
  self._clients.append(client)
32
  self._id_map[c.id] = client
33
  self._round_robin.append(client)
34
- self._restart_locks[c.id] = asyncio.Lock() # Pre-initialize
35
 
36
  async def init(self) -> None:
37
  """Initialize all clients in the pool."""
@@ -84,7 +84,7 @@ class GeminiClientPool(metaclass=Singleton):
84
 
85
  lock = self._restart_locks.get(client.id)
86
  if lock is None:
87
- return False # Should not happen
88
 
89
  async with lock:
90
  if client.running():
 
31
  self._clients.append(client)
32
  self._id_map[c.id] = client
33
  self._round_robin.append(client)
34
+ self._restart_locks[c.id] = asyncio.Lock()
35
 
36
  async def init(self) -> None:
37
  """Initialize all clients in the pool."""
 
84
 
85
  lock = self._restart_locks.get(client.id)
86
  if lock is None:
87
+ return False
88
 
89
  async with lock:
90
  if client.running():
app/utils/helper.py CHANGED
@@ -5,7 +5,6 @@ import re
 import struct
 import tempfile
 from pathlib import Path
-from typing import Iterator
 from urllib.parse import urlparse
 
 import httpx
@@ -68,7 +67,6 @@ async def save_url_to_tempfile(url: str, tempdir: Path | None = None) -> Path:
     data: bytes | None = None
     suffix: str | None = None
     if url.startswith("data:image/"):
-        # Base64 encoded image
         metadata_part = url.split(",")[0]
         mime_type = metadata_part.split(":")[1].split(";")[0]
 
@@ -112,9 +110,9 @@ def strip_code_fence(text: str) -> str:
 
 
 def strip_tagged_blocks(text: str) -> str:
-    """Remove <|im_start|>role ... <|im_end|> sections, dropping tool blocks entirely.
-    - tool blocks are removed entirely (if missing end marker, drop to EOF).
-    - other roles: remove markers and role, keep inner content (if missing end marker, keep to EOF).
+    """Remove <|im_start|>role ... <|im_end|> sections.
+    - tool blocks are removed entirely (including content).
+    - other roles: remove markers and role, keep inner content.
     """
     if not text:
         return text
@@ -131,13 +129,11 @@ def strip_tagged_blocks(text: str) -> str:
             result.append(text[idx:])
             break
 
-        # append any content before this block
         result.append(text[idx:start])
 
         role_start = start + len(start_marker)
         newline = text.find("\n", role_start)
         if newline == -1:
-            # malformed block; keep the remainder as-is (safe behavior)
             result.append(text[start:])
             break
 
@@ -145,23 +141,18 @@ def strip_tagged_blocks(text: str) -> str:
 
         end = text.find(end_marker, newline + 1)
         if end == -1:
-            # missing end marker
             if role == "tool":
-                # drop from the start marker to EOF (skip the remainder)
                 break
             else:
-                # keep inner content from after the role newline to EOF
                 result.append(text[newline + 1 :])
                 break
 
         block_end = end + len(end_marker)
 
         if role == "tool":
-            # drop the whole block
             idx = block_end
             continue
 
-        # keep the content without role markers
         content = text[newline + 1 : end]
         result.append(content)
         idx = block_end
@@ -180,41 +171,19 @@ def strip_system_hints(text: str) -> str:
     return cleaned.strip()
 
 
-def remove_tool_call_blocks(text: str) -> str:
-    """Strip tool call code blocks from text."""
-    if not text:
-        return text
-
-    # 1. Remove fenced blocks ONLY if they contain tool calls
-    def _replace_block(match: re.Match[str]) -> str:
-        block_content = match.group(1)
-        if not block_content:
-            return match.group(0)
-
-        # Check if the block contains any tool call tag
-        if TOOL_CALL_RE.search(block_content):
-            return ""
-
-        # Preserve the block if no tool call found
-        return match.group(0)
-
-    cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
-
-    # 2. Remove orphaned tool calls
-    cleaned = TOOL_CALL_RE.sub("", cleaned)
-
-    return strip_system_hints(cleaned)
-
-
-def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
-    """Extract tool call definitions and return cleaned text."""
+def _process_tools_internal(text: str, extract: bool = True) -> tuple[str, list[ToolCall]]:
+    """
+    Unified engine for stripping tool call blocks and extracting tool metadata.
+    If extract=True, parses JSON arguments and assigns deterministic call IDs.
+    """
     if not text:
         return text, []
 
     tool_calls: list[ToolCall] = []
 
     def _create_tool_call(name: str, raw_args: str) -> None:
-        """Helper to parse args and append to the tool_calls list."""
+        if not extract:
+            return
         if not name:
             logger.warning("Encountered tool_call without a function name.")
             return
@@ -226,8 +195,6 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
         except orjson.JSONDecodeError:
             logger.warning(f"Failed to parse tool call arguments for '{name}'. Passing raw string.")
 
-        # Generate a deterministic ID based on name, arguments, and its global sequence index
-        # to ensure uniqueness across multiple fenced blocks while remaining stable for storage.
         index = len(tool_calls)
         seed = f"{name}:{arguments}:{index}".encode("utf-8")
         call_id = f"call_{hashlib.sha256(seed).hexdigest()[:24]}"
@@ -245,14 +212,14 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
         if not block_content:
             return match.group(0)
 
-        found_in_block = False
-        for call_match in TOOL_CALL_RE.finditer(block_content):
-            found_in_block = True
-            name = (call_match.group(1) or "").strip()
-            raw_args = (call_match.group(2) or "").strip()
-            _create_tool_call(name, raw_args)
+        is_tool_block = bool(TOOL_CALL_RE.search(block_content))
 
-        if found_in_block:
+        if is_tool_block:
+            if extract:
+                for call_match in TOOL_CALL_RE.finditer(block_content):
+                    name = (call_match.group(1) or "").strip()
+                    raw_args = (call_match.group(2) or "").strip()
+                    _create_tool_call(name, raw_args)
             return ""
         else:
             return match.group(0)
@@ -260,56 +227,26 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
     cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
 
     def _replace_orphan(match: re.Match[str]) -> str:
-        name = (match.group(1) or "").strip()
-        raw_args = (match.group(2) or "").strip()
-        _create_tool_call(name, raw_args)
+        if extract:
+            name = (match.group(1) or "").strip()
+            raw_args = (match.group(2) or "").strip()
+            _create_tool_call(name, raw_args)
         return ""
 
     cleaned = TOOL_CALL_RE.sub(_replace_orphan, cleaned)
-
     cleaned = strip_system_hints(cleaned)
     return cleaned, tool_calls
 
 
-def iter_stream_segments(model_output: str, chunk_size: int = 64) -> Iterator[str]:
-    """Yield stream segments while keeping <think> markers and words intact."""
-    if not model_output:
-        return
-
-    token_pattern = re.compile(r"\s+|\S+\s*")
-    pending = ""
-
-    def _flush_pending() -> Iterator[str]:
-        nonlocal pending
-        if pending:
-            yield pending
-            pending = ""
-
-    # Split on <think> boundaries so the markers are never fragmented.
-    parts = re.split(r"(</?think>)", model_output)
-    for part in parts:
-        if not part:
-            continue
-        if part in {"<think>", "</think>"}:
-            yield from _flush_pending()
-            yield part
-            continue
-
-        for match in token_pattern.finditer(part):
-            token = match.group(0)
-
-            if len(token) > chunk_size:
-                yield from _flush_pending()
-                for idx in range(0, len(token), chunk_size):
-                    yield token[idx : idx + chunk_size]
-                continue
-
-            if pending and len(pending) + len(token) > chunk_size:
-                yield from _flush_pending()
+def remove_tool_call_blocks(text: str) -> str:
+    """Strip tool call code blocks from text."""
+    cleaned, _ = _process_tools_internal(text, extract=False)
+    return cleaned
 
-            pending += token
 
-    yield from _flush_pending()
+def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
+    """Extract tool call definitions and return cleaned text."""
+    return _process_tools_internal(text, extract=True)
 
 
 def text_from_message(message: Message) -> str:
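The refactor folds `remove_tool_call_blocks` and `extract_tool_calls` into one engine with an `extract` flag, and keeps the deterministic `call_<sha256 prefix>` IDs. The sketch below shows the idea under loud assumptions: the two regexes are invented for illustration (the project's real `TOOL_BLOCK_RE` and `TOOL_CALL_RE` are not visible in this diff), the standard `json` module stands in for `orjson`, plain dicts stand in for `ToolCall`, and `strip_system_hints` is replaced by a simple `strip()`.

```python
import hashlib
import json
import re

FENCE = "`" * 3  # built at runtime so this example does not nest code fences

# Hypothetical patterns; the real ones are defined elsewhere in the project.
TOOL_BLOCK_RE = re.compile(FENCE + r"xml\s*(.*?)" + FENCE, re.DOTALL)
TOOL_CALL_RE = re.compile(r'<tool_call name="([^"]*)">(.*?)</tool_call>', re.DOTALL)


def process_tools(text, extract=True):
    """Strip tool-call markup; collect parsed calls only when extract=True."""
    if not text:
        return text, []
    tool_calls = []

    def create_tool_call(name, raw_args):
        if not extract or not name:
            return
        try:
            arguments = json.dumps(json.loads(raw_args), separators=(",", ":"))
        except json.JSONDecodeError:
            arguments = raw_args  # pass the raw string through
        # Same seed shape as the diff: name, arguments and running index.
        index = len(tool_calls)
        seed = f"{name}:{arguments}:{index}".encode("utf-8")
        call_id = f"call_{hashlib.sha256(seed).hexdigest()[:24]}"
        tool_calls.append({"id": call_id, "name": name, "arguments": arguments})

    def replace_block(match):
        block = match.group(1)
        if not block or not TOOL_CALL_RE.search(block):
            return match.group(0)  # keep fenced blocks without tool calls
        for call in TOOL_CALL_RE.finditer(block):
            create_tool_call(call.group(1).strip(), call.group(2).strip())
        return ""

    def replace_orphan(match):
        create_tool_call(match.group(1).strip(), match.group(2).strip())
        return ""

    cleaned = TOOL_BLOCK_RE.sub(replace_block, text)
    cleaned = TOOL_CALL_RE.sub(replace_orphan, cleaned)
    return cleaned.strip(), tool_calls
```

Because both public wrappers go through the same substitution passes, the cleaned text is identical whether or not calls are extracted. In the old code the two functions could drift apart, since each carried its own copy of the block-matching logic.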
pyproject.toml CHANGED
@@ -6,10 +6,10 @@ readme = "README.md"
 requires-python = "==3.12.*"
 dependencies = [
     "fastapi>=0.128.0",
-    "gemini-webapi>=1.17.3",
+    "gemini-webapi>=1.18.0",
     "lmdb>=1.7.5",
     "loguru>=0.7.3",
-    "orjson>=3.11.5",
+    "orjson>=3.11.7",
     "pydantic-settings[yaml]>=2.12.0",
     "uvicorn>=0.40.0",
     "uvloop>=0.22.1; sys_platform != 'win32'",
uv.lock CHANGED
@@ -106,10 +106,10 @@ dev = [
 [package.metadata]
 requires-dist = [
     { name = "fastapi", specifier = ">=0.128.0" },
-    { name = "gemini-webapi", specifier = ">=1.17.3" },
+    { name = "gemini-webapi", specifier = ">=1.18.0" },
     { name = "lmdb", specifier = ">=1.7.5" },
     { name = "loguru", specifier = ">=0.7.3" },
-    { name = "orjson", specifier = ">=3.11.5" },
+    { name = "orjson", specifier = ">=3.11.7" },
     { name = "pydantic-settings", extras = ["yaml"], specifier = ">=2.12.0" },
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.14.14" },
     { name = "uvicorn", specifier = ">=0.40.0" },
@@ -122,17 +122,17 @@ dev = [{ name = "ruff", specifier = ">=0.14.14" }]
 
 [[package]]
 name = "gemini-webapi"
-version = "1.17.3"
+version = "1.18.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "httpx" },
+    { name = "httpx", extra = ["http2"] },
     { name = "loguru" },
     { name = "orjson" },
     { name = "pydantic" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/aa/74/1a31f3605250eb5cbcbfb15559c43b0d71734c8d286cfa9a7833841306e3/gemini_webapi-1.17.3.tar.gz", hash = "sha256:6201f9eaf5f562c5dc589d71c0edbba9e2eb8f780febbcf35307697bf474d577", size = 259418, upload-time = "2025-12-05T22:38:44.426Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/c6/03/eb06536f287a8b7fb4808b00a60d9a9a3694f8a4079b77730325c639fbbe/gemini_webapi-1.18.0.tar.gz", hash = "sha256:0688a080fc3c95be55e723a66b2b69ec3ffcd58b07c50cf627d85d59d1181a86", size = 264630, upload-time = "2026-02-03T01:18:39.794Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/4c/a3/a88ff45197dce68a81d92c8d40368e4c26f67faf3af3273357f3f71f5c3d/gemini_webapi-1.17.3-py3-none-any.whl", hash = "sha256:d83969b1fa3236f3010d856d191b35264c936ece81f1be4c1de53ec1cf0855c8", size = 56659, upload-time = "2025-12-05T22:38:42.93Z" },
+    { url = "https://files.pythonhosted.org/packages/40/33/85f520f56faddd68442c7efe7086ff5593b213bd8fc3768835dbe610fd9b/gemini_webapi-1.18.0-py3-none-any.whl", hash = "sha256:2fe25b5f8185aba1ca109e1280ef3eb79e5bd8a81fba16e01fbc4a177b72362c", size = 61523, upload-time = "2026-02-03T01:18:38.322Z" },
 ]
 
 [[package]]
@@ -144,6 +144,28 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" },
 ]
 
+[[package]]
+name = "h2"
+version = "4.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "hpack" },
+    { name = "hyperframe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/1d/17/afa56379f94ad0fe8defd37d6eb3f89a25404ffc71d4d848893d270325fc/h2-4.3.0.tar.gz", hash = "sha256:6c59efe4323fa18b47a632221a1888bd7fde6249819beda254aeca909f221bf1", size = 2152026, upload-time = "2025-08-23T18:12:19.778Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/69/b2/119f6e6dcbd96f9069ce9a2665e0146588dc9f88f29549711853645e736a/h2-4.3.0-py3-none-any.whl", hash = "sha256:c438f029a25f7945c69e0ccf0fb951dc3f73a5f6412981daee861431b70e2bdd", size = 61779, upload-time = "2025-08-23T18:12:17.779Z" },
+]
+
+[[package]]
+name = "hpack"
+version = "4.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/2c/48/71de9ed269fdae9c8057e5a4c0aa7402e8bb16f2c6e90b3aa53327b113f8/hpack-4.1.0.tar.gz", hash = "sha256:ec5eca154f7056aa06f196a557655c5b009b382873ac8d1e66e79e87535f1dca", size = 51276, upload-time = "2025-01-22T21:44:58.347Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/07/c6/80c95b1b2b94682a72cbdbfb85b81ae2daffa4291fbfa1b1464502ede10d/hpack-4.1.0-py3-none-any.whl", hash = "sha256:157ac792668d995c657d93111f46b4535ed114f0c9c8d672271bbec7eae1b496", size = 34357, upload-time = "2025-01-22T21:44:56.92Z" },
+]
+
 [[package]]
 name = "httpcore"
 version = "1.0.9"
@@ -172,6 +194,20 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" },
 ]
 
+[package.optional-dependencies]
+http2 = [
+    { name = "h2" },
+]
+
+[[package]]
+name = "hyperframe"
+version = "6.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/02/e7/94f8232d4a74cc99514c13a9f995811485a6903d48e5d952771ef6322e30/hyperframe-6.1.0.tar.gz", hash = "sha256:f630908a00854a7adeabd6382b43923a4c4cd4b821fcb527e6ab9e15382a3b08", size = 26566, upload-time = "2025-01-22T21:41:49.302Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/48/30/47d0bf6072f7252e6521f3447ccfa40b421b6824517f82854703d0f5a98b/hyperframe-6.1.0-py3-none-any.whl", hash = "sha256:b03380493a519fce58ea5af42e4a42317bf9bd425596f7a0835ffce80f1a42e5", size = 13007, upload-time = "2025-01-22T21:41:47.295Z" },
+]
+
 [[package]]
 name = "idna"
 version = "3.11"
@@ -211,25 +247,25 @@ wheels = [
 
 [[package]]
 name = "orjson"
-version = "3.11.5"
+version = "3.11.7"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/04/b8/333fdb27840f3bf04022d21b654a35f58e15407183aeb16f3b41aa053446/orjson-3.11.5.tar.gz", hash = "sha256:82393ab47b4fe44ffd0a7659fa9cfaacc717eb617c93cde83795f14af5c2e9d5", size = 5972347, upload-time = "2025-12-06T15:55:39.458Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/53/45/b268004f745ede84e5798b48ee12b05129d19235d0e15267aa57dcdb400b/orjson-3.11.7.tar.gz", hash = "sha256:9b1a67243945819ce55d24a30b59d6a168e86220452d2c96f4d1f093e71c0c49", size = 6144992, upload-time = "2026-02-02T15:38:49.29Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/ef/a4/8052a029029b096a78955eadd68ab594ce2197e24ec50e6b6d2ab3f4e33b/orjson-3.11.5-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:334e5b4bff9ad101237c2d799d9fd45737752929753bf4faf4b207335a416b7d", size = 245347, upload-time = "2025-12-06T15:54:22.061Z" },
-    { url = "https://files.pythonhosted.org/packages/64/67/574a7732bd9d9d79ac620c8790b4cfe0717a3d5a6eb2b539e6e8995e24a0/orjson-3.11.5-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:ff770589960a86eae279f5d8aa536196ebda8273a2a07db2a54e82b93bc86626", size = 129435, upload-time = "2025-12-06T15:54:23.615Z" },
-    { url = "https://files.pythonhosted.org/packages/52/8d/544e77d7a29d90cf4d9eecd0ae801c688e7f3d1adfa2ebae5e1e94d38ab9/orjson-3.11.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ed24250e55efbcb0b35bed7caaec8cedf858ab2f9f2201f17b8938c618c8ca6f", size = 132074, upload-time = "2025-12-06T15:54:24.694Z" },
-    { url = "https://files.pythonhosted.org/packages/6e/57/b9f5b5b6fbff9c26f77e785baf56ae8460ef74acdb3eae4931c25b8f5ba9/orjson-3.11.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a66d7769e98a08a12a139049aac2f0ca3adae989817f8c43337455fbc7669b85", size = 130520, upload-time = "2025-12-06T15:54:26.185Z" },
-    { url = "https://files.pythonhosted.org/packages/f6/6d/d34970bf9eb33f9ec7c979a262cad86076814859e54eb9a059a52f6dc13d/orjson-3.11.5-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:86cfc555bfd5794d24c6a1903e558b50644e5e68e6471d66502ce5cb5fdef3f9", size = 136209, upload-time = "2025-12-06T15:54:27.264Z" },
-    { url = "https://files.pythonhosted.org/packages/e7/39/bc373b63cc0e117a105ea12e57280f83ae52fdee426890d57412432d63b3/orjson-3.11.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a230065027bc2a025e944f9d4714976a81e7ecfa940923283bca7bbc1f10f626", size = 139837, upload-time = "2025-12-06T15:54:28.75Z" },
-    { url = "https://files.pythonhosted.org/packages/cb/aa/7c4818c8d7d324da220f4f1af55c343956003aa4d1ce1857bdc1d396ba69/orjson-3.11.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:b29d36b60e606df01959c4b982729c8845c69d1963f88686608be9ced96dbfaa", size = 137307, upload-time = "2025-12-06T15:54:29.856Z" },
-    { url = "https://files.pythonhosted.org/packages/46/bf/0993b5a056759ba65145effe3a79dd5a939d4a070eaa5da2ee3180fbb13f/orjson-3.11.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c74099c6b230d4261fdc3169d50efc09abf38ace1a42ea2f9994b1d79153d477", size = 139020, upload-time = "2025-12-06T15:54:31.024Z" },
-    { url = "https://files.pythonhosted.org/packages/65/e8/83a6c95db3039e504eda60fc388f9faedbb4f6472f5aba7084e06552d9aa/orjson-3.11.5-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e697d06ad57dd0c7a737771d470eedc18e68dfdefcdd3b7de7f33dfda5b6212e", size = 141099, upload-time = "2025-12-06T15:54:32.196Z" },
-    { url = "https://files.pythonhosted.org/packages/b9/b4/24fdc024abfce31c2f6812973b0a693688037ece5dc64b7a60c1ce69e2f2/orjson-3.11.5-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:e08ca8a6c851e95aaecc32bc44a5aa75d0ad26af8cdac7c77e4ed93acf3d5b69", size = 413540, upload-time = "2025-12-06T15:54:33.361Z" },
-    { url = "https://files.pythonhosted.org/packages/d9/37/01c0ec95d55ed0c11e4cae3e10427e479bba40c77312b63e1f9665e0737d/orjson-3.11.5-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:e8b5f96c05fce7d0218df3fdfeb962d6b8cfff7e3e20264306b46dd8b217c0f3", size = 151530, upload-time = "2025-12-06T15:54:34.6Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/d4/f9ebc57182705bb4bbe63f5bbe14af43722a2533135e1d2fb7affa0c355d/orjson-3.11.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ddbfdb5099b3e6ba6d6ea818f61997bb66de14b411357d24c4612cf1ebad08ca", size = 141863, upload-time = "2025-12-06T15:54:35.801Z" },
-    { url = "https://files.pythonhosted.org/packages/0d/04/02102b8d19fdcb009d72d622bb5781e8f3fae1646bf3e18c53d1bc8115b5/orjson-3.11.5-cp312-cp312-win32.whl", hash = "sha256:9172578c4eb09dbfcf1657d43198de59b6cef4054de385365060ed50c458ac98", size = 135255, upload-time = "2025-12-06T15:54:37.209Z" },
-    { url = "https://files.pythonhosted.org/packages/d4/fb/f05646c43d5450492cb387de5549f6de90a71001682c17882d9f66476af5/orjson-3.11.5-cp312-cp312-win_amd64.whl", hash = "sha256:2b91126e7b470ff2e75746f6f6ee32b9ab67b7a93c8ba1d15d3a0caaf16ec875", size = 133252, upload-time = "2025-12-06T15:54:38.401Z" },
-    { url = "https://files.pythonhosted.org/packages/dc/a6/7b8c0b26ba18c793533ac1cd145e131e46fcf43952aa94c109b5b913c1f0/orjson-3.11.5-cp312-cp312-win_arm64.whl", hash = "sha256:acbc5fac7e06777555b0722b8ad5f574739e99ffe99467ed63da98f97f9ca0fe", size = 126777, upload-time = "2025-12-06T15:54:39.515Z" },
+    { url = "https://files.pythonhosted.org/packages/80/bf/76f4f1665f6983385938f0e2a5d7efa12a58171b8456c252f3bae8a4cf75/orjson-3.11.7-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:bd03ea7606833655048dab1a00734a2875e3e86c276e1d772b2a02556f0d895f", size = 228545, upload-time = "2026-02-02T15:37:46.376Z" },
+    { url = "https://files.pythonhosted.org/packages/79/53/6c72c002cb13b5a978a068add59b25a8bdf2800ac1c9c8ecdb26d6d97064/orjson-3.11.7-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:89e440ebc74ce8ab5c7bc4ce6757b4a6b1041becb127df818f6997b5c71aa60b", size = 125224, upload-time = "2026-02-02T15:37:47.697Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/83/10e48852865e5dd151bdfe652c06f7da484578ed02c5fca938e3632cb0b8/orjson-3.11.7-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5ede977b5fe5ac91b1dffc0a517ca4542d2ec8a6a4ff7b2652d94f640796342a", size = 128154, upload-time = "2026-02-02T15:37:48.954Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/52/a66e22a2b9abaa374b4a081d410edab6d1e30024707b87eab7c734afe28d/orjson-3.11.7-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b7b1dae39230a393df353827c855a5f176271c23434cfd2db74e0e424e693e10", size = 123548, upload-time = "2026-02-02T15:37:50.187Z" },
+    { url = "https://files.pythonhosted.org/packages/de/38/605d371417021359f4910c496f764c48ceb8997605f8c25bf1dfe58c0ebe/orjson-3.11.7-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ed46f17096e28fb28d2975834836a639af7278aa87c84f68ab08fbe5b8bd75fa", size = 129000, upload-time = "2026-02-02T15:37:51.426Z" },
+    { url = "https://files.pythonhosted.org/packages/44/98/af32e842b0ffd2335c89714d48ca4e3917b42f5d6ee5537832e069a4b3ac/orjson-3.11.7-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3726be79e36e526e3d9c1aceaadbfb4a04ee80a72ab47b3f3c17fefb9812e7b8", size = 141686, upload-time = "2026-02-02T15:37:52.607Z" },
+    { url = "https://files.pythonhosted.org/packages/96/0b/fc793858dfa54be6feee940c1463370ece34b3c39c1ca0aa3845f5ba9892/orjson-3.11.7-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0724e265bc548af1dedebd9cb3d24b4e1c1e685a343be43e87ba922a5c5fff2f", size = 130812, upload-time = "2026-02-02T15:37:53.944Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/91/98a52415059db3f374757d0b7f0f16e3b5cd5976c90d1c2b56acaea039e6/orjson-3.11.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e7745312efa9e11c17fbd3cb3097262d079da26930ae9ae7ba28fb738367cbad", size = 133440, upload-time = "2026-02-02T15:37:55.615Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/b6/cb540117bda61791f46381f8c26c8f93e802892830a6055748d3bb1925ab/orjson-3.11.7-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f904c24bdeabd4298f7a977ef14ca2a022ca921ed670b92ecd16ab6f3d01f867", size = 138386, upload-time = "2026-02-02T15:37:56.814Z" },
+    { url = "https://files.pythonhosted.org/packages/63/1a/50a3201c334a7f17c231eee5f841342190723794e3b06293f26e7cf87d31/orjson-3.11.7-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:b9fc4d0f81f394689e0814617aadc4f2ea0e8025f38c226cbf22d3b5ddbf025d", size = 408853, upload-time = "2026-02-02T15:37:58.291Z" },
+    { url = "https://files.pythonhosted.org/packages/87/cd/8de1c67d0be44fdc22701e5989c0d015a2adf391498ad42c4dc589cd3013/orjson-3.11.7-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:849e38203e5be40b776ed2718e587faf204d184fc9a008ae441f9442320c0cab", size = 144130, upload-time = "2026-02-02T15:38:00.163Z" },
+    { url = "https://files.pythonhosted.org/packages/0f/fe/d605d700c35dd55f51710d159fc54516a280923cd1b7e47508982fbb387d/orjson-3.11.7-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:4682d1db3bcebd2b64757e0ddf9e87ae5f00d29d16c5cdf3a62f561d08cc3dd2", size = 134818, upload-time = "2026-02-02T15:38:01.507Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/e4/15ecc67edb3ddb3e2f46ae04475f2d294e8b60c1825fbe28a428b93b3fbd/orjson-3.11.7-cp312-cp312-win32.whl", hash = "sha256:f4f7c956b5215d949a1f65334cf9d7612dde38f20a95f2315deef167def91a6f", size = 127923, upload-time = "2026-02-02T15:38:02.75Z" },
+    { url = "https://files.pythonhosted.org/packages/34/70/2e0855361f76198a3965273048c8e50a9695d88cd75811a5b46444895845/orjson-3.11.7-cp312-cp312-win_amd64.whl", hash = "sha256:bf742e149121dc5648ba0a08ea0871e87b660467ef168a3a5e53bc1fbd64bb74", size = 125007, upload-time = "2026-02-02T15:38:04.032Z" },
+    { url = "https://files.pythonhosted.org/packages/68/40/c2051bd19fc467610fed469dc29e43ac65891571138f476834ca192bc290/orjson-3.11.7-cp312-cp312-win_arm64.whl", hash = "sha256:26c3b9132f783b7d7903bf1efb095fed8d4a3a85ec0d334ee8beff3d7a4749d5", size = 126089, upload-time = "2026-02-02T15:38:05.297Z" },
 ]
 
 [[package]]