Spaces: superxu520/G_AI
Lưu Quang Vũ committed on Feb 3

Commit 7db4283 (unverified) · 1 parent: f8272eb

:sparkles: Enable real-time streaming responses and completely solve the issue with reusable sessions. (#95)

* Remove the unused auto-refresh functionality and related imports.

They are no longer needed since the underlying library issue has been resolved.

* Enhance error handling in client initialization and message sending

* Refactor link handling to extract file paths and simplify Google search links

* Fix regex pattern for Google search link matching

* Fix regex patterns for Markdown escaping, code fence and Google search link matching

* Increase timeout value in configuration files from 60 to 120 seconds to better handle heavy tasks

* Fix Image generation

* Refactor tool handling to support standard and image generation tools separately

* Fix: use "ascii" decoding for base64-encoded image data consistency
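
A minimal sketch of that change, assuming a helper along these lines (the name `encode_image_bytes` is hypothetical and not taken from the repository). Base64 output contains only ASCII characters, so decoding with `"ascii"` is always safe and states the intent explicitly:

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Return base64 text for raw image bytes (hypothetical helper)."""
    # b64encode returns bytes made up solely of ASCII characters,
    # so "ascii" is sufficient and documents that expectation.
    return base64.b64encode(data).decode("ascii")


print(encode_image_bytes(b"hi"))
```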

* Fix: replace `running` with `_running` for internal client status checks

* Refactor: replace direct `_running` access with `running()` method in client status checks

* Extend models with new fields for annotations, reasoning, audio, log probabilities, and token details; adjust response handling accordingly.

* Extend models with new fields (annotations, error), add `normalize_output_text` validator, rename `created` to `created_at`, and update response handling accordingly.

* Extend response models to support tool choices, image output, and improved streaming of response items. Refactor image generation handling for consistency and add compatibility with output content.

* Set default `text` value to an empty string for `ResponseOutputContent` and ensure consistent initialization in image output handling.

* feat: Add /images endpoint with dedicated router and improved image management

Add dedicated router for /images endpoint and refactor image handling logic for better modularity. Enhance temporary image management with secure naming, token verification, and cleanup functionality.

* feat: Add token-based verification for image access
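
One common way to build this kind of check is an HMAC over the file name. The sketch below is an assumption for illustration only: the commit does not say which scheme is used, and `make_image_token`, `verify_image_token`, and the secret value are hypothetical names.

```python
import hashlib
import hmac


def make_image_token(filename: str, secret: str) -> str:
    """Derive a token for one image file (hypothetical scheme)."""
    return hmac.new(secret.encode("utf-8"), filename.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_image_token(filename: str, token: str, secret: str) -> bool:
    """Check a presented token without leaking timing information."""
    expected = make_image_token(filename, secret)
    return hmac.compare_digest(expected, token)


token = make_image_token("abc123.png", "example-secret")
print(verify_image_token("abc123.png", token, "example-secret"))
```

Binding the token to the file name means a token issued for one image cannot be replayed for another.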

* Refactor: rename image store directory to `ai_generated_images` for clarity

* fix: Update create_response to use FastAPI Request object for base_url and refactor variable handling

* fix: Correct attribute access in request_data handling within `chat.py` for tools, tool_choice, and streaming settings

* fix: Save generated images to persistent storage

* fix: Remove unused `output_image` type from `ResponseOutputContent` and update response handling for consistency

* fix: Update image URL generation in chat response to use Markdown format for compatibility

* fix: Enhance error handling for full-size image saving and add fallback to default size

* fix: Use filename as image ID to ensure consistency in generated image handling

* fix: Enhance tempfile saving by adding custom headers, content-type handling, and improved extension determination

* feat: Add support for custom Gemini models and model loading strategies

- Introduced `model_strategy` configuration for "append" (default + custom models) or "overwrite" (custom models only).
- Enhanced `/v1/models` endpoint to return models based on the configured strategy.
- Improved model loading with environment variable overrides and validation.
- Refactored model handling logic for improved modularity and error handling.
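
The two strategies can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `resolve_models` is a hypothetical helper, models are represented as plain dicts keyed by name, and the rule that a custom model replaces a default with the same name under "append" is an assumption.

```python
def resolve_models(
    defaults: dict[str, dict],
    custom: dict[str, dict],
    strategy: str = "append",
) -> dict[str, dict]:
    """Combine default and custom models according to the strategy."""
    if strategy == "overwrite":
        # Custom models only; defaults are ignored entirely.
        return dict(custom)
    if strategy == "append":
        # Defaults first, then custom models (assumed to win on name clashes).
        return {**defaults, **custom}
    raise ValueError(f"Unknown model_strategy: {strategy!r}")


defaults = {"model-a": {"source": "default"}, "model-b": {"source": "default"}}
custom = {"model-b": {"source": "custom"}, "model-c": {"source": "custom"}}
print(sorted(resolve_models(defaults, custom, "append")))
print(sorted(resolve_models(defaults, custom, "overwrite")))
```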

* feat: Improve Gemini model environment variable parsing and nested field support

- Enhanced `extract_gemini_models_env` to handle nested fields within environment variables.
- Updated type hints for more flexibility in model overrides.
- Improved `_merge_models_with_env` to better support field-level updates and appending new models.

* refactor: Consolidate utility functions and clean up unused code

- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.

* fix: Handle None input in `estimate_tokens` and return 0 for empty text

* refactor: Simplify model configuration and add JSON parsing validators

- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.

* refactor: Simplify Gemini model environment variable parsing with JSON support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.

* fix: Enhance Gemini model environment variable parsing with fallback to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
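
A minimal sketch of the JSON-first, literal-fallback parsing described above. The function name `parse_models_env` and the decision to wrap a single dict in a list are assumptions for illustration; only `json.loads` and `ast.literal_eval` are taken from the commit text.

```python
import ast
import json


def parse_models_env(raw: str) -> list:
    """Parse a model list from an environment variable value."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        try:
            # Fallback for Python-style literals, e.g. single-quoted strings.
            parsed = ast.literal_eval(raw)
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"Invalid model configuration: {raw!r}") from exc
    if isinstance(parsed, dict):
        parsed = [parsed]  # assumed convenience: accept a single model object
    if not isinstance(parsed, list):
        raise ValueError("Model configuration must be a list or an object")
    return parsed


print(parse_models_env('[{"model_name": "a"}]'))
print(parse_models_env("[{'model_name': 'a'}]"))
```

`ast.literal_eval` only evaluates literals, so it does not execute arbitrary code from the environment the way `eval` would.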

* fix: Improve regex patterns in helper module

- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.

* docs: Update README files to include custom model configuration and environment variable setup

* fix: Remove unused headers from HTTP client in helper module

* fix: Update README and README.zh to clarify model configuration via environment variables; enhance error logging in config validation

* Update README and README.zh to clarify model configuration via JSON string or list structure for enhanced flexibility in automated environments

* Refactor: compress JSON content to save tokens and streamline sending multiple chunks
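
For illustration, compact serialization with the standard library looks like this (the helper name `compact_json` is hypothetical). Dropping the default whitespace after `,` and `:` shortens every payload embedded in a prompt:

```python
import json


def compact_json(payload: object) -> str:
    """Serialize without the optional whitespace json.dumps adds by default."""
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


sample = {"a": [1, 2], "b": "x"}
print(json.dumps(sample))    # default formatting, with spaces
print(compact_json(sample))  # compact formatting
```

The diff itself uses `orjson.dumps(...).decode("utf-8")`, which emits compact output by default.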

* Refactor: Modify the LMDB store to fix issues where no conversation is found in either the raw or cleaned history.

* Refactor: Modify the LMDB store to fix issues where no conversation is found.

* Refactor: Update all functions to use orjson for better performance

* Update project dependencies

* Fix IDE warnings

* Incorrect IDE warnings

* Refactor: Centralized the mapping of the 'developer' role to 'system' for better Gemini compatibility.
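
A centralized mapping of this kind can be as small as a lookup table. `ROLE_ALIASES` and `normalize_role` are hypothetical names used only for this sketch:

```python
# Roles the backend does not know, mapped to their closest equivalent.
ROLE_ALIASES = {"developer": "system"}


def normalize_role(role: str) -> str:
    """Map an incoming role to the one sent to the model."""
    return ROLE_ALIASES.get(role, role)


print(normalize_role("developer"), normalize_role("user"))
```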

* Refactor: Avoid reusing an existing chat session if its idle time exceeds METADATA_TTL_MINUTES.

* Refactor: Update the LMDB store to resolve issues preventing conversation from being saved

* Refactor: Update the _prepare_messages_for_model helper to omit the system instruction when reusing a session to save tokens.

* Refactor: Modify the logic to convert a large prompt into a temporary text file attachment

- When multiple chunks are sent simultaneously, Google will immediately invalidate the access token and reject the request
- When a prompt contains a structured format like JSON, splitting it can break the format and may cause the model to misunderstand the context
- Another minor tweak as Copilot suggested

* Enable streaming responses and fully resolve the problem with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.

* Enable real-time streaming responses and completely solve the issue with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.
- Introducing a new feature for real-time streaming responses.
- Fully resolve the problem with reusable sessions.
- Break down similar flow logic into helper functions.
- All endpoints now support inline Markdown images.
- Switch large prompts to use BytesIO to avoid reading and writing to disk.

* Enable real-time streaming responses and completely solve the issue with reusable sessions.

- Ensure that PR https://github.com/HanaokaYuzu/Gemini-API/pull/220 is merged before proceeding with this PR.
- Introducing a new feature for real-time streaming responses.
- Fully resolve the problem with reusable sessions.
- Break down similar flow logic into helper functions.
- All endpoints now support inline Markdown images.
- Switch large prompts to use BytesIO to avoid reading and writing to disk.
- Remove duplicate images when saving and responding.
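
De-duplication that keeps the first occurrence and the original order can be sketched like this. Treating the image URL as the identity is an assumption; the commit does not say how duplicates are detected, and `dedupe_preserving_order` is a hypothetical name.

```python
def dedupe_preserving_order(urls: list[str]) -> list[str]:
    """Drop repeated URLs while keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


print(dedupe_preserving_order(["a.png", "b.png", "a.png", "c.png", "b.png"]))
```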

* build: update

Files changed (9)

app/main.py +1 -1
app/models/models.py +2 -2
app/server/chat.py +1076 -840
app/services/client.py +20 -35
app/services/lmdb.py +83 -53
app/services/pool.py +2 -2
app/utils/helper.py +28 -91
pyproject.toml +2 -2
uv.lock +59 -23

app/main.py CHANGED

@@ -15,7 +15,7 @@ from .server.middleware import (
 )
 from .services import GeminiClientPool, LMDBConversationStore
-RETENTION_CLEANUP_INTERVAL_SECONDS = 6 * 60 * 60  # 6 hours
+RETENTION_CLEANUP_INTERVAL_SECONDS = 6 * 60 * 60  # Check every 6 hours
 async def _run_retention_cleanup(stop_event: asyncio.Event) -> None:
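
The body of `_run_retention_cleanup` is not part of this hunk. As a rough sketch of how an interval constant and a stop event usually work together, a periodic task can wait on the event with a timeout so that shutdown interrupts the sleep. `run_periodic` and its `cleanup` callback are hypothetical names.

```python
import asyncio
from collections.abc import Callable

RETENTION_CLEANUP_INTERVAL_SECONDS = 6 * 60 * 60  # Check every 6 hours


async def run_periodic(
    stop_event: asyncio.Event,
    cleanup: Callable[[], None],
    interval: float = RETENTION_CLEANUP_INTERVAL_SECONDS,
) -> None:
    """Call `cleanup` every `interval` seconds until `stop_event` is set."""
    while not stop_event.is_set():
        cleanup()
        try:
            # Returns early as soon as the stop event is set.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            continue  # interval elapsed; run the next cleanup


async def demo() -> int:
    stop = asyncio.Event()
    calls: list[int] = []
    task = asyncio.create_task(run_periodic(stop, lambda: calls.append(1), interval=0.01))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return len(calls)


print(asyncio.run(demo()) >= 1)
```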

app/models/models.py CHANGED

@@ -7,7 +7,7 @@ from pydantic import BaseModel, Field, model_validator
 class ContentItem(BaseModel):
-    """Content item model"""
+    """Individual content item (text, image, or file) within a message."""
     type: Literal["text", "image_url", "file", "input_audio"]
     text: Optional[str] = None
@@ -159,7 +159,7 @@ class ConversationInStore(BaseModel):
     created_at: Optional[datetime] = Field(default=None)
     updated_at: Optional[datetime] = Field(default=None)
-    # NOTE: Gemini Web API do not support changing models once a conversation is created.
+    # Gemini Web API does not support changing models once a conversation is created.
     model: str = Field(..., description="Model used for the conversation")
     client_id: str = Field(..., description="Identifier of the Gemini client")
     metadata: list[str | None] = Field(

app/server/chat.py CHANGED

@@ -1,18 +1,19 @@
 import base64
-import re
-import tempfile
 import uuid
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Any
 import orjson
 from fastapi import APIRouter, Depends, HTTPException, Request, status
 from fastapi.responses import StreamingResponse
 from gemini_webapi.client import ChatSession
 from gemini_webapi.constants import Model
-from gemini_webapi.exceptions import APIError
 from gemini_webapi.types.image import GeneratedImage, Image
 from loguru import logger
@@ -42,21 +43,18 @@ from ..utils import g_config
 from ..utils.helper import (
     CODE_BLOCK_HINT,
     CODE_HINT_STRIPPED,
     XML_HINT_STRIPPED,
     XML_WRAP_HINT,
     estimate_tokens,
     extract_image_dimensions,
     extract_tool_calls,
-    iter_stream_segments,
-    remove_tool_call_blocks,
     strip_code_fence,
     text_from_message,
 )
 from .middleware import get_image_store_dir, get_image_token, get_temp_dir, verify_api_key
-# Maximum characters Gemini Web can accept in a single request (configurable)
 MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
-CONTINUATION_HINT = "\n(More messages to come, please reply with just 'ok.')"
 METADATA_TTL_MINUTES = 15
 router = APIRouter()
@@ -72,6 +70,212 @@ class StructuredOutputRequirement:
     raw_format: dict[str, Any]
 def _build_structured_requirement(
     response_format: dict[str, Any] | None,
 ) -> StructuredOutputRequirement | None:
@@ -80,17 +284,23 @@ def _build_structured_requirement(
         return None
     if response_format.get("type") != "json_schema":
-        logger.warning(f"Unsupported response_format type requested: {response_format}")
         return None
     json_schema = response_format.get("json_schema")
     if not isinstance(json_schema, dict):
-        logger.warning(f"Invalid json_schema payload in response_format: {response_format}")
         return None
     schema = json_schema.get("schema")
     if not isinstance(schema, dict):
-        logger.warning(f"Missing `schema` object in response_format payload: {response_format}")
         return None
     schema_name = json_schema.get("name") or "response"
@@ -136,7 +346,9 @@ def _build_tool_prompt(
         description = function.description or "No description provided."
         lines.append(f"Tool `{function.name}`: {description}")
         if function.parameters:
-            schema_text = orjson.dumps(function.parameters).decode("utf-8")
             lines.append("Arguments JSON schema:")
             lines.append(schema_text)
         else:
@@ -155,7 +367,6 @@ def _build_tool_prompt(
         lines.append(
             f"You are required to call the tool named `{target}`. Do not call any other tool."
         )
-    # `auto` or None fall back to default instructions.
     lines.append(
         "When you decide to call a tool you MUST respond with nothing except a single fenced block exactly like the template below."
@@ -221,7 +432,7 @@ def _append_xml_hint_to_last_user_message(messages: list[Message]) -> None:
         if isinstance(msg.content, str):
             if XML_HINT_STRIPPED not in msg.content:
-                msg.content = f"{msg.content}{XML_WRAP_HINT}"
             return
         if isinstance(msg.content, list):
@@ -231,15 +442,13 @@ def _append_xml_hint_to_last_user_message(messages: list[Message]) -> None:
                 text_value = part.text or ""
                 if XML_HINT_STRIPPED in text_value:
                     return
-                part.text = f"{text_value}{XML_WRAP_HINT}"
                 return
             messages_text = XML_WRAP_HINT.strip()
             msg.content.append(ContentItem(type="text", text=messages_text))
             return
-    # No user message to annotate; nothing to do.
 def _conversation_has_code_hint(messages: list[Message]) -> bool:
     """Return True if any system message already includes the code block hint."""
@@ -272,6 +481,17 @@ def _prepare_messages_for_model(
     """Return a copy of messages enriched with tool instructions when needed."""
     prepared = [msg.model_copy(deep=True) for msg in source_messages]
     instructions: list[str] = []
     if inject_system_defaults:
         if tools:
@@ -290,7 +510,6 @@ def _prepare_messages_for_model(
             logger.debug("Injected default code block hint for Gemini conversation.")
     if not instructions:
-        # Still need to ensure XML hint for the last user message if tools are present
         if tools and tool_choice != "none":
             _append_xml_hint_to_last_user_message(prepared)
         return prepared
@@ -323,7 +542,6 @@ def _response_items_to_messages(
     normalized_input: list[ResponseInputItem] = []
     for item in items:
         role = item.role
         content = item.content
         normalized_contents: list[ResponseInputContent] = []
         if isinstance(content, str):
@@ -394,7 +612,6 @@ def _instructions_to_messages(
             continue
         role = item.role
         content = item.content
         if isinstance(content, str):
             instruction_messages.append(Message(role=role, content=content))
@@ -432,10 +649,7 @@ def _instructions_to_messages(
 def _get_model_by_name(name: str) -> Model:
-    """
-    Retrieve a Model instance by name, considering custom models from config
-    and the update strategy (append or overwrite).
-    """
     strategy = g_config.gemini.model_strategy
     custom_models = {m.model_name: m for m in g_config.gemini.models if m.model_name}
@@ -449,9 +663,7 @@ def _get_model_by_name(name: str) -> Model:
 def _get_available_models() -> list[ModelData]:
-    """
-    Return a list of available models based on configuration strategy.
-    """
     now = int(datetime.now(tz=timezone.utc).timestamp())
     strategy = g_config.gemini.model_strategy
     models_data = []
@@ -486,910 +698,934 @@ def _get_available_models() -> list[ModelData]:
     return models_data
-@router.get("/v1/models", response_model=ModelListResponse)
-async def list_models(api_key: str = Depends(verify_api_key)):
-    models = _get_available_models()
-    return ModelListResponse(data=models)
-@router.post("/v1/chat/completions")
-async def create_chat_completion(
-    request: ChatCompletionRequest,
-    api_key: str = Depends(verify_api_key),
-    tmp_dir: Path = Depends(get_temp_dir),
-    image_store: Path = Depends(get_image_store_dir),
-):
-    pool = GeminiClientPool()
-    db = LMDBConversationStore()
-    try:
-        model = _get_model_by_name(request.model)
-    except ValueError as exc:
-        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
-    if len(request.messages) == 0:
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail="At least one message is required in the conversation.",
-        )
-    structured_requirement = _build_structured_requirement(request.response_format)
-    if structured_requirement and request.stream:
-        logger.debug(
-            "Structured response requested with streaming enabled; will stream canonical JSON once ready."
-        )
-    if structured_requirement:
-        logger.debug(
-            f"Structured response requested for /v1/chat/completions (schema={structured_requirement.schema_name})."
-        )
-    extra_instructions = [structured_requirement.instruction] if structured_requirement else None
-    # Check if conversation is reusable
-    session, client, remaining_messages = await _find_reusable_session(
-        db, pool, model, request.messages
-    )
-    if session:
-        # Optimization: When reusing a session, we don't need to resend the heavy tool definitions
-        # or structured output instructions as they are already in the Gemini session history.
-        messages_to_send = _prepare_messages_for_model(
-            remaining_messages,
-            request.tools,
-            request.tool_choice,
-            extra_instructions,
-            inject_system_defaults=False,
-        )
-        if not messages_to_send:
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="No new messages to send for the existing session.",
-            )
-        if len(messages_to_send) == 1:
-            model_input, files = await GeminiClientWrapper.process_message(
-                messages_to_send[0], tmp_dir, tagged=False
-            )
-        else:
-            model_input, files = await GeminiClientWrapper.process_conversation(
-                messages_to_send, tmp_dir
-            )
-        logger.debug(
-            f"Reused session {session.metadata} - sending {len(messages_to_send)} prepared messages."
-        )
-    else:
-        # Start a new session and concat messages into a single string
         try:
-            client = await pool.acquire()
-            session = client.start_chat(model=model)
-            messages_to_send = _prepare_messages_for_model(
-                request.messages, request.tools, request.tool_choice, extra_instructions
-            )
-            model_input, files = await GeminiClientWrapper.process_conversation(
-                messages_to_send, tmp_dir
-            )
-        except ValueError as e:
-            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
-        except RuntimeError as e:
-            raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e))
         except Exception as e:
-            logger.exception(f"Error in preparing conversation: {e}")
             raise
-        logger.debug("New session started.")
-    # Generate response
     try:
-        assert session and client, "Session and client not available"
-        client_id = client.id
-        logger.debug(
-            f"Client ID: {client_id}, Input length: {len(model_input)}, files count: {len(files)}"
         )
-        response = await _send_with_split(session, model_input, files=files)
-    except APIError as exc:
-        client_id = client.id if client else "unknown"
-        logger.warning(f"Gemini API returned invalid response for client {client_id}: {exc}")
-        raise HTTPException(
-            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
-            detail="Gemini temporarily returned an invalid response. Please retry.",
-        ) from exc
-    except HTTPException:
-        raise
     except Exception as e:
-        logger.exception(f"Unexpected error generating content from Gemini API: {e}")
-        raise HTTPException(
-            status_code=status.HTTP_502_BAD_GATEWAY,
-            detail="Gemini returned an unexpected error.",
-        ) from e
-    # Format the response from API
-    try:
-        raw_output_with_think = GeminiClientWrapper.extract_output(response, include_thoughts=True)
-        raw_output_clean = GeminiClientWrapper.extract_output(response, include_thoughts=False)
-    except IndexError as exc:
-        logger.exception("Gemini output parsing failed (IndexError).")
-        raise HTTPException(
-            status_code=status.HTTP_502_BAD_GATEWAY,
-            detail="Gemini returned malformed response content.",
-        ) from exc
-    except Exception as exc:
-        logger.exception("Gemini output parsing failed unexpectedly.")
-        raise HTTPException(
-            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail="Gemini output parsing failed unexpectedly.",
-        ) from exc
-    visible_output, tool_calls = extract_tool_calls(raw_output_with_think)
-    storage_output = remove_tool_call_blocks(raw_output_clean).strip()
-    tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
-    if structured_requirement:
-        cleaned_visible = strip_code_fence(visible_output or "")
-        if not cleaned_visible:
-            raise HTTPException(
-                status_code=status.HTTP_502_BAD_GATEWAY,
-                detail="LLM returned an empty response while JSON schema output was requested.",
-            )
-        try:
-            structured_payload = orjson.loads(cleaned_visible)
-        except orjson.JSONDecodeError as exc:
-            logger.warning(
-                f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
-                f"{cleaned_visible}"
-            )
-            raise HTTPException(
-                status_code=status.HTTP_502_BAD_GATEWAY,
-                detail="LLM returned invalid JSON for the requested response_format.",
-            ) from exc
-        canonical_output = orjson.dumps(structured_payload).decode("utf-8")
-        visible_output = canonical_output
-        storage_output = canonical_output
-    if tool_calls_payload:
-        logger.debug(f"Detected tool calls: {tool_calls_payload}")
-    # After formatting, persist the conversation to LMDB
-    try:
-        current_assistant_message = Message(
-            role="assistant",
-            content=storage_output or None,
-            tool_calls=tool_calls or None,
-        )
-        # Sanitize the entire history including the new message to ensure consistency
-        full_history = [*request.messages, current_assistant_message]
-        cleaned_history = db.sanitize_assistant_messages(full_history)
-        conv = ConversationInStore(
-            model=model.model_name,
-            client_id=client.id,
-            metadata=session.metadata,
-            messages=cleaned_history,
-        )
-        key = db.store(conv)
-        logger.debug(f"Conversation saved to LMDB with key: {key}")
-    except Exception as e:
-        # We can still return the response even if saving fails
-        logger.warning(f"Failed to save conversation to LMDB: {e}")
-    # Return with streaming or standard response
-    completion_id = f"chatcmpl-{uuid.uuid4()}"
-    timestamp = int(datetime.now(tz=timezone.utc).timestamp())
-    if request.stream:
-        return _create_streaming_response(
-            visible_output,
-            tool_calls_payload,
-            completion_id,
-            timestamp,
-            request.model,
-            request.messages,
-        )
-    else:
-        return _create_standard_response(
-            visible_output,
-            tool_calls_payload,
-            completion_id,
-            timestamp,
-            request.model,
-            request.messages,
-        )
-@router.post("/v1/responses")
-async def create_response(
-    request_data: ResponseCreateRequest,
-    request: Request,
-    api_key: str = Depends(verify_api_key),
-    tmp_dir: Path = Depends(get_temp_dir),
-    image_store: Path = Depends(get_image_store_dir),
-):
-    base_messages, normalized_input = _response_items_to_messages(request_data.input)
-    structured_requirement = _build_structured_requirement(request_data.response_format)
-    if structured_requirement and request_data.stream:
-        logger.debug(
-            "Structured response requested with streaming enabled; streaming not supported for Responses."
-        )
-    extra_instructions: list[str] = []
-    if structured_requirement:
-        extra_instructions.append(structured_requirement.instruction)
-        logger.debug(
-            f"Structured response requested for /v1/responses (schema={structured_requirement.schema_name})."
-        )
-    # Separate standard tools from image generation tools
-    standard_tools: list[Tool] = []
-    image_tools: list[ResponseImageTool] = []
-    if request_data.tools:
-        for t in request_data.tools:
-            if isinstance(t, Tool):
-                standard_tools.append(t)
-            elif isinstance(t, ResponseImageTool):
-                image_tools.append(t)
-            # Handle dicts if Pydantic didn't convert them fully (fallback)
-            elif isinstance(t, dict):
-                t_type = t.get("type")
-                if t_type == "function":
-                    standard_tools.append(Tool.model_validate(t))
-                elif t_type == "image_generation":
-                    image_tools.append(ResponseImageTool.model_validate(t))
-    image_instruction = _build_image_generation_instruction(
-        image_tools,
-        request_data.tool_choice
-        if isinstance(request_data.tool_choice, ResponseToolChoice)
-        else None,
-    )
-    if image_instruction:
-        extra_instructions.append(image_instruction)
-        logger.debug("Image generation support enabled for /v1/responses request.")
-    preface_messages = _instructions_to_messages(request_data.instructions)
-    conversation_messages = base_messages
-    if preface_messages:
-        conversation_messages = [*preface_messages, *base_messages]
-        logger.debug(
-            f"Injected {len(preface_messages)} instruction messages before sending to Gemini."
-        )
-    # Pass standard tools to the prompt builder
-    # Determine tool_choice for standard tools (ignore image_generation choice here as it is handled via instruction)
-    model_tool_choice = None
-    if isinstance(request_data.tool_choice, str):
-        model_tool_choice = request_data.tool_choice
-    elif isinstance(request_data.tool_choice, ToolChoiceFunction):
-        model_tool_choice = request_data.tool_choice
-    # If tool_choice is ResponseToolChoice (image_generation), we don't pass it as a function tool choice.
-    messages = _prepare_messages_for_model(
-        conversation_messages,
-        tools=standard_tools or None,
-        tool_choice=model_tool_choice,
-        extra_instructions=extra_instructions or None,
-    )
-    pool = GeminiClientPool()
-    db = LMDBConversationStore()
-    try:
-        model = _get_model_by_name(request_data.model)
-    except ValueError as exc:
-        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
-    session, client, remaining_messages = await _find_reusable_session(db, pool, model, messages)
-    async def _build_payload(
-        _payload_messages: list[Message], _reuse_session: bool
-    ) -> tuple[str, list[Path | str]]:
-        if _reuse_session and len(_payload_messages) == 1:
-            return await GeminiClientWrapper.process_message(
-                _payload_messages[0], tmp_dir, tagged=False
-            )
-        return await GeminiClientWrapper.process_conversation(_payload_messages, tmp_dir)
-    reuse_session = session is not None
-    if reuse_session:
-        messages_to_send = _prepare_messages_for_model(
-            remaining_messages,
-            tools=request_data.tools,  # Keep for XML hint logic
-            tool_choice=request_data.tool_choice,
-            extra_instructions=None,  # Already in session history
-            inject_system_defaults=False,
-        )
-        if not messages_to_send:
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="No new messages to send for the existing session.",
-            )
-        payload_messages = messages_to_send
-        model_input, files = await _build_payload(payload_messages, _reuse_session=True)
-        logger.debug(
-            f"Reused session {session.metadata} - sending {len(payload_messages)} prepared messages."
-        )
-    else:
-        try:
-            client = await pool.acquire()
-            session = client.start_chat(model=model)
-            payload_messages = messages
-            model_input, files = await _build_payload(payload_messages, _reuse_session=False)
-        except ValueError as e:
-            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
-        except RuntimeError as e:
-            raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e))
-        except Exception as e:
-            logger.exception(f"Error in preparing conversation for responses API: {e}")
-            raise
-        logger.debug("New session started for /v1/responses request.")
-    try:
-        assert session and client, "Session and client not available"
-        client_id = client.id
-        logger.debug(
-            f"Client ID: {client_id}, Input length: {len(model_input)}, files count: {len(files)}"
-        )
-        model_output = await _send_with_split(session, model_input, files=files)
-    except APIError as exc:
-        client_id = client.id if client else "unknown"
-        logger.warning(f"Gemini API returned invalid response for client {client_id}: {exc}")
-        raise HTTPException(
-            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
-            detail="Gemini temporarily returned an invalid response. Please retry.",
-        ) from exc
-    except HTTPException:
-        raise
-    except Exception as e:
-        logger.exception(f"Unexpected error generating content from Gemini API for responses: {e}")
-        raise HTTPException(
-            status_code=status.HTTP_502_BAD_GATEWAY,
-            detail="Gemini returned an unexpected error.",
-        ) from e
-    try:
-        text_with_think = GeminiClientWrapper.extract_output(model_output, include_thoughts=True)
-        text_without_think = GeminiClientWrapper.extract_output(
-            model_output, include_thoughts=False
-        )
-    except IndexError as exc:
-        logger.exception("Gemini output parsing failed (IndexError).")
-        raise HTTPException(
-            status_code=status.HTTP_502_BAD_GATEWAY,
-            detail="Gemini returned malformed response content.",
-        ) from exc
-    except Exception as exc:
-        logger.exception("Gemini output parsing failed unexpectedly.")
-        raise HTTPException(
-            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail="Gemini output parsing failed unexpectedly.",
-        ) from exc
-    visible_text, detected_tool_calls = extract_tool_calls(text_with_think)
-    storage_output = remove_tool_call_blocks(text_without_think).strip()
-    assistant_text = LMDBConversationStore.remove_think_tags(visible_text.strip())
-    if structured_requirement:
-        cleaned_visible = strip_code_fence(assistant_text or "")
-        if not cleaned_visible:
-            raise HTTPException(
-                status_code=status.HTTP_502_BAD_GATEWAY,
-                detail="LLM returned an empty response while JSON schema output was requested.",
-            )
-        try:
-            structured_payload = orjson.loads(cleaned_visible)
-        except orjson.JSONDecodeError as exc:
-            logger.warning(
-                f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name}): "
-                f"{cleaned_visible}"
-            )
-            raise HTTPException(
-                status_code=status.HTTP_502_BAD_GATEWAY,
-                detail="LLM returned invalid JSON for the requested response_format.",
-            ) from exc
-        canonical_output = orjson.dumps(structured_payload).decode("utf-8")
-        assistant_text = canonical_output
-        storage_output = canonical_output
-        logger.debug(
-            f"Structured response fulfilled for /v1/responses (schema={structured_requirement.schema_name})."
-        )
-    expects_image = (
-        request_data.tool_choice is not None and request_data.tool_choice.type == "image_generation"
-    )
-    images = model_output.images or []
-    logger.debug(
-        f"Gemini returned {len(images)} image(s) for /v1/responses "
-        f"(expects_image={expects_image}, instruction_applied={bool(image_instruction)})."
-    )
-    if expects_image and not images:
-        summary = assistant_text.strip() if assistant_text else ""
-        if summary:
-            summary = re.sub(r"\s+", " ", summary)
-            if len(summary) > 200:
-                summary = f"{summary[:197]}..."
-        logger.warning(
-            "Image generation requested but Gemini produced no images. "
-            f"client_id={client_id}, forced_tool_choice={request_data.tool_choice is not None}, "
-            f"instruction_applied={bool(image_instruction)}, assistant_preview='{summary}'"
-        )
-        detail = "LLM returned no images for the requested image_generation tool."
-        if summary:
-            detail = f"{detail} Assistant response: {summary}"
-        raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=detail)
-    response_contents: list[ResponseOutputContent] = []
-    image_call_items: list[ResponseImageGenerationCall] = []
-    for image in images:
-        try:
-            image_base64, width, height, filename = await _image_to_base64(image, image_store)
-        except Exception as exc:
-            logger.warning(f"Failed to download generated image: {exc}")
-            continue
-        img_format = "png" if isinstance(image, GeneratedImage) else "jpeg"
-        # Use static URL for compatibility
-        image_url = (
-            f"![{filename}]({request.base_url}images/{filename}?token={get_image_token(filename)})"
-        )
-        image_call_items.append(
-            ResponseImageGenerationCall(
-                id=filename.rsplit(".", 1)[0],
-                status="completed",
-                result=image_base64,
-                output_format=img_format,
-                size=f"{width}x{height}" if width and height else None,
-            )
-        )
-        # Add as output_text content for compatibility
-        response_contents.append(
-            ResponseOutputContent(type="output_text", text=image_url, annotations=[])
-        )
-    tool_call_items: list[ResponseToolCall] = []
-    if detected_tool_calls:
-        tool_call_items = [
-            ResponseToolCall(
-                id=call.id,
-                status="completed",
-                function=call.function,
-            )
-            for call in detected_tool_calls
-        ]
-    if assistant_text:
-        response_contents.append(
-            ResponseOutputContent(type="output_text", text=assistant_text, annotations=[])
-        )
-    if not response_contents:
-        response_contents.append(ResponseOutputContent(type="output_text", text="", annotations=[]))
-    created_time = int(datetime.now(tz=timezone.utc).timestamp())
-    response_id = f"resp_{uuid.uuid4().hex}"
-    message_id = f"msg_{uuid.uuid4().hex}"
-    input_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
-    tool_arg_text = "".join(call.function.arguments or "" for call in detected_tool_calls)
-    completion_basis = assistant_text or ""
-    if tool_arg_text:
-        completion_basis = (
-            f"{completion_basis}\n{tool_arg_text}" if completion_basis else tool_arg_text
-        )
-    output_tokens = estimate_tokens(completion_basis)
-    usage = ResponseUsage(
-        input_tokens=input_tokens,
-        output_tokens=output_tokens,
-        total_tokens=input_tokens + output_tokens,
-    )
-    response_payload = ResponseCreateResponse(
-        id=response_id,
-        created_at=created_time,
-        model=request_data.model,
-        output=[
-            ResponseOutputMessage(
-                id=message_id,
-                type="message",
-                role="assistant",
-                content=response_contents,
-            ),
-            *tool_call_items,
-            *image_call_items,
-        ],
-        status="completed",
-        usage=usage,
-        input=normalized_input or None,
-        metadata=request_data.metadata or None,
-        tools=request_data.tools,
-        tool_choice=request_data.tool_choice,
-    )
-    try:
-        current_assistant_message = Message(
-            role="assistant",
-            content=storage_output or None,
-            tool_calls=detected_tool_calls or None,
-        )
-        full_history = [*messages, current_assistant_message]
-        cleaned_history = db.sanitize_assistant_messages(full_history)
-        conv = ConversationInStore(
-            model=model.model_name,
-            client_id=client.id,
-            metadata=session.metadata,
-            messages=cleaned_history,
-        )
-        key = db.store(conv)
-        logger.debug(f"Conversation saved to LMDB with key: {key}")
-    except Exception as exc:
-        logger.warning(f"Failed to save Responses conversation to LMDB: {exc}")
-    if request_data.stream:
-        logger.debug(
-            f"Streaming Responses API payload (response_id={response_payload.id}, text_chunks={bool(assistant_text)})."
-        )
-        return _create_responses_streaming_response(response_payload, assistant_text or "")
-    return response_payload
-async def _find_reusable_session(
     db: LMDBConversationStore,
-    pool: GeminiClientPool,
     model: Model,
-    messages: list[Message],
-) -> tuple[ChatSession | None, GeminiClientWrapper | None, list[Message]]:
-    """Find an existing chat session that matches the *longest* prefix of
-    ``messages`` **whose last element is an assistant/system reply**.
-    Rationale
-    ---------
-    When a reply was generated by *another* server instance, the local LMDB may
-    only contain an older part of the conversation.  However, as long as we can
-    line up **any** earlier assistant/system response, we can restore the
-    corresponding Gemini session and replay the *remaining* turns locally
-    (including that missing assistant reply and the subsequent user prompts).
-    The algorithm therefore walks backwards through the history **one message at
-    a time**, each time requiring the current tail to be assistant/system before
-    querying LMDB.  As soon as a match is found we recreate the session and
-    return the untouched suffix as ``remaining_messages``.
-    """
-    if len(messages) < 2:
-        return None, None, messages
-    # Start with the full history and iteratively trim from the end.
-    search_end = len(messages)
-    while search_end >= 2:
-        search_history = messages[:search_end]
-        # Only try to match if the last stored message would be assistant/system/tool before querying LMDB.
-        if search_history[-1].role in {"assistant", "system", "tool"}:
-            try:
-                if conv := db.find(model.model_name, search_history):
-                    # Check if metadata is too old
-                    now = datetime.now()
-                    updated_at = conv.updated_at or conv.created_at or now
-                    age_minutes = (now - updated_at).total_seconds() / 60
-                    if age_minutes <= METADATA_TTL_MINUTES:
-                        client = await pool.acquire(conv.client_id)
-                        session = client.start_chat(metadata=conv.metadata, model=model)
-                        remain = messages[search_end:]
-                        logger.debug(
-                            f"Match found at prefix length {search_end}. Client: {conv.client_id}"
-                        )
-                        return session, client, remain
-                    else:
-                        logger.debug(
-                            f"Matched conversation is too old ({age_minutes:.1f}m), skipping reuse."
-                        )
-            except Exception as e:
-                logger.warning(
-                    f"Error checking LMDB for reusable session at length {search_end}: {e}"
-                )
-                break
-        # Trim one message and try again.
-        search_end -= 1
-    return None, None, messages
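Note on the removed `_find_reusable_session`: the backward walk described in its docstring can be illustrated in isolation. The following is a minimal sketch, not the project's implementation. `Msg`, `find_longest_stored_prefix`, and the in-memory `stored` set are hypothetical stand-ins for `Message`, the original function, and the LMDB lookup (`db.find`). The TTL check and client acquisition are left out.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for the project's Message model."""
    role: str
    content: str


def find_longest_stored_prefix(messages, stored):
    """Return (prefix_length, remaining_messages).

    Walks backwards one message at a time and only queries the store when
    the candidate prefix ends with an assistant/system/tool message.
    `stored` is a set of tuples standing in for the LMDB lookup.
    """
    reply_roles = {"assistant", "system", "tool"}
    end = len(messages)
    while end >= 2:
        prefix = messages[:end]
        if prefix[-1].role in reply_roles:
            key = tuple((m.role, m.content) for m in prefix)
            if key in stored:
                # Everything after the matched prefix still has to be sent.
                return end, messages[end:]
        end -= 1
    return 0, messages
```

With a store that only knows the first two turns of a five-message history, the sketch returns a prefix length of 2 and the last three messages as the suffix to replay.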
-async def _send_with_split(session: ChatSession, text: str, files: list[Path | str] | None = None):
     """
-    Send text to Gemini. If text is longer than ``MAX_CHARS_PER_REQUEST``,
-    it is converted into a temporary text file attachment to avoid splitting issues.
     """
-    if len(text) <= MAX_CHARS_PER_REQUEST:
-        try:
-            return await session.send_message(text, files=files)
-        except Exception as e:
-            logger.exception(f"Error sending message to Gemini: {e}")
-            raise
-    logger.info(
-        f"Message length ({len(text)}) exceeds limit ({MAX_CHARS_PER_REQUEST}). Converting text to file attachment."
-    )
-    # Create a temporary directory to hold the message.txt file
-    # This ensures the filename is exactly 'message.txt' as expected by the instruction.
-    with tempfile.TemporaryDirectory() as tmpdirname:
-        temp_file_path = Path(tmpdirname) / "message.txt"
-        temp_file_path.write_text(text, encoding="utf-8")
         try:
-            # Prepare the files list
-            final_files = list(files) if files else []
-            final_files.append(temp_file_path)
-            instruction = (
-                "The user's input exceeds the character limit and is provided in the attached file `message.txt`.\n\n"
-                "**System Instruction:**\n"
-                "1. Read the content of `message.txt`.\n"
-                "2. Treat that content as the **primary** user prompt for this turn.\n"
-                "3. Execute the instructions or answer the questions found *inside* that file immediately.\n"
-            )
-            logger.debug(f"Sending prompt as temporary file: {temp_file_path}")
-            return await session.send_message(instruction, files=final_files)
         except Exception as e:
-            logger.exception(f"Error sending large text as file to Gemini: {e}")
-            raise
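Note on the removed `_send_with_split`: the oversize fallback (write the prompt to `message.txt` and send a short instruction instead) can be sketched without a Gemini session. This is an illustrative sketch under assumptions: `prepare_payload`, `MAX_CHARS`, and the shortened instruction text are hypothetical, and the actual `session.send_message` call is omitted.

```python
import tempfile
from pathlib import Path

MAX_CHARS = 20  # hypothetical limit, far smaller than a real configuration


def prepare_payload(text, files, workdir, max_chars=MAX_CHARS):
    """Return (text_to_send, files_to_send).

    Short prompts pass through unchanged. Longer prompts are written to
    `message.txt` inside `workdir` and replaced by a brief instruction.
    """
    if len(text) <= max_chars:
        return text, list(files)
    attachment = Path(workdir) / "message.txt"
    attachment.write_text(text, encoding="utf-8")
    instruction = (
        "The user's input is provided in the attached file `message.txt`. "
        "Treat its content as the primary prompt for this turn."
    )
    return instruction, [*files, attachment]


# Usage: the temporary directory must outlive the send call.
with tempfile.TemporaryDirectory() as workdir:
    prompt, attachments = prepare_payload("x" * 50, [], workdir)
    print(prompt)
    print([Path(p).name for p in attachments])
```

Keeping the file inside a dedicated temporary directory is what lets the filename stay exactly `message.txt`, which the instruction text refers to.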
-def _create_streaming_response(
-    model_output: str,
-    tool_calls: list[dict],
-    completion_id: str,
-    created_time: int,
-    model: str,
-    messages: list[Message],
-) -> StreamingResponse:
-    """Create streaming response with `usage` calculation included in the final chunk."""
-    # Calculate token usage
-    prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
-    tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
-    completion_tokens = estimate_tokens(model_output + tool_args)
-    total_tokens = prompt_tokens + completion_tokens
-    finish_reason = "tool_calls" if tool_calls else "stop"
-    async def generate_stream():
-        # Send start event
-        data = {
-            "id": completion_id,
-            "object": "chat.completion.chunk",
-            "created": created_time,
-            "model": model,
-            "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
-        }
-        yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
-        # Stream output text in chunks for efficiency
-        for chunk in iter_stream_segments(model_output):
             data = {
                 "id": completion_id,
                 "object": "chat.completion.chunk",
                 "created": created_time,
-                "model": model,
-                "choices": [{"index": 0, "delta": {"content": chunk}, "finish_reason": None}],
             }
             yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
-        if tool_calls:
-            tool_calls_delta = [{**call, "index": idx} for idx, call in enumerate(tool_calls)]
             data = {
                 "id": completion_id,
                 "object": "chat.completion.chunk",
                 "created": created_time,
-                "model": model,
                 "choices": [
-                    {
-                        "index": 0,
-                        "delta": {"tool_calls": tool_calls_delta},
-                        "finish_reason": None,
-                    }
                 ],
             }
             yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
-        # Send end event
         data = {
             "id": completion_id,
             "object": "chat.completion.chunk",
             "created": created_time,
-            "model": model,
-            "choices": [{"index": 0, "delta": {}, "finish_reason": finish_reason}],
-            "usage": {
-                "prompt_tokens": prompt_tokens,
-                "completion_tokens": completion_tokens,
-                "total_tokens": total_tokens,
-            },
         }
         yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
         yield "data: [DONE]\n\n"
     return StreamingResponse(generate_stream(), media_type="text/event-stream")
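Note on the chunk framing used by `generate_stream`: each event is a `data: <json>` line followed by a blank line, and the stream ends with `data: [DONE]`. The sketch below reproduces that framing as a plain generator so it can be inspected without FastAPI. It is an approximation under assumptions: `sse_chunks` and its fixed-size slicing are hypothetical (the project uses `iter_stream_segments`), and the standard `json` module stands in for `orjson`.

```python
import json


def sse_chunks(completion_id, created, model, text, chunk_size=4):
    """Yield Server-Sent Events frames shaped like chat.completion.chunk."""

    def frame(delta, finish_reason=None):
        payload = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        # Each SSE event is terminated by a blank line.
        return f"data: {json.dumps(payload)}\n\n"

    yield frame({"role": "assistant"})            # start event
    for i in range(0, len(text), chunk_size):     # content deltas
        yield frame({"content": text[i : i + chunk_size]})
    yield frame({}, finish_reason="stop")         # end event
    yield "data: [DONE]\n\n"                      # stream terminator
```

A client can rebuild the full text by concatenating the `delta.content` values of the frames between the start and end events.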
-def _create_responses_streaming_response(
-    response_payload: ResponseCreateResponse,
-    assistant_text: str | None,
 ) -> StreamingResponse:
-    """Create streaming response for Responses API using event types defined by OpenAI."""
-    response_dict = response_payload.model_dump(mode="json")
-    response_id = response_payload.id
-    created_time = response_payload.created_at
-    model = response_payload.model
-    logger.debug(
-        f"Preparing streaming envelope for /v1/responses (response_id={response_id}, model={model})."
-    )
     base_event = {
         "id": response_id,
         "object": "response",
         "created_at": created_time,
-        "model": model,
-    }
-    created_snapshot: dict[str, Any] = {
-        "id": response_id,
-        "object": "response",
-        "created_at": created_time,
-        "model": model,
-        "status": "in_progress",
     }
-    if response_dict.get("metadata") is not None:
-        created_snapshot["metadata"] = response_dict["metadata"]
-    if response_dict.get("input") is not None:
-        created_snapshot["input"] = response_dict["input"]
-    if response_dict.get("tools") is not None:
-        created_snapshot["tools"] = response_dict["tools"]
-    if response_dict.get("tool_choice") is not None:
-        created_snapshot["tool_choice"] = response_dict["tool_choice"]
     async def generate_stream():
-        # Emit creation event
-        data = {
-            **base_event,
-            "type": "response.created",
-            "response": created_snapshot,
-        }
-        yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
-        # Stream output items (Message/Text, Tool Calls, Images)
-        for i, item in enumerate(response_payload.output):
-            item_json = item.model_dump(mode="json", exclude_none=True)
-            added_event = {
-                **base_event,
-                "type": "response.output_item.added",
-                "output_index": i,
-                "item": item_json,
-            }
-            yield f"data: {orjson.dumps(added_event).decode('utf-8')}\n\n"
-            # 2. Stream content if it's a message (text)
-            if item.type == "message":
-                content_text = ""
-                # Aggregate text content to stream
-                for c in item.content:
-                    if c.type == "output_text" and c.text:
-                        content_text += c.text
-                if content_text:
-                    for chunk in iter_stream_segments(content_text):
-                        delta_event = {
-                            **base_event,
-                            "type": "response.output_text.delta",
-                            "output_index": i,
-                            "delta": chunk,
-                        }
-                        yield f"data: {orjson.dumps(delta_event).decode('utf-8')}\n\n"
-                    # Text done
-                    done_event = {
-                        **base_event,
-                        "type": "response.output_text.done",
-                        "output_index": i,
-                    }
-                    yield f"data: {orjson.dumps(done_event).decode('utf-8')}\n\n"
-            # 3. Emit output_item.done for all types
-            # This confirms the item is fully transferred.
-            item_done_event = {
-                **base_event,
-                "type": "response.output_item.done",
-                "output_index": i,
-                "item": item_json,
-            }
-            yield f"data: {orjson.dumps(item_done_event).decode('utf-8')}\n\n"
-        # Emit completed event with full payload
-        completed_event = {
-            **base_event,
-            "type": "response.completed",
-            "response": response_dict,
-        }
-        yield f"data: {orjson.dumps(completed_event).decode('utf-8')}\n\n"
         yield "data: [DONE]\n\n"
     return StreamingResponse(generate_stream(), media_type="text/event-stream")
-def _create_standard_response(
-    model_output: str,
-    tool_calls: list[dict],
-    completion_id: str,
-    created_time: int,
-    model: str,
-    messages: list[Message],
-) -> dict:
-    """Create standard response"""
-    # Calculate token usage
-    prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
-    tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
-    completion_tokens = estimate_tokens(model_output + tool_args)
-    total_tokens = prompt_tokens + completion_tokens
-    finish_reason = "tool_calls" if tool_calls else "stop"
-    message_payload: dict = {"role": "assistant", "content": model_output or None}
-    if tool_calls:
-        message_payload["tool_calls"] = tool_calls
-    result = {
-        "id": completion_id,
-        "object": "chat.completion",
-        "created": created_time,
-        "model": model,
-        "choices": [
-            {
-                "index": 0,
-                "message": message_payload,
-                "finish_reason": finish_reason,
-            }
-        ],
-        "usage": {
-            "prompt_tokens": prompt_tokens,
-            "completion_tokens": completion_tokens,
-            "total_tokens": total_tokens,
-        },
-    }
-    logger.debug(f"Response created with {total_tokens} total tokens")
-    return result
-async def _image_to_base64(image: Image, temp_dir: Path) -> tuple[str, int | None, int | None, str]:
-    """Persist an image provided by gemini_webapi and return base64 plus dimensions and filename."""
-    if isinstance(image, GeneratedImage):
         try:
-            saved_path = await image.save(path=str(temp_dir), full_size=True)
         except Exception as e:
-            logger.warning(
-                f"Failed to download full-size GeneratedImage, retrying with default size: {e}"
             )
-            saved_path = await image.save(path=str(temp_dir), full_size=False)
     else:
-        saved_path = await image.save(path=str(temp_dir))
-    if not saved_path:
-        raise ValueError("Failed to save generated image")
-    # Rename file to a random UUID to ensure uniqueness and unpredictability
-    original_path = Path(saved_path)
-    random_name = f"img_{uuid.uuid4().hex}{original_path.suffix}"
-    new_path = temp_dir / random_name
-    original_path.rename(new_path)
-    data = new_path.read_bytes()
-    width, height = extract_image_dimensions(data)
-    filename = random_name
-    return base64.b64encode(data).decode("ascii"), width, height, filename

 import base64
+import hashlib
+import io
+import reprlib
 import uuid
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
+from typing import Any, AsyncGenerator
 import orjson
 from fastapi import APIRouter, Depends, HTTPException, Request, status
 from fastapi.responses import StreamingResponse
+from gemini_webapi import ModelOutput
 from gemini_webapi.client import ChatSession
 from gemini_webapi.constants import Model
 from gemini_webapi.types.image import GeneratedImage, Image
 from loguru import logger
 from ..utils.helper import (
     CODE_BLOCK_HINT,
     CODE_HINT_STRIPPED,
+    CONTROL_TOKEN_RE,
     XML_HINT_STRIPPED,
     XML_WRAP_HINT,
     estimate_tokens,
     extract_image_dimensions,
     extract_tool_calls,
     strip_code_fence,
     text_from_message,
 )
 from .middleware import get_image_store_dir, get_image_token, get_temp_dir, verify_api_key
 MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
 METADATA_TTL_MINUTES = 15
 router = APIRouter()
     raw_format: dict[str, Any]
+# --- Helper Functions ---
+async def _image_to_base64(
+    image: Image, temp_dir: Path
+) -> tuple[str, int | None, int | None, str, str]:
+    """Persist an image provided by gemini_webapi and return base64 plus dimensions, filename, and hash."""
+    if isinstance(image, GeneratedImage):
+        try:
+            saved_path = await image.save(path=str(temp_dir), full_size=True)
+        except Exception as e:
+            logger.warning(
+                f"Failed to download full-size GeneratedImage, retrying with default size: {e}"
+            )
+            saved_path = await image.save(path=str(temp_dir), full_size=False)
+    else:
+        saved_path = await image.save(path=str(temp_dir))
+    if not saved_path:
+        raise ValueError("Failed to save generated image")
+    original_path = Path(saved_path)
+    random_name = f"img_{uuid.uuid4().hex}{original_path.suffix}"
+    new_path = temp_dir / random_name
+    original_path.rename(new_path)
+    data = new_path.read_bytes()
+    width, height = extract_image_dimensions(data)
+    filename = random_name
+    file_hash = hashlib.sha256(data).hexdigest()
+    return base64.b64encode(data).decode("ascii"), width, height, filename, file_hash
+def _calculate_usage(
+    messages: list[Message],
+    assistant_text: str | None,
+    tool_calls: list[Any] | None,
+) -> tuple[int, int, int]:
+    """Calculate prompt, completion and total tokens consistently."""
+    prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
+    tool_args_text = ""
+    if tool_calls:
+        for call in tool_calls:
+            if hasattr(call, "function"):
+                tool_args_text += call.function.arguments or ""
+            elif isinstance(call, dict):
+                tool_args_text += call.get("function", {}).get("arguments") or ""
+    completion_basis = assistant_text or ""
+    if tool_args_text:
+        completion_basis = (
+            f"{completion_basis}\n{tool_args_text}" if completion_basis else tool_args_text
+        )
+    completion_tokens = estimate_tokens(completion_basis)
+    return prompt_tokens, completion_tokens, prompt_tokens + completion_tokens
+def _create_responses_standard_payload(
+    response_id: str,
+    created_time: int,
+    model_name: str,
+    detected_tool_calls: list[Any] | None,
+    image_call_items: list[ResponseImageGenerationCall],
+    response_contents: list[ResponseOutputContent],
+    usage: ResponseUsage,
+    request: ResponseCreateRequest,
+    normalized_input: Any,
+) -> ResponseCreateResponse:
+    """Unified factory for building ResponseCreateResponse objects."""
+    message_id = f"msg_{uuid.uuid4().hex}"
+    tool_call_items: list[ResponseToolCall] = []
+    if detected_tool_calls:
+        tool_call_items = [
+            ResponseToolCall(
+                id=call.id if hasattr(call, "id") else call["id"],
+                status="completed",
+                function=call.function if hasattr(call, "function") else call["function"],
+            )
+            for call in detected_tool_calls
+        ]
+    return ResponseCreateResponse(
+        id=response_id,
+        created_at=created_time,
+        model=model_name,
+        output=[
+            ResponseOutputMessage(
+                id=message_id,
+                type="message",
+                role="assistant",
+                content=response_contents,
+            ),
+            *tool_call_items,
+            *image_call_items,
+        ],
+        status="completed",
+        usage=usage,
+        input=normalized_input or None,
+        metadata=request.metadata or None,
+        tools=request.tools,
+        tool_choice=request.tool_choice,
+    )
+def _create_chat_completion_standard_payload(
+    completion_id: str,
+    created_time: int,
+    model_name: str,
+    visible_output: str | None,
+    tool_calls_payload: list[dict] | None,
+    finish_reason: str,
+    usage: dict,
+) -> dict:
+    """Unified factory for building Chat Completion response dictionaries."""
+    return {
+        "id": completion_id,
+        "object": "chat.completion",
+        "created": created_time,
+        "model": model_name,
+        "choices": [
+            {
+                "index": 0,
+                "message": {
+                    "role": "assistant",
+                    "content": visible_output or None,
+                    "tool_calls": tool_calls_payload or None,
+                },
+                "finish_reason": finish_reason,
+            }
+        ],
+        "usage": usage,
+    }
+def _process_llm_output(
+    raw_output_with_think: str,
+    raw_output_clean: str,
+    structured_requirement: StructuredOutputRequirement | None,
+) -> tuple[str, str, list[Any]]:
+    """
+    Common post-processing logic for Gemini output.
+    Returns: (visible_text, storage_output, tool_calls)
+    """
+    visible_with_think, tool_calls = extract_tool_calls(raw_output_with_think)
+    if tool_calls:
+        logger.debug(f"Detected {len(tool_calls)} tool call(s) in model output.")
+    visible_output = visible_with_think.strip()
+    storage_output, _ = extract_tool_calls(raw_output_clean)
+    storage_output = storage_output.strip()
+    if structured_requirement:
+        cleaned_for_json = LMDBConversationStore.remove_think_tags(visible_output)
+        json_text = strip_code_fence(cleaned_for_json or "")
+        if json_text:
+            try:
+                structured_payload = orjson.loads(json_text)
+                canonical_output = orjson.dumps(structured_payload).decode("utf-8")
+                visible_output = canonical_output
+                storage_output = canonical_output
+                logger.debug(
+                    f"Structured response fulfilled (schema={structured_requirement.schema_name})."
+                )
+            except orjson.JSONDecodeError:
+                logger.warning(
+                    f"Failed to decode JSON for structured response (schema={structured_requirement.schema_name})."
+                )
+    return visible_output, storage_output, tool_calls
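Note on the structured-output branch of `_process_llm_output`: the model's reply is unfenced, parsed, and re-serialized so that the visible and stored text share one canonical JSON form, and on a decode failure the original text is kept. A minimal sketch of that normalization follows. `strip_fence` and `canonicalize_json` are hypothetical stand-ins (the real `strip_code_fence` is not shown in this diff), and the standard `json` module is used in place of `orjson`.

```python
import json
import re

# Matches a single fenced block with an optional language tag.
FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_fence(text):
    """Return the body of a fenced block, or the trimmed text if unfenced."""
    match = FENCE_RE.match(text.strip())
    return match.group(1).strip() if match else text.strip()


def canonicalize_json(text):
    """Return compact JSON if `text` parses, otherwise `text` unchanged."""
    candidate = strip_fence(text)
    if not candidate:
        return text
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Mirror the lenient behaviour: keep the raw reply on failure.
        return text
    return json.dumps(parsed, separators=(",", ":"))
```

Compact separators are chosen here to approximate the whitespace-free output that `orjson.dumps` produces by default.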
+def _persist_conversation(
+    db: LMDBConversationStore,
+    model_name: str,
+    client_id: str,
+    metadata: list[str | None],
+    messages: list[Message],
+    storage_output: str | None,
+    tool_calls: list[Any] | None,
+) -> str | None:
+    """Unified logic to save conversation history to LMDB."""
+    try:
+        current_assistant_message = Message(
+            role="assistant",
+            content=storage_output or None,
+            tool_calls=tool_calls or None,
+        )
+        full_history = [*messages, current_assistant_message]
+        cleaned_history = db.sanitize_assistant_messages(full_history)
+        conv = ConversationInStore(
+            model=model_name,
+            client_id=client_id,
+            metadata=metadata,
+            messages=cleaned_history,
+        )
+        key = db.store(conv)
+        logger.debug(f"Conversation saved to LMDB with key: {key[:12]}")
+        return key
+    except Exception as e:
+        logger.warning(f"Failed to save {len(messages) + 1} messages to LMDB: {e}")
+        return None
 def _build_structured_requirement(
     response_format: dict[str, Any] | None,
 ) -> StructuredOutputRequirement | None:
         return None
     if response_format.get("type") != "json_schema":
+        logger.warning(
+            f"Unsupported response_format type requested: {reprlib.repr(response_format)}"
+        )
         return None
     json_schema = response_format.get("json_schema")
     if not isinstance(json_schema, dict):
+        logger.warning(
+            f"Invalid json_schema payload in response_format: {reprlib.repr(response_format)}"
+        )
         return None
     schema = json_schema.get("schema")
     if not isinstance(schema, dict):
+        logger.warning(
+            f"Missing `schema` object in response_format payload: {reprlib.repr(response_format)}"
+        )
         return None
     schema_name = json_schema.get("name") or "response"
         description = function.description or "No description provided."
         lines.append(f"Tool `{function.name}`: {description}")
         if function.parameters:
+            schema_text = orjson.dumps(function.parameters, option=orjson.OPT_SORT_KEYS).decode(
+                "utf-8"
+            )
             lines.append("Arguments JSON schema:")
             lines.append(schema_text)
         else:
         lines.append(
             f"You are required to call the tool named `{target}`. Do not call any other tool."
         )
     lines.append(
         "When you decide to call a tool you MUST respond with nothing except a single fenced block exactly like the template below."
         if isinstance(msg.content, str):
             if XML_HINT_STRIPPED not in msg.content:
+                msg.content = f"{msg.content}\n{XML_WRAP_HINT}"
             return
         if isinstance(msg.content, list):
                 text_value = part.text or ""
                 if XML_HINT_STRIPPED in text_value:
                     return
+                part.text = f"{text_value}\n{XML_WRAP_HINT}"
                 return
             messages_text = XML_WRAP_HINT.strip()
             msg.content.append(ContentItem(type="text", text=messages_text))
             return
 def _conversation_has_code_hint(messages: list[Message]) -> bool:
     """Return True if any system message already includes the code block hint."""
     """Return a copy of messages enriched with tool instructions when needed."""
     prepared = [msg.model_copy(deep=True) for msg in source_messages]
+    # Resolve tool names for 'tool' messages by looking back at previous assistant tool calls
+    tool_id_to_name = {}
+    for msg in prepared:
+        if msg.role == "assistant" and msg.tool_calls:
+            for tc in msg.tool_calls:
+                tool_id_to_name[tc.id] = tc.function.name
+    for msg in prepared:
+        if msg.role == "tool" and not msg.name and msg.tool_call_id:
+            msg.name = tool_id_to_name.get(msg.tool_call_id)
     instructions: list[str] = []
     if inject_system_defaults:
         if tools:
             logger.debug("Injected default code block hint for Gemini conversation.")
     if not instructions:
         if tools and tool_choice != "none":
             _append_xml_hint_to_last_user_message(prepared)
         return prepared
     normalized_input: list[ResponseInputItem] = []
     for item in items:
         role = item.role
         content = item.content
         normalized_contents: list[ResponseInputContent] = []
         if isinstance(content, str):
             continue
         role = item.role
         content = item.content
         if isinstance(content, str):
             instruction_messages.append(Message(role=role, content=content))
 def _get_model_by_name(name: str) -> Model:
+    """Retrieve a Model instance by name."""
     strategy = g_config.gemini.model_strategy
     custom_models = {m.model_name: m for m in g_config.gemini.models if m.model_name}
 def _get_available_models() -> list[ModelData]:
+    """Return a list of available models based on configuration strategy."""
     now = int(datetime.now(tz=timezone.utc).timestamp())
     strategy = g_config.gemini.model_strategy
     models_data = []
     return models_data
+async def _find_reusable_session(
+    db: LMDBConversationStore,
+    pool: GeminiClientPool,
+    model: Model,
+    messages: list[Message],
+) -> tuple[ChatSession | None, GeminiClientWrapper | None, list[Message]]:
+    """Find an existing chat session matching the longest suitable history prefix."""
+    if len(messages) < 2:
+        return None, None, messages
+    search_end = len(messages)
+    while search_end >= 2:
+        search_history = messages[:search_end]
+        if search_history[-1].role in {"assistant", "system", "tool"}:
+            try:
+                if conv := db.find(model.model_name, search_history):
+                    now = datetime.now()
+                    updated_at = conv.updated_at or conv.created_at or now
+                    age_minutes = (now - updated_at).total_seconds() / 60
+                    if age_minutes <= METADATA_TTL_MINUTES:
+                        client = await pool.acquire(conv.client_id)
+                        session = client.start_chat(metadata=conv.metadata, model=model)
+                        remain = messages[search_end:]
+                        logger.debug(
+                            f"Match found at prefix length {search_end}/{len(messages)}. Client: {conv.client_id}"
+                        )
+                        return session, client, remain
+                    else:
+                        logger.debug(
+                            f"Matched conversation at length {search_end} is too old ({age_minutes:.1f}m), skipping reuse."
+                        )
+                else:
+                    # No stored conversation matches this prefix; try a shorter one.
+                    pass
+            except Exception as e:
+                logger.warning(
+                    f"Error checking LMDB for reusable session at length {search_end}: {e}"
+                )
+                break
+        search_end -= 1
+    logger.debug(f"No reusable session found for {len(messages)} messages.")
+    return None, None, messages
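The prefix search in `_find_reusable_session` can be illustrated in isolation. The sketch below is a simplified stand-in, not the project's code: it assumes a plain in-memory `dict` in place of the LMDB store, represents messages as dictionaries, and uses a hypothetical helper name `find_longest_stored_prefix`. It leaves out the TTL check and client acquisition.

```python
from __future__ import annotations


def find_longest_stored_prefix(
    messages: list[dict[str, str]],
    store: dict[tuple[tuple[str, str], ...], str],
) -> tuple[str | None, list[dict[str, str]]]:
    """Return (session_id, remaining_messages) for the longest stored prefix."""
    search_end = len(messages)
    while search_end >= 2:
        prefix = messages[:search_end]
        # Only prefixes ending on a non-user turn can match a saved conversation.
        if prefix[-1]["role"] in {"assistant", "system", "tool"}:
            key = tuple((m["role"], m["content"]) for m in prefix)
            if key in store:
                return store[key], messages[search_end:]
        search_end -= 1
    # Nothing matched: the caller has to send the whole history.
    return None, messages


history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "what's new?"},
]
# Hypothetical store keyed by the (role, content) pairs of a saved history.
store = {(("user", "hi"), ("assistant", "hello")): "session-1"}
session_id, remaining = find_longest_stored_prefix(history, store)
```

Searching from the longest prefix downward means the first hit leaves the fewest messages to resend to the reused session.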
+async def _send_with_split(
+    session: ChatSession,
+    text: str,
+    files: list[Path | str | io.BytesIO] | None = None,
+    stream: bool = False,
+) -> AsyncGenerator[ModelOutput, None] | ModelOutput:
+    """Send text to Gemini, converting it to a file attachment if it exceeds MAX_CHARS_PER_REQUEST."""
+    if len(text) <= MAX_CHARS_PER_REQUEST:
         try:
+            if stream:
+                return session.send_message_stream(text, files=files)
+            return await session.send_message(text, files=files)
         except Exception as e:
+            logger.exception(f"Error sending message to Gemini: {e}")
             raise
+    logger.info(
+        f"Message length ({len(text)}) exceeds limit ({MAX_CHARS_PER_REQUEST}). Converting text to file attachment."
+    )
+    file_obj = io.BytesIO(text.encode("utf-8"))
+    file_obj.name = "message.txt"
     try:
+        final_files = list(files) if files else []
+        final_files.append(file_obj)
+        instruction = (
+            "The user's input exceeds the character limit and is provided in the attached file `message.txt`.\n\n"
+            "**System Instruction:**\n"
+            "1. Read the content of `message.txt`.\n"
+            "2. Treat that content as the **primary** user prompt for this turn.\n"
+            "3. Execute the instructions or answer the questions found *inside* that file immediately.\n"
         )
+        if stream:
+            return session.send_message_stream(instruction, files=final_files)
+        return await session.send_message(instruction, files=final_files)
     except Exception as e:
+        logger.exception(f"Error sending large text as file to Gemini: {e}")
+        raise
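The fallback used by `_send_with_split` for oversized input can be shown on its own. This is a minimal sketch under stated assumptions: `text_or_attachment` is a hypothetical helper, the limit is passed in rather than read from `MAX_CHARS_PER_REQUEST`, and the replacement prompt is shortened for illustration.

```python
from __future__ import annotations

import io


def text_or_attachment(text: str, limit: int) -> tuple[str, list[io.BytesIO]]:
    """Return (prompt, files); long text is moved into an in-memory file."""
    if len(text) <= limit:
        return text, []
    file_obj = io.BytesIO(text.encode("utf-8"))
    # Upload code typically reads the file name from this attribute.
    file_obj.name = "message.txt"
    prompt = "The user's input is provided in the attached file `message.txt`."
    return prompt, [file_obj]


prompt, files = text_or_attachment("x" * 10, limit=5)
```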
+class StreamingOutputFilter:
+    """
+    Enhanced streaming filter that suppresses:
+    1. XML tool call blocks: ```xml ... ```
+    2. ChatML tool blocks: <|im_start|>tool\n...<|im_end|>
+    3. ChatML role headers: <|im_start|>role\n (only suppresses the header, keeps content)
+    4. Control tokens: <|im_start|>, <|im_end|>
+    5. System instructions/hints: XML_WRAP_HINT, CODE_BLOCK_HINT, and their stripped variants.
+    """
+    def __init__(self):
+        self.buffer = ""
+        self.in_xml_tool = False
+        self.in_tagged_block = False
+        self.in_role_header = False
+        self.current_role = ""
+        self.XML_START = "```xml"
+        self.XML_END = "```"
+        self.TAG_START = "<|im_start|>"
+        self.TAG_END = "<|im_end|>"
+        self.SYSTEM_HINTS = [
+            XML_WRAP_HINT,
+            XML_HINT_STRIPPED,
+            CODE_BLOCK_HINT,
+            CODE_HINT_STRIPPED,
+        ]
+    def process(self, chunk: str) -> str:
+        self.buffer += chunk
+        to_yield = ""
+        while self.buffer:
+            if self.in_xml_tool:
+                end_idx = self.buffer.find(self.XML_END)
+                if end_idx != -1:
+                    self.buffer = self.buffer[end_idx + len(self.XML_END) :]
+                    self.in_xml_tool = False
+                else:
+                    break
+            elif self.in_role_header:
+                nl_idx = self.buffer.find("\n")
+                if nl_idx != -1:
+                    role_text = self.buffer[:nl_idx].strip().lower()
+                    self.current_role = role_text
+                    self.buffer = self.buffer[nl_idx + 1 :]
+                    self.in_role_header = False
+                    self.in_tagged_block = True
+                else:
+                    break
+            elif self.in_tagged_block:
+                end_idx = self.buffer.find(self.TAG_END)
+                if end_idx != -1:
+                    content = self.buffer[:end_idx]
+                    if self.current_role != "tool":
+                        to_yield += content
+                    self.buffer = self.buffer[end_idx + len(self.TAG_END) :]
+                    self.in_tagged_block = False
+                    self.current_role = ""
+                else:
+                    if self.current_role == "tool":
+                        break
+                    else:
+                        yield_len = len(self.buffer) - (len(self.TAG_END) - 1)
+                        if yield_len > 0:
+                            to_yield += self.buffer[:yield_len]
+                            self.buffer = self.buffer[yield_len:]
+                        break
+            else:
+                # Outside any special block. Look for starts.
+                earliest_idx = -1
+                match_type = ""
+                xml_idx = self.buffer.find(self.XML_START)
+                if xml_idx != -1:
+                    earliest_idx = xml_idx
+                    match_type = "xml"
+                tag_s_idx = self.buffer.find(self.TAG_START)
+                if tag_s_idx != -1:
+                    if earliest_idx == -1 or tag_s_idx < earliest_idx:
+                        earliest_idx = tag_s_idx
+                        match_type = "tag_start"
+                tag_e_idx = self.buffer.find(self.TAG_END)
+                if tag_e_idx != -1:
+                    if earliest_idx == -1 or tag_e_idx < earliest_idx:
+                        earliest_idx = tag_e_idx
+                        match_type = "tag_end"
+                if earliest_idx != -1:
+                    # Yield text before the match
+                    to_yield += self.buffer[:earliest_idx]
+                    self.buffer = self.buffer[earliest_idx:]
+                    if match_type == "xml":
+                        self.in_xml_tool = True
+                        self.buffer = self.buffer[len(self.XML_START) :]
+                    elif match_type == "tag_start":
+                        self.in_role_header = True
+                        self.buffer = self.buffer[len(self.TAG_START) :]
+                    elif match_type == "tag_end":
+                        # Orphaned end tag, just skip it
+                        self.buffer = self.buffer[len(self.TAG_END) :]
+                    continue
+                else:
+                    # Check for prefixes
+                    prefixes = [self.XML_START, self.TAG_START, self.TAG_END]
+                    max_keep = 0
+                    for p in prefixes:
+                        for i in range(len(p) - 1, 0, -1):
+                            if self.buffer.endswith(p[:i]):
+                                max_keep = max(max_keep, i)
+                                break
+                    yield_len = len(self.buffer) - max_keep
+                    if yield_len > 0:
+                        to_yield += self.buffer[:yield_len]
+                        self.buffer = self.buffer[yield_len:]
+                    break
+        # Final pass: filter out system hints from the text to be yielded
+        for hint in self.SYSTEM_HINTS:
+            if hint in to_yield:
+                to_yield = to_yield.replace(hint, "")
+        return to_yield
+    def flush(self) -> str:
+        # If the stream ends inside a tool block (XML fence or ChatML tool role),
+        # the output is usually malformed, so drop whatever is still buffered.
+        if self.in_xml_tool or (self.in_tagged_block and self.current_role == "tool"):
+            self.buffer = ""
+            return ""
+        final_text = self.buffer
+        self.buffer = ""
+        # Filter out any orphaned/partial control tokens or hints
+        final_text = CONTROL_TOKEN_RE.sub("", final_text)
+        for hint in self.SYSTEM_HINTS:
+            final_text = final_text.replace(hint, "")
+        return final_text.strip()
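The "check for prefixes" step in `StreamingOutputFilter.process` is the part that makes the filter safe across chunk boundaries: text is emitted only up to the point where the buffer might be the start of a marker. A standalone sketch of that step follows; `split_safe_output` is a hypothetical name, and the marker list mirrors the three start/end strings used by the class.

```python
def split_safe_output(buffer: str, markers: list[str]) -> tuple[str, str]:
    """Split buffer into (emit_now, keep_back).

    keep_back is the longest suffix of buffer that is a proper prefix of a marker,
    so a marker split across two chunks is never emitted half-way.
    """
    max_keep = 0
    for marker in markers:
        # Try the longest proper prefix first and stop at the first match.
        for i in range(len(marker) - 1, 0, -1):
            if buffer.endswith(marker[:i]):
                max_keep = max(max_keep, i)
                break
    cut = len(buffer) - max_keep
    return buffer[:cut], buffer[cut:]


markers = ["```xml", "<|im_start|>", "<|im_end|>"]
emit, keep = split_safe_output("Hello <|im_", markers)
```

Here `emit` holds the text that is safe to send and `keep` holds the held-back tail, which stays in the buffer until the next chunk shows whether it really was a control token.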
+# --- Response Builders & Streaming ---
+def _create_real_streaming_response(
+    generator: AsyncGenerator[ModelOutput, None],
+    completion_id: str,
+    created_time: int,
+    model_name: str,
+    messages: list[Message],
     db: LMDBConversationStore,
     model: Model,
+    client_wrapper: GeminiClientWrapper,
+    session: ChatSession,
+    base_url: str,
+    structured_requirement: StructuredOutputRequirement | None = None,
+) -> StreamingResponse:
     """
+    Create a real-time streaming response.
+    Reconciles manual delta accumulation with the model's final authoritative state.
     """
+    async def generate_stream():
+        full_thoughts, full_text = "", ""
+        has_started = False
+        last_chunk_was_thought = False
+        all_outputs: list[ModelOutput] = []
+        suppressor = StreamingOutputFilter()
         try:
+            async for chunk in generator:
+                all_outputs.append(chunk)
+                if not has_started:
+                    data = {
+                        "id": completion_id,
+                        "object": "chat.completion.chunk",
+                        "created": created_time,
+                        "model": model_name,
+                        "choices": [
+                            {"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}
+                        ],
+                    }
+                    yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
+                    has_started = True
+                if t_delta := chunk.thoughts_delta:
+                    if not last_chunk_was_thought and not full_thoughts:
+                        yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '<think>'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
+                    full_thoughts += t_delta
+                    data = {
+                        "id": completion_id,
+                        "object": "chat.completion.chunk",
+                        "created": created_time,
+                        "model": model_name,
+                        "choices": [
+                            {"index": 0, "delta": {"content": t_delta}, "finish_reason": None}
+                        ],
+                    }
+                    yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
+                    last_chunk_was_thought = True
+                if text_delta := chunk.text_delta:
+                    if last_chunk_was_thought:
+                        yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '</think>\n'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
+                        last_chunk_was_thought = False
+                    full_text += text_delta
+                    if visible_delta := suppressor.process(text_delta):
+                        data = {
+                            "id": completion_id,
+                            "object": "chat.completion.chunk",
+                            "created": created_time,
+                            "model": model_name,
+                            "choices": [
+                                {
+                                    "index": 0,
+                                    "delta": {"content": visible_delta},
+                                    "finish_reason": None,
+                                }
+                            ],
+                        }
+                        yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
         except Exception as e:
+            logger.exception(f"Error during OpenAI streaming: {e}")
+            yield f"data: {orjson.dumps({'error': {'message': 'Streaming error occurred.', 'type': 'server_error', 'param': None, 'code': None}}).decode('utf-8')}\n\n"
+            return
+        if all_outputs:
+            final_chunk = all_outputs[-1]
+            if final_chunk.text:
+                full_text = final_chunk.text
+            if final_chunk.thoughts:
+                full_thoughts = final_chunk.thoughts
+        if last_chunk_was_thought:
+            yield f"data: {orjson.dumps({'id': completion_id, 'object': 'chat.completion.chunk', 'created': created_time, 'model': model_name, 'choices': [{'index': 0, 'delta': {'content': '</think>\n'}, 'finish_reason': None}]}).decode('utf-8')}\n\n"
+        if remaining_text := suppressor.flush():
+            data = {
+                "id": completion_id,
+                "object": "chat.completion.chunk",
+                "created": created_time,
+                "model": model_name,
+                "choices": [
+                    {"index": 0, "delta": {"content": remaining_text}, "finish_reason": None}
+                ],
+            }
+            yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
+        raw_output_with_think = f"<think>{full_thoughts}</think>\n" if full_thoughts else ""
+        raw_output_with_think += full_text
+        assistant_text, storage_output, tool_calls = _process_llm_output(
+            raw_output_with_think, full_text, structured_requirement
+        )
+        images = []
+        seen_urls = set()
+        for out in all_outputs:
+            if out.images:
+                for img in out.images:
+                    # Use the image URL as a stable identifier across chunks
+                    if img.url not in seen_urls:
+                        images.append(img)
+                        seen_urls.add(img.url)
+        image_markdown = ""
+        seen_hashes = set()
+        for image in images:
+            try:
+                image_store = get_image_store_dir()
+                _, _, _, filename, file_hash = await _image_to_base64(image, image_store)
+                if file_hash in seen_hashes:
+                    # Duplicate content, delete the file and skip
+                    (image_store / filename).unlink(missing_ok=True)
+                    continue
+                seen_hashes.add(file_hash)
+                img_url = (
+                    f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
+                )
+                image_markdown += f"\n\n{img_url}"
+            except Exception as exc:
+                logger.warning(f"Failed to process image in OpenAI stream: {exc}")
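+        if image_markdown:
+            # Guard against None values before concatenating.
+            assistant_text = (assistant_text or "") + image_markdown
+            storage_output = (storage_output or "") + image_markdown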
+            # Send the image Markdown as a final text chunk before usage
             data = {
                 "id": completion_id,
                 "object": "chat.completion.chunk",
                 "created": created_time,
+                "model": model_name,
+                "choices": [
+                    {"index": 0, "delta": {"content": image_markdown}, "finish_reason": None}
+                ],
             }
             yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
+        tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
+        if tool_calls_payload:
+            tool_calls_delta = [
+                {**call, "index": idx} for idx, call in enumerate(tool_calls_payload)
+            ]
             data = {
                 "id": completion_id,
                 "object": "chat.completion.chunk",
                 "created": created_time,
+                "model": model_name,
                 "choices": [
+                    {"index": 0, "delta": {"tool_calls": tool_calls_delta}, "finish_reason": None}
                 ],
             }
             yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
+        p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, tool_calls)
+        usage = {"prompt_tokens": p_tok, "completion_tokens": c_tok, "total_tokens": t_tok}
         data = {
             "id": completion_id,
             "object": "chat.completion.chunk",
             "created": created_time,
+            "model": model_name,
+            "choices": [
+                {"index": 0, "delta": {}, "finish_reason": "tool_calls" if tool_calls else "stop"}
+            ],
+            "usage": usage,
         }
+        _persist_conversation(
+            db,
+            model.model_name,
+            client_wrapper.id,
+            session.metadata,
+            messages,  # Prepared messages, so stored history matches later prefix lookups
+            storage_output,
+            tool_calls,
+        )
         yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
         yield "data: [DONE]\n\n"
     return StreamingResponse(generate_stream(), media_type="text/event-stream")
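The chat-completion chunks above all share one envelope and differ only in `delta` and `finish_reason`. A small helper could build them in one place. The sketch below is illustrative only: `sse_chunk` is a hypothetical name, and it uses the standard-library `json` module instead of `orjson` to stay dependency-free.

```python
from __future__ import annotations

import json
from typing import Any


def sse_chunk(
    completion_id: str,
    created: int,
    model: str,
    delta: dict[str, Any],
    finish_reason: str | None = None,
) -> str:
    """Format one chat.completion.chunk as a server-sent event line."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    # Each SSE event is a "data: ..." line terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


line = sse_chunk("chatcmpl-demo", 0, "demo-model", {"content": "Hi"})
```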
+def _create_responses_real_streaming_response(
+    generator: AsyncGenerator[ModelOutput, None],
+    response_id: str,
+    created_time: int,
+    model_name: str,
+    messages: list[Message],
+    db: LMDBConversationStore,
+    model: Model,
+    client_wrapper: GeminiClientWrapper,
+    session: ChatSession,
+    request: ResponseCreateRequest,
+    image_store: Path,
+    base_url: str,
+    structured_requirement: StructuredOutputRequirement | None = None,
 ) -> StreamingResponse:
+    """
+    Create a real-time streaming response for the Responses API.
+    Ensures final accumulated text and thoughts are synchronized.
+    """
     base_event = {
         "id": response_id,
         "object": "response",
         "created_at": created_time,
+        "model": model_name,
     }
     async def generate_stream():
+        yield f"data: {orjson.dumps({**base_event, 'type': 'response.created', 'response': {'id': response_id, 'object': 'response', 'created_at': created_time, 'model': model_name, 'status': 'in_progress', 'metadata': request.metadata, 'input': None, 'tools': request.tools, 'tool_choice': request.tool_choice}}).decode('utf-8')}\n\n"
+        message_id = f"msg_{uuid.uuid4().hex}"
+        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': 0, 'item': {'id': message_id, 'type': 'message', 'role': 'assistant', 'content': []}}).decode('utf-8')}\n\n"
+        full_thoughts, full_text = "", ""
+        last_chunk_was_thought = False
+        all_outputs: list[ModelOutput] = []
+        suppressor = StreamingOutputFilter()
+        try:
+            async for chunk in generator:
+                all_outputs.append(chunk)
+                if t_delta := chunk.thoughts_delta:
+                    if not last_chunk_was_thought and not full_thoughts:
+                        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '<think>'}).decode('utf-8')}\n\n"
+                    full_thoughts += t_delta
+                    yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': t_delta}).decode('utf-8')}\n\n"
+                    last_chunk_was_thought = True
+                if text_delta := chunk.text_delta:
+                    if last_chunk_was_thought:
+                        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '</think>\n'}).decode('utf-8')}\n\n"
+                        last_chunk_was_thought = False
+                    full_text += text_delta
+                    if visible_delta := suppressor.process(text_delta):
+                        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': visible_delta}).decode('utf-8')}\n\n"
+        except Exception as e:
+            logger.exception(f"Error during Responses API streaming: {e}")
+            yield f"data: {orjson.dumps({**base_event, 'type': 'error', 'error': {'message': 'Streaming error.'}}).decode('utf-8')}\n\n"
+            return
+        if all_outputs:
+            final_chunk = all_outputs[-1]
+            if final_chunk.text:
+                full_text = final_chunk.text
+            if final_chunk.thoughts:
+                full_thoughts = final_chunk.thoughts
+        if last_chunk_was_thought:
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': '</think>\n'}).decode('utf-8')}\n\n"
+        if remaining_text := suppressor.flush():
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.delta', 'output_index': 0, 'delta': remaining_text}).decode('utf-8')}\n\n"
+        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_text.done', 'output_index': 0}).decode('utf-8')}\n\n"
+        raw_output_with_think = f"<think>{full_thoughts}</think>\n" if full_thoughts else ""
+        raw_output_with_think += full_text
+        assistant_text, storage_output, detected_tool_calls = _process_llm_output(
+            raw_output_with_think, full_text, structured_requirement
+        )
+        images = []
+        seen_urls = set()
+        for out in all_outputs:
+            if out.images:
+                for img in out.images:
+                    if img.url not in seen_urls:
+                        images.append(img)
+                        seen_urls.add(img.url)
+        response_contents, image_call_items = [], []
+        seen_hashes = set()
+        for image in images:
+            try:
+                image_base64, width, height, filename, file_hash = await _image_to_base64(
+                    image, image_store
+                )
+                if file_hash in seen_hashes:
+                    (image_store / filename).unlink(missing_ok=True)
+                    continue
+                seen_hashes.add(file_hash)
+                img_format = "png" if isinstance(image, GeneratedImage) else "jpeg"
+                image_url = (
+                    f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
+                )
+                image_call_items.append(
+                    ResponseImageGenerationCall(
+                        id=filename.rsplit(".", 1)[0],
+                        result=image_base64,
+                        output_format=img_format,
+                        size=f"{width}x{height}" if width and height else None,
+                    )
+                )
+                response_contents.append(ResponseOutputContent(type="output_text", text=image_url))
+            except Exception as exc:
+                logger.warning(f"Failed to process image in stream: {exc}")
+        if assistant_text:
+            response_contents.append(ResponseOutputContent(type="output_text", text=assistant_text))
+        if not response_contents:
+            response_contents.append(ResponseOutputContent(type="output_text", text=""))
+        # Aggregate images for storage
+        image_markdown = ""
+        for img_call in image_call_items:
+            fname = f"{img_call.id}.{img_call.output_format}"
+            img_url = f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})"
+            image_markdown += f"\n\n{img_url}"
+        if image_markdown:
+            storage_output = (storage_output or "") + image_markdown
+        yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': 0, 'item': {'id': message_id, 'type': 'message', 'role': 'assistant', 'content': [c.model_dump(mode='json') for c in response_contents]}}).decode('utf-8')}\n\n"
+        current_idx = 1
+        for call in detected_tool_calls:
+            tc_item = ResponseToolCall(id=call.id, status="completed", function=call.function)
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': current_idx, 'item': tc_item.model_dump(mode='json')}).decode('utf-8')}\n\n"
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': current_idx, 'item': tc_item.model_dump(mode='json')}).decode('utf-8')}\n\n"
+            current_idx += 1
+        for img_call in image_call_items:
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.added', 'output_index': current_idx, 'item': img_call.model_dump(mode='json')}).decode('utf-8')}\n\n"
+            yield f"data: {orjson.dumps({**base_event, 'type': 'response.output_item.done', 'output_index': current_idx, 'item': img_call.model_dump(mode='json')}).decode('utf-8')}\n\n"
+            current_idx += 1
+        p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, detected_tool_calls)
+        usage = ResponseUsage(input_tokens=p_tok, output_tokens=c_tok, total_tokens=t_tok)
+        payload = _create_responses_standard_payload(
+            response_id,
+            created_time,
+            model_name,
+            detected_tool_calls,
+            image_call_items,
+            response_contents,
+            usage,
+            request,
+            None,
+        )
+        _persist_conversation(
+            db,
+            model.model_name,
+            client_wrapper.id,
+            session.metadata,
+            messages,
+            storage_output,
+            detected_tool_calls,
+        )
+        yield f"data: {orjson.dumps({**base_event, 'type': 'response.completed', 'response': payload.model_dump(mode='json')}).decode('utf-8')}\n\n"
         yield "data: [DONE]\n\n"
     return StreamingResponse(generate_stream(), media_type="text/event-stream")
+# --- Main Router Endpoints ---
+@router.get("/v1/models", response_model=ModelListResponse)
+async def list_models(api_key: str = Depends(verify_api_key)):
+    models = _get_available_models()
+    return ModelListResponse(data=models)
+@router.post("/v1/chat/completions")
+async def create_chat_completion(
+    request: ChatCompletionRequest,
+    raw_request: Request,
+    api_key: str = Depends(verify_api_key),
+    tmp_dir: Path = Depends(get_temp_dir),
+    image_store: Path = Depends(get_image_store_dir),
+):
+    base_url = str(raw_request.base_url)
+    pool, db = GeminiClientPool(), LMDBConversationStore()
+    try:
+        model = _get_model_by_name(request.model)
+    except ValueError as exc:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
+    if not request.messages:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Messages required.")
+    structured_requirement = _build_structured_requirement(request.response_format)
+    extra_instr = [structured_requirement.instruction] if structured_requirement else None
+    # This ensures that server-injected system instructions are part of the history
+    msgs = _prepare_messages_for_model(
+        request.messages, request.tools, request.tool_choice, extra_instr
+    )
+    session, client, remain = await _find_reusable_session(db, pool, model, msgs)
+    if session:
+        if not remain:
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No new messages.")
+        # For reused sessions, we only need to process the remaining messages.
+        # We don't re-inject system defaults to avoid duplicating instructions already in history.
+        input_msgs = _prepare_messages_for_model(
+            remain, request.tools, request.tool_choice, extra_instr, False
+        )
+        if len(input_msgs) == 1:
+            m_input, files = await GeminiClientWrapper.process_message(
+                input_msgs[0], tmp_dir, tagged=False
+            )
+        else:
+            m_input, files = await GeminiClientWrapper.process_conversation(input_msgs, tmp_dir)
+        logger.debug(
+            f"Reused session {reprlib.repr(session.metadata)} - sending {len(input_msgs)} prepared messages."
+        )
+    else:
         try:
+            client = await pool.acquire()
+            session = client.start_chat(model=model)
+            # Use the already prepared 'msgs' for a fresh session
+            m_input, files = await GeminiClientWrapper.process_conversation(msgs, tmp_dir)
         except Exception as e:
+            logger.exception("Error in preparing conversation")
+            raise HTTPException(
+                status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e)
+            ) from e
+    completion_id = f"chatcmpl-{uuid.uuid4()}"
+    created_time = int(datetime.now(tz=timezone.utc).timestamp())
+    try:
+        assert session and client
+        logger.debug(
+            f"Client ID: {client.id}, Input length: {len(m_input)}, files count: {len(files)}"
+        )
+        resp_or_stream = await _send_with_split(
+            session, m_input, files=files, stream=request.stream
+        )
+    except Exception as e:
+        logger.exception("Gemini API error")
+        raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=str(e)) from e
+    if request.stream:
+        return _create_real_streaming_response(
+            resp_or_stream,
+            completion_id,
+            created_time,
+            request.model,
+            msgs,  # Use prepared 'msgs'
+            db,
+            model,
+            client,
+            session,
+            base_url,
+            structured_requirement,
+        )
+    try:
+        raw_with_t = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=True)
+        raw_clean = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=False)
+    except Exception as exc:
+        logger.exception("Gemini output parsing failed.")
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY, detail="Malformed response."
+        ) from exc
+    visible_output, storage_output, tool_calls = _process_llm_output(
+        raw_with_t, raw_clean, structured_requirement
+    )
+    # Process images for OpenAI non-streaming flow
+    images = resp_or_stream.images or []
+    image_markdown = ""
+    seen_hashes = set()
+    for image in images:
+        try:
+            _, _, _, filename, file_hash = await _image_to_base64(image, image_store)
+            if file_hash in seen_hashes:
+                (image_store / filename).unlink(missing_ok=True)
+                continue
+            seen_hashes.add(file_hash)
+            img_url = (
+                f"![{filename}]({base_url}images/{filename}?token={get_image_token(filename)})"
             )
+            image_markdown += f"\n\n{img_url}"
+        except Exception as exc:
+            logger.warning(f"Failed to process image in OpenAI response: {exc}")
+    if image_markdown:
+        visible_output += image_markdown
+        storage_output += image_markdown
+    tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
+    if tool_calls_payload:
+        logger.debug(f"Detected tool calls: {reprlib.repr(tool_calls_payload)}")
+    p_tok, c_tok, t_tok = _calculate_usage(request.messages, visible_output, tool_calls)
+    usage = {"prompt_tokens": p_tok, "completion_tokens": c_tok, "total_tokens": t_tok}
+    payload = _create_chat_completion_standard_payload(
+        completion_id,
+        created_time,
+        request.model,
+        visible_output,
+        tool_calls_payload,
+        "tool_calls" if tool_calls else "stop",
+        usage,
+    )
+    _persist_conversation(
+        db,
+        model.model_name,
+        client.id,
+        session.metadata,
+        msgs,  # Use prepared messages 'msgs'
+        storage_output,
+        tool_calls,
+    )
+    return payload
+@router.post("/v1/responses")
+async def create_response(
+    request: ResponseCreateRequest,
+    raw_request: Request,
+    api_key: str = Depends(verify_api_key),
+    tmp_dir: Path = Depends(get_temp_dir),
+    image_store: Path = Depends(get_image_store_dir),
+):
+    base_url = str(raw_request.base_url)
+    base_messages, norm_input = _response_items_to_messages(request.input)
+    struct_req = _build_structured_requirement(request.response_format)
+    extra_instr = [struct_req.instruction] if struct_req else []
+    standard_tools, image_tools = [], []
+    if request.tools:
+        for t in request.tools:
+            if isinstance(t, Tool):
+                standard_tools.append(t)
+            elif isinstance(t, ResponseImageTool):
+                image_tools.append(t)
+            elif isinstance(t, dict):
+                if t.get("type") == "function":
+                    standard_tools.append(Tool.model_validate(t))
+                elif t.get("type") == "image_generation":
+                    image_tools.append(ResponseImageTool.model_validate(t))
+    img_instr = _build_image_generation_instruction(
+        image_tools,
+        request.tool_choice if isinstance(request.tool_choice, ResponseToolChoice) else None,
+    )
+    if img_instr:
+        extra_instr.append(img_instr)
+    preface = _instructions_to_messages(request.instructions)
+    conv_messages = [*preface, *base_messages] if preface else base_messages
+    model_tool_choice = (
+        request.tool_choice if isinstance(request.tool_choice, (str, ToolChoiceFunction)) else None
+    )
+    messages = _prepare_messages_for_model(
+        conv_messages, standard_tools or None, model_tool_choice, extra_instr or None
+    )
+    pool, db = GeminiClientPool(), LMDBConversationStore()
+    try:
+        model = _get_model_by_name(request.model)
+    except ValueError as exc:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
+    session, client, remain = await _find_reusable_session(db, pool, model, messages)
+    if session:
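+        # Reuse the filtered tools and tool choice so both branches pass the same argument types.
+        msgs = _prepare_messages_for_model(
+            remain, standard_tools or None, model_tool_choice, None, False
+        )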
+        if not msgs:
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="No new messages.")
+        m_input, files = (
+            await GeminiClientWrapper.process_message(msgs[0], tmp_dir, tagged=False)
+            if len(msgs) == 1
+            else await GeminiClientWrapper.process_conversation(msgs, tmp_dir)
+        )
+        logger.debug(
+            f"Reused session {reprlib.repr(session.metadata)} - sending {len(msgs)} prepared messages."
+        )
     else:
+        try:
+            client = await pool.acquire()
+            session = client.start_chat(model=model)
+            m_input, files = await GeminiClientWrapper.process_conversation(messages, tmp_dir)
+        except Exception as e:
+            logger.exception("Error in preparing conversation")
+            raise HTTPException(
+                status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail=str(e)
+            ) from e
+    response_id = f"resp_{uuid.uuid4().hex}"
+    created_time = int(datetime.now(tz=timezone.utc).timestamp())
+    try:
+        assert session and client
+        logger.debug(
+            f"Client ID: {client.id}, Input length: {len(m_input)}, files count: {len(files)}"
+        )
+        resp_or_stream = await _send_with_split(
+            session, m_input, files=files, stream=request.stream
+        )
+    except Exception as e:
+        logger.exception("Gemini API error")
+        raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail=str(e)) from e
+    if request.stream:
+        return _create_responses_real_streaming_response(
+            resp_or_stream,
+            response_id,
+            created_time,
+            request.model,
+            messages,
+            db,
+            model,
+            client,
+            session,
+            request,
+            image_store,
+            base_url,
+            struct_req,
+        )
+    try:
+        raw_t = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=True)
+        raw_c = GeminiClientWrapper.extract_output(resp_or_stream, include_thoughts=False)
+    except Exception as exc:
+        logger.exception("Gemini parsing failed")
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY, detail="Malformed response."
+        ) from exc
+    assistant_text, storage_output, tool_calls = _process_llm_output(raw_t, raw_c, struct_req)
+    images = resp_or_stream.images or []
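+    # tool_choice may be a plain string, so check its type before reading `.type`.
+    if (
+        isinstance(request.tool_choice, ResponseToolChoice)
+        and request.tool_choice.type == "image_generation"
+    ) and not images: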
+        raise HTTPException(status_code=status.HTTP_502_BAD_GATEWAY, detail="No images returned.")
+    contents, img_calls = [], []
+    seen_hashes = set()
+    for img in images:
+        try:
+            b64, w, h, fname, fhash = await _image_to_base64(img, image_store)
+            if fhash in seen_hashes:
+                (image_store / fname).unlink(missing_ok=True)
+                continue
+            seen_hashes.add(fhash)
+            contents.append(
+                ResponseOutputContent(
+                    type="output_text",
+                    text=f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})",
+                )
+            )
+            img_calls.append(
+                ResponseImageGenerationCall(
+                    id=fname.rsplit(".", 1)[0],
+                    result=b64,
+                    output_format="png" if isinstance(img, GeneratedImage) else "jpeg",
+                    size=f"{w}x{h}" if w and h else None,
+                )
+            )
+        except Exception as e:
+            logger.warning(f"Image error: {e}")
+    if assistant_text:
+        contents.append(ResponseOutputContent(type="output_text", text=assistant_text))
+    if not contents:
+        contents.append(ResponseOutputContent(type="output_text", text=""))
+    # Aggregate images for storage
+    image_markdown = ""
+    for img_call in img_calls:
+        fname = f"{img_call.id}.{img_call.output_format}"
+        img_url = f"![{fname}]({base_url}images/{fname}?token={get_image_token(fname)})"
+        image_markdown += f"\n\n{img_url}"
+    if image_markdown:
+        storage_output += image_markdown
+    p_tok, c_tok, t_tok = _calculate_usage(messages, assistant_text, tool_calls)
+    usage = ResponseUsage(input_tokens=p_tok, output_tokens=c_tok, total_tokens=t_tok)
+    payload = _create_responses_standard_payload(
+        response_id,
+        created_time,
+        request.model,
+        tool_calls,
+        img_calls,
+        contents,
+        usage,
+        request,
+        norm_input,
+    )
+    _persist_conversation(
+        db, model.model_name, client.id, session.metadata, messages, storage_output, tool_calls
+    )
+    return payload
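
Note on the streaming branches above: they emit Server-Sent Events as `data: <json>` frames separated by a blank line and finish with `data: [DONE]`. A minimal sketch of that framing follows. It assumes the standard-library `json` module in place of `orjson`, and the helper names `sse_frame`, `stream_events` and `parse_sse` are hypothetical, not part of this commit.

```python
import json
from typing import Any, Iterable, Iterator

DONE_SENTINEL = "[DONE]"


def sse_frame(event: dict[str, Any]) -> str:
    """Serialize one event as a single SSE `data:` frame."""
    # json.dumps escapes newlines inside strings, so the blank-line
    # frame delimiter can never appear inside the payload itself.
    return f"data: {json.dumps(event, separators=(',', ':'))}\n\n"


def stream_events(events: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Yield one frame per event, then the terminating sentinel frame."""
    for event in events:
        yield sse_frame(event)
    yield f"data: {DONE_SENTINEL}\n\n"


def parse_sse(chunks: Iterable[str]) -> list[dict[str, Any]]:
    """Decode frames back into events, stopping at the sentinel."""
    decoded: list[dict[str, Any]] = []
    for frame in "".join(chunks).split("\n\n"):
        if not frame.startswith("data: "):
            continue
        body = frame[len("data: "):]
        if body == DONE_SENTINEL:
            break
        decoded.append(json.loads(body))
    return decoded
```

A client reading the stream can treat the sentinel as end-of-response; anything received after it is ignored by `parse_sse`.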

app/services/client.py CHANGED

@@ -78,24 +78,20 @@ class GeminiClientWrapper(GeminiClient):
         message: Message, tempdir: Path | None = None, tagged: bool = True
     ) -> tuple[str, list[Path | str]]:
         """
-        Process a single message and return model input.
         """
         files: list[Path | str] = []
         text_fragments: list[str] = []
         if isinstance(message.content, str):
-            # Pure text content
-            if message.content:
-                text_fragments.append(message.content)
         elif isinstance(message.content, list):
-            # Mixed content
-            # TODO: Use Pydantic to enforce the value checking
             for item in message.content:
                 if item.type == "text":
-                    # Append multiple text fragments
-                    if item.text:
-                        text_fragments.append(item.text)
                 elif item.type == "image_url":
                     if not item.image_url:
                         raise ValueError("Image URL cannot be empty")
@@ -103,7 +99,6 @@ class GeminiClientWrapper(GeminiClient):
                         files.append(await save_url_to_tempfile(url, tempdir))
                     else:
                         raise ValueError("Image URL must contain 'url' key")
                 elif item.type == "file":
                     if not item.file:
                         raise ValueError("File cannot be empty")
@@ -114,18 +109,28 @@ class GeminiClientWrapper(GeminiClient):
                         files.append(await save_url_to_tempfile(url, tempdir))
                     else:
                         raise ValueError("File must contain 'file_data' or 'url' key")
         elif message.content is not None:
             raise ValueError("Unsupported message content type.")
         if message.tool_calls:
             tool_blocks: list[str] = []
             for call in message.tool_calls:
                 args_text = call.function.arguments.strip()
                 try:
                     parsed_args = orjson.loads(args_text)
-                    args_text = orjson.dumps(parsed_args).decode("utf-8")
                 except orjson.JSONDecodeError:
-                    # Leave args_text as is if it is not valid JSON
                     pass
                 tool_blocks.append(
                     f'<tool_call name="{call.function.name}">{args_text}</tool_call>'
@@ -135,10 +140,9 @@ class GeminiClientWrapper(GeminiClient):
                 tool_section = "```xml\n" + "".join(tool_blocks) + "\n```"
                 text_fragments.append(tool_section)
-        model_input = "\n".join(fragment for fragment in text_fragments if fragment)
-        # Add role tag if needed
-        if model_input:
             if tagged:
                 model_input = add_tag(message.role, model_input)
@@ -148,48 +152,29 @@ class GeminiClientWrapper(GeminiClient):
     async def process_conversation(
         messages: list[Message], tempdir: Path | None = None
     ) -> tuple[str, list[Path | str]]:
-        """
-        Process the entire conversation and return a formatted string and list of
-        files. The last message is assumed to be the assistant's response.
-        """
-        # Determine once whether we need to wrap messages with role tags: only required
-        # if the history already contains assistant/system messages. When every message
-        # so far is from the user, we can skip tagging entirely.
         need_tag = any(m.role != "user" for m in messages)
         conversation: list[str] = []
         files: list[Path | str] = []
         for msg in messages:
             input_part, files_part = await GeminiClientWrapper.process_message(
                 msg, tempdir, tagged=need_tag
             )
             conversation.append(input_part)
             files.extend(files_part)
-        # Append an opening assistant tag only when we used tags above so that Gemini
-        # knows where to start its reply.
         if need_tag:
             conversation.append(add_tag("assistant", "", unclose=True))
         return "\n".join(conversation), files
     @staticmethod
     def extract_output(response: ModelOutput, include_thoughts: bool = True) -> str:
-        """
-        Extract and format the output text from the Gemini response.
-        """
         text = ""
         if include_thoughts and response.thoughts:
             text += f"<think>{response.thoughts}</think>\n"
         if response.text:
             text += response.text
         else:
             text += str(response)
-        # Fix some escaped characters
         def _unescape_html(text_content: str) -> str:
             parts: list[str] = []
             last_index = 0

         message: Message, tempdir: Path | None = None, tagged: bool = True
     ) -> tuple[str, list[Path | str]]:
         """
+        Process a single Message object into a format suitable for the Gemini API.
+        Extracts text fragments, handles images and files, and appends tool call blocks if present.
         """
         files: list[Path | str] = []
         text_fragments: list[str] = []
         if isinstance(message.content, str):
+            if message.content or message.role == "tool":
+                text_fragments.append(message.content or "{}")
         elif isinstance(message.content, list):
             for item in message.content:
                 if item.type == "text":
+                    if item.text or message.role == "tool":
+                        text_fragments.append(item.text or "{}")
                 elif item.type == "image_url":
                     if not item.image_url:
                         raise ValueError("Image URL cannot be empty")
                         files.append(await save_url_to_tempfile(url, tempdir))
                     else:
                         raise ValueError("Image URL must contain 'url' key")
                 elif item.type == "file":
                     if not item.file:
                         raise ValueError("File cannot be empty")
                         files.append(await save_url_to_tempfile(url, tempdir))
                     else:
                         raise ValueError("File must contain 'file_data' or 'url' key")
+        elif message.content is None and message.role == "tool":
+            text_fragments.append("{}")
         elif message.content is not None:
             raise ValueError("Unsupported message content type.")
+        if message.role == "tool":
+            tool_name = message.name or "unknown"
+            combined_content = "\n".join(text_fragments).strip() or "{}"
+            text_fragments = [
+                f'<tool_response name="{tool_name}">{combined_content}</tool_response>'
+            ]
         if message.tool_calls:
             tool_blocks: list[str] = []
             for call in message.tool_calls:
                 args_text = call.function.arguments.strip()
                 try:
                     parsed_args = orjson.loads(args_text)
+                    args_text = orjson.dumps(parsed_args, option=orjson.OPT_SORT_KEYS).decode(
+                        "utf-8"
+                    )
                 except orjson.JSONDecodeError:
                     pass
                 tool_blocks.append(
                     f'<tool_call name="{call.function.name}">{args_text}</tool_call>'
                 tool_section = "```xml\n" + "".join(tool_blocks) + "\n```"
                 text_fragments.append(tool_section)
+        model_input = "\n".join(fragment for fragment in text_fragments if fragment is not None)
+        if model_input or message.role == "tool":
             if tagged:
                 model_input = add_tag(message.role, model_input)
     async def process_conversation(
         messages: list[Message], tempdir: Path | None = None
     ) -> tuple[str, list[Path | str]]:
         need_tag = any(m.role != "user" for m in messages)
         conversation: list[str] = []
         files: list[Path | str] = []
         for msg in messages:
             input_part, files_part = await GeminiClientWrapper.process_message(
                 msg, tempdir, tagged=need_tag
             )
             conversation.append(input_part)
             files.extend(files_part)
         if need_tag:
             conversation.append(add_tag("assistant", "", unclose=True))
         return "\n".join(conversation), files
     @staticmethod
     def extract_output(response: ModelOutput, include_thoughts: bool = True) -> str:
         text = ""
         if include_thoughts and response.thoughts:
             text += f"<think>{response.thoughts}</think>\n"
         if response.text:
             text += response.text
         else:
             text += str(response)
         def _unescape_html(text_content: str) -> str:
             parts: list[str] = []
             last_index = 0
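
Note on the revised `process_message`: tool-role messages are wrapped in a `<tool_response>` block with `{}` substituted for empty content, and tool-call arguments are re-serialized with sorted keys so that equivalent calls produce identical text. A simplified sketch of both steps is shown here. It assumes plain strings instead of the project's `Message` model, uses `json` as a stand-in for `orjson`, and the helper names `wrap_tool_response` and `canonical_args` are hypothetical.

```python
from __future__ import annotations

import json


def wrap_tool_response(name: str | None, fragments: list[str]) -> str:
    """Wrap a tool result in a <tool_response> block, defaulting to '{}'."""
    tool_name = name or "unknown"
    combined = "\n".join(fragments).strip() or "{}"
    return f'<tool_response name="{tool_name}">{combined}</tool_response>'


def canonical_args(args_text: str) -> str:
    """Re-serialize JSON arguments with sorted keys; pass through non-JSON text."""
    args_text = args_text.strip()
    try:
        parsed = json.loads(args_text)
    except json.JSONDecodeError:
        # Leave the text untouched when it is not valid JSON.
        return args_text
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))
```

Sorting keys matters mainly for session reuse: two requests that differ only in argument key order would otherwise serialize, and therefore hash, differently.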

app/services/lmdb.py CHANGED

@@ -11,45 +11,82 @@ from loguru import logger
 from ..models import ContentItem, ConversationInStore, Message
 from ..utils import g_config
-from ..utils.helper import extract_tool_calls, remove_tool_call_blocks
 from ..utils.singleton import Singleton
 def _hash_message(message: Message) -> str:
-    """Generate a consistent hash for a single message focusing ONLY on logic/content, ignoring technical IDs."""
     core_data = {
         "role": message.role,
         "name": message.name,
     }
-    # Normalize content: strip, handle empty/None, and list-of-text items
     content = message.content
     if not content:
         core_data["content"] = None
     elif isinstance(content, str):
-        # Normalize line endings and strip whitespace
-        normalized = content.replace("\r\n", "\n").strip()
         core_data["content"] = normalized if normalized else None
     elif isinstance(content, list):
         text_parts = []
         for item in content:
             if isinstance(item, ContentItem) and item.type == "text":
-                text_parts.append(item.text or "")
             elif isinstance(item, dict) and item.get("type") == "text":
-                text_parts.append(item.get("text") or "")
             else:
-                # If it contains non-text (images/files), keep the full list for hashing
-                text_parts = None
-                break
-        if text_parts is not None:
-            # Normalize each part but keep them as a list to preserve boundaries and avoid collisions
-            normalized_parts = [p.replace("\r\n", "\n") for p in text_parts]
-            core_data["content"] = normalized_parts if normalized_parts else None
-        else:
-            core_data["content"] = message.model_dump(mode="json")["content"]
-    # Normalize tool_calls: Focus ONLY on function name and arguments
     if message.tool_calls:
         calls_data = []
         for tc in message.tool_calls:
@@ -66,14 +103,14 @@ def _hash_message(message: Message) -> str:
                     "arguments": canon_args,
                 }
             )
-        # Sort calls to be order-independent
         calls_data.sort(key=lambda x: (x["name"], x["arguments"]))
         core_data["tool_calls"] = calls_data
     else:
         core_data["tool_calls"] = None
     message_bytes = orjson.dumps(core_data, option=orjson.OPT_SORT_KEYS)
-    return hashlib.sha256(message_bytes).hexdigest()
 def _hash_conversation(client_id: str, model: str, messages: List[Message]) -> str:
@@ -123,16 +160,14 @@ class LMDBConversationStore(metaclass=Singleton):
         self._init_environment()
     def _ensure_db_path(self) -> None:
-        """Ensure database directory exists."""
         self.db_path.parent.mkdir(parents=True, exist_ok=True)
     def _init_environment(self) -> None:
-        """Initialize LMDB environment."""
         try:
             self._env = lmdb.open(
                 str(self.db_path),
                 map_size=self.max_db_size,
-                max_dbs=3,  # main, metadata, and index databases
                 writemap=True,
                 readahead=False,
                 meminit=False,
@@ -144,7 +179,6 @@ class LMDBConversationStore(metaclass=Singleton):
     @contextmanager
     def _get_transaction(self, write: bool = False):
-        """Get LMDB transaction context manager."""
         if not self._env:
             raise RuntimeError("LMDB environment not initialized")
@@ -178,11 +212,15 @@ class LMDBConversationStore(metaclass=Singleton):
         if not conv:
             raise ValueError("Messages list cannot be empty")
         # Generate hash for the message list
         message_hash = _hash_conversation(conv.client_id, conv.model, conv.messages)
         storage_key = custom_key or message_hash
-        # Prepare data for storage
         now = datetime.now()
         if conv.created_at is None:
             conv.created_at = now
@@ -192,20 +230,18 @@ class LMDBConversationStore(metaclass=Singleton):
         try:
             with self._get_transaction(write=True) as txn:
-                # Store main data
                 txn.put(storage_key.encode("utf-8"), value, overwrite=True)
-                # Store hash -> key mapping for reverse lookup
                 txn.put(
                     f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"),
                     storage_key.encode("utf-8"),
                 )
-                logger.debug(f"Stored {len(conv.messages)} messages with key: {storage_key}")
                 return storage_key
         except Exception as e:
-            logger.error(f"Failed to store conversation: {e}")
             raise
     def get(self, key: str) -> Optional[ConversationInStore]:
@@ -227,39 +263,35 @@ class LMDBConversationStore(metaclass=Singleton):
                 storage_data = orjson.loads(data)  # type: ignore
                 conv = ConversationInStore.model_validate(storage_data)
-                logger.debug(f"Retrieved {len(conv.messages)} messages for key: {key}")
                 return conv
         except Exception as e:
-            logger.error(f"Failed to retrieve messages for key {key}: {e}")
             return None
     def find(self, model: str, messages: List[Message]) -> Optional[ConversationInStore]:
         """
         Search conversation data by message list.
-        Args:
-            model: Model name of the conversations
-            messages: List of messages to search for
-        Returns:
-            Conversation or None if not found
         """
         if not messages:
             return None
         # --- Find with raw messages ---
         if conv := self._find_by_message_list(model, messages):
-            logger.debug("Found conversation with raw message history.")
             return conv
         # --- Find with cleaned messages ---
         cleaned_messages = self.sanitize_assistant_messages(messages)
-        if conv := self._find_by_message_list(model, cleaned_messages):
-            logger.debug("Found conversation with cleaned message history.")
-            return conv
-        logger.debug("No conversation found for either raw or cleaned history.")
         return None
     def _find_by_message_list(
@@ -330,11 +362,11 @@ class LMDBConversationStore(metaclass=Singleton):
                 if message_hash and key != message_hash:
                     txn.delete(f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"))
-                logger.debug(f"Deleted messages with key: {key}")
                 return conv
         except Exception as e:
-            logger.error(f"Failed to delete key {key}: {e}")
             return None
     def keys(self, prefix: str = "", limit: Optional[int] = None) -> List[str]:
@@ -478,6 +510,8 @@ class LMDBConversationStore(metaclass=Singleton):
         """
         Remove all <think>...</think> tags and strip whitespace.
         """
         # Remove all think blocks anywhere in the text
         cleaned_content = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
         return cleaned_content.strip()
@@ -485,12 +519,8 @@ class LMDBConversationStore(metaclass=Singleton):
     @staticmethod
     def sanitize_assistant_messages(messages: list[Message]) -> list[Message]:
         """
-        Create a new list of messages with assistant content cleaned of <think> tags
-        and system hints/tool call blocks. This is used for both storing and
-        searching chat history to ensure consistency.
-        If a message has no tool_calls but contains tool call XML blocks in its
-        content, they will be extracted and moved to the tool_calls field.
         """
         cleaned_messages = []
         for msg in messages:
@@ -503,12 +533,12 @@ class LMDBConversationStore(metaclass=Singleton):
                     else:
                         text = remove_tool_call_blocks(text).strip()
-                    normalized_content = text.strip()
                     if normalized_content != msg.content or tool_calls != msg.tool_calls:
                         cleaned_msg = msg.model_copy(
                             update={
-                                "content": normalized_content or None,
                                 "tool_calls": tool_calls or None,
                             }
                         )
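
Note on the revised `_hash_message` in this file: it normalizes line endings, removes `<think>` blocks, and serializes the result with sorted keys before hashing, so that logically identical messages map to the same digest. A reduced sketch of that idea follows. It assumes string-only content, omits the system-hint and tool-call stripping because those helpers are not shown in this diff, and uses `json` instead of `orjson`; `normalize_text` and `hash_message` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def normalize_text(text: str) -> str | None:
    """Normalize CRLF, drop <think> blocks, and strip surrounding whitespace."""
    text = text.replace("\r\n", "\n")
    text = THINK_RE.sub("", text).strip()
    return text or None


def hash_message(role: str, content: str | None, name: str | None = None) -> str:
    """Hash only the logical fields of a message, in a key-order-independent way."""
    core = {
        "role": role,
        "name": name,
        "content": normalize_text(content) if content else None,
    }
    payload = json.dumps(core, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this normalization, a stored assistant reply that still carries its reasoning block hashes the same as the cleaned copy a client sends back, which is what allows a prior session to be found again.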

 from ..models import ContentItem, ConversationInStore, Message
 from ..utils import g_config
+from ..utils.helper import (
+    extract_tool_calls,
+    remove_tool_call_blocks,
+    strip_system_hints,
+)
 from ..utils.singleton import Singleton
 def _hash_message(message: Message) -> str:
+    """
+    Generate a stable, canonical hash for a single message.
+    Strips system hints, thoughts, and tool call blocks to ensure
+    identical logical content produces the same hash regardless of format.
+    """
     core_data = {
         "role": message.role,
         "name": message.name,
+        "tool_call_id": message.tool_call_id,
     }
     content = message.content
     if not content:
         core_data["content"] = None
     elif isinstance(content, str):
+        normalized = content.replace("\r\n", "\n")
+        normalized = LMDBConversationStore.remove_think_tags(normalized)
+        normalized = strip_system_hints(normalized)
+        if message.tool_calls:
+            normalized = remove_tool_call_blocks(normalized)
+        else:
+            temp_text, _extracted = extract_tool_calls(normalized)
+            normalized = temp_text
+        normalized = normalized.strip()
         core_data["content"] = normalized if normalized else None
     elif isinstance(content, list):
         text_parts = []
         for item in content:
+            text_val = ""
             if isinstance(item, ContentItem) and item.type == "text":
+                text_val = item.text or ""
             elif isinstance(item, dict) and item.get("type") == "text":
+                text_val = item.get("text") or ""
+            if text_val:
+                text_val = text_val.replace("\r\n", "\n")
+                text_val = LMDBConversationStore.remove_think_tags(text_val)
+                text_val = strip_system_hints(text_val)
+                text_val = remove_tool_call_blocks(text_val).strip()
+                if text_val:
+                    text_parts.append(text_val)
+            elif isinstance(item, ContentItem) and item.type in ("image_url", "file"):
+                # For non-text items, include their unique markers to distinguish them
+                if item.type == "image_url":
+                    text_parts.append(
+                        f"[image_url:{item.image_url.get('url') if item.image_url else ''}]"
+                    )
+                elif item.type == "file":
+                    text_parts.append(
+                        f"[file:{item.file.get('url') or item.file.get('filename') if item.file else ''}]"
+                    )
             else:
+                # Fallback for other dict-based content parts
+                part_type = item.get("type") if isinstance(item, dict) else None
+                if part_type == "image_url":
+                    url = item.get("image_url", {}).get("url")
+                    text_parts.append(f"[image_url:{url}]")
+                elif part_type == "file":
+                    url = item.get("file", {}).get("url") or item.get("file", {}).get("filename")
+                    text_parts.append(f"[file:{url}]")
+        combined_text = "\n".join(text_parts).replace("\r\n", "\n").strip()
+        core_data["content"] = combined_text if combined_text else None
     if message.tool_calls:
         calls_data = []
         for tc in message.tool_calls:
                     "arguments": canon_args,
                 }
             )
         calls_data.sort(key=lambda x: (x["name"], x["arguments"]))
         core_data["tool_calls"] = calls_data
     else:
         core_data["tool_calls"] = None
     message_bytes = orjson.dumps(core_data, option=orjson.OPT_SORT_KEYS)
+    digest = hashlib.sha256(message_bytes).hexdigest()
+    return digest
 def _hash_conversation(client_id: str, model: str, messages: List[Message]) -> str:
         self._init_environment()
     def _ensure_db_path(self) -> None:
         self.db_path.parent.mkdir(parents=True, exist_ok=True)
     def _init_environment(self) -> None:
         try:
             self._env = lmdb.open(
                 str(self.db_path),
                 map_size=self.max_db_size,
+                max_dbs=3,
                 writemap=True,
                 readahead=False,
                 meminit=False,
     @contextmanager
     def _get_transaction(self, write: bool = False):
         if not self._env:
             raise RuntimeError("LMDB environment not initialized")
         if not conv:
             raise ValueError("Messages list cannot be empty")
+        # Sanitize messages before computing hash and storing to ensure consistency
+        # with the search (find) logic, which also sanitizes its prefix.
+        sanitized_messages = self.sanitize_assistant_messages(conv.messages)
+        conv.messages = sanitized_messages
         # Generate hash for the message list
         message_hash = _hash_conversation(conv.client_id, conv.model, conv.messages)
         storage_key = custom_key or message_hash
         now = datetime.now()
         if conv.created_at is None:
             conv.created_at = now
         try:
             with self._get_transaction(write=True) as txn:
                 txn.put(storage_key.encode("utf-8"), value, overwrite=True)
                 txn.put(
                     f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"),
                     storage_key.encode("utf-8"),
                 )
+                logger.debug(f"Stored {len(conv.messages)} messages with key: {storage_key[:12]}")
                 return storage_key
         except Exception as e:
+            logger.error(f"Failed to store messages with key {storage_key[:12]}: {e}")
             raise
     def get(self, key: str) -> Optional[ConversationInStore]:
                 storage_data = orjson.loads(data)  # type: ignore
                 conv = ConversationInStore.model_validate(storage_data)
+                logger.debug(f"Retrieved {len(conv.messages)} messages with key: {key[:12]}")
                 return conv
         except Exception as e:
+            logger.error(f"Failed to retrieve messages with key {key[:12]}: {e}")
             return None
     def find(self, model: str, messages: List[Message]) -> Optional[ConversationInStore]:
         """
         Search conversation data by message list.
         """
         if not messages:
             return None
         # --- Find with raw messages ---
         if conv := self._find_by_message_list(model, messages):
+            logger.debug(f"Session found for '{model}' with {len(messages)} raw messages.")
             return conv
         # --- Find with cleaned messages ---
         cleaned_messages = self.sanitize_assistant_messages(messages)
+        if cleaned_messages != messages:
+            if conv := self._find_by_message_list(model, cleaned_messages):
+                logger.debug(
+                    f"Session found for '{model}' with {len(cleaned_messages)} cleaned messages."
+                )
+                return conv
+        logger.debug(f"No session found for '{model}' with {len(messages)} messages.")
         return None
     def _find_by_message_list(
                 if message_hash and key != message_hash:
                     txn.delete(f"{self.HASH_LOOKUP_PREFIX}{message_hash}".encode("utf-8"))
+                logger.debug(f"Deleted messages with key: {key[:12]}")
                 return conv
         except Exception as e:
+            logger.error(f"Failed to delete messages with key {key[:12]}: {e}")
             return None
     def keys(self, prefix: str = "", limit: Optional[int] = None) -> List[str]:
         """
         Remove all <think>...</think> tags and strip whitespace.
         """
+        if not text:
+            return text
         # Remove all think blocks anywhere in the text
         cleaned_content = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
         return cleaned_content.strip()
     @staticmethod
     def sanitize_assistant_messages(messages: list[Message]) -> list[Message]:
         """
+        Produce a canonical history where assistant messages are cleaned of
+        internal markers and tool call blocks are moved to metadata.
         """
         cleaned_messages = []
         for msg in messages:
                     else:
                         text = remove_tool_call_blocks(text).strip()
+                    normalized_content = text.strip() or None
                     if normalized_content != msg.content or tool_calls != msg.tool_calls:
                         cleaned_msg = msg.model_copy(
                             update={
+                                "content": normalized_content,
                                 "tool_calls": tool_calls or None,
                             }
                         )
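
The normalization and hashing steps in this file can be illustrated in isolation. The following is a minimal sketch, not the project's implementation: it uses the standard-library `json` module with `sort_keys=True` in place of `orjson.OPT_SORT_KEYS`, and `normalize_text` and `hash_message` are hypothetical names introduced only for this example. It omits the system-hint and tool-call stripping that the real code also performs.

```python
import hashlib
import json
import re
from typing import Optional

THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def normalize_text(text: str) -> Optional[str]:
    """Normalize line endings, drop <think> blocks, and trim whitespace."""
    if not text:
        return None
    text = text.replace("\r\n", "\n")
    text = THINK_RE.sub("", text).strip()
    return text or None


def hash_message(role: str, content: Optional[str]) -> str:
    """Hash a canonical, key-sorted JSON encoding of the message."""
    core = {
        "role": role,
        "content": normalize_text(content or ""),
        "tool_calls": None,
    }
    payload = json.dumps(core, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two renderings of the same assistant turn map to the same digest.
raw = hash_message("assistant", "<think>plan</think>Hello\r\nworld  ")
clean = hash_message("assistant", "Hello\nworld")
assert raw == clean
```

Because the same normalization runs before both storing and searching, a client that echoes back a slightly different rendering of an assistant turn should still resolve to the stored session.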

app/services/pool.py CHANGED

@@ -31,7 +31,7 @@ class GeminiClientPool(metaclass=Singleton):
             self._clients.append(client)
             self._id_map[c.id] = client
             self._round_robin.append(client)
-            self._restart_locks[c.id] = asyncio.Lock()  # Pre-initialize
     async def init(self) -> None:
         """Initialize all clients in the pool."""
@@ -84,7 +84,7 @@ class GeminiClientPool(metaclass=Singleton):
         lock = self._restart_locks.get(client.id)
         if lock is None:
-            return False  # Should not happen
         async with lock:
             if client.running():

             self._clients.append(client)
             self._id_map[c.id] = client
             self._round_robin.append(client)
+            self._restart_locks[c.id] = asyncio.Lock()
     async def init(self) -> None:
         """Initialize all clients in the pool."""
         lock = self._restart_locks.get(client.id)
         if lock is None:
+            return False
         async with lock:
             if client.running():
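
The per-client restart lock shown in this hunk follows a common check-lock-recheck pattern. The sketch below is an illustration under assumptions: `FakeClient`, `Pool`, and `ensure_running` are hypothetical names, and everything after the `client.running()` check is a guess at the shape of the logic, since the diff does not show it.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a pooled client."""

    def __init__(self, client_id: str) -> None:
        self.id = client_id
        self._running = False
        self.init_calls = 0

    def running(self) -> bool:
        return self._running

    async def init(self) -> None:
        await asyncio.sleep(0)  # yield control, as real I/O would
        self.init_calls += 1
        self._running = True


class Pool:
    def __init__(self, clients: list[FakeClient]) -> None:
        # One lock per client, created up front so lookups never race.
        self._restart_locks = {c.id: asyncio.Lock() for c in clients}

    async def ensure_running(self, client: FakeClient) -> bool:
        lock = self._restart_locks.get(client.id)
        if lock is None:
            return False
        async with lock:
            # Re-check inside the lock: another task may have restarted it.
            if client.running():
                return True
            await client.init()
            return client.running()


async def demo() -> int:
    client = FakeClient("c1")
    pool = Pool([client])
    # Five concurrent callers should trigger only one restart.
    await asyncio.gather(*(pool.ensure_running(client) for _ in range(5)))
    return client.init_calls


restart_count = asyncio.run(demo())
assert restart_count == 1
```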

app/utils/helper.py CHANGED

@@ -5,7 +5,6 @@ import re
 import struct
 import tempfile
 from pathlib import Path
-from typing import Iterator
 from urllib.parse import urlparse
 import httpx
@@ -68,7 +67,6 @@ async def save_url_to_tempfile(url: str, tempdir: Path | None = None) -> Path:
     data: bytes | None = None
     suffix: str | None = None
     if url.startswith("data:image/"):
-        # Base64 encoded image
         metadata_part = url.split(",")[0]
         mime_type = metadata_part.split(":")[1].split(";")[0]
@@ -112,9 +110,9 @@ def strip_code_fence(text: str) -> str:
 def strip_tagged_blocks(text: str) -> str:
-    """Remove <|im_start|>role ... <|im_end|> sections, dropping tool blocks entirely.
-    - tool blocks are removed entirely (if missing end marker, drop to EOF).
-    - other roles: remove markers and role, keep inner content (if missing end marker, keep to EOF).
     """
     if not text:
         return text
@@ -131,13 +129,11 @@ def strip_tagged_blocks(text: str) -> str:
             result.append(text[idx:])
             break
-        # append any content before this block
         result.append(text[idx:start])
         role_start = start + len(start_marker)
         newline = text.find("\n", role_start)
         if newline == -1:
-            # malformed block; keep the remainder as-is (safe behavior)
             result.append(text[start:])
             break
@@ -145,23 +141,18 @@ def strip_tagged_blocks(text: str) -> str:
         end = text.find(end_marker, newline + 1)
         if end == -1:
-            # missing end marker
             if role == "tool":
-                # drop from the start marker to EOF (skip the remainder)
                 break
             else:
-                # keep inner content from after the role newline to EOF
                 result.append(text[newline + 1 :])
                 break
         block_end = end + len(end_marker)
         if role == "tool":
-            # drop the whole block
             idx = block_end
             continue
-        # keep the content without role markers
         content = text[newline + 1 : end]
         result.append(content)
         idx = block_end
@@ -180,41 +171,19 @@ def strip_system_hints(text: str) -> str:
     return cleaned.strip()
-def remove_tool_call_blocks(text: str) -> str:
-    """Strip tool call code blocks from text."""
-    if not text:
-        return text
-    # 1. Remove fenced blocks ONLY if they contain tool calls
-    def _replace_block(match: re.Match[str]) -> str:
-        block_content = match.group(1)
-        if not block_content:
-            return match.group(0)
-        # Check if the block contains any tool call tag
-        if TOOL_CALL_RE.search(block_content):
-            return ""
-        # Preserve the block if no tool call found
-        return match.group(0)
-    cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
-    # 2. Remove orphaned tool calls
-    cleaned = TOOL_CALL_RE.sub("", cleaned)
-    return strip_system_hints(cleaned)
-def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
-    """Extract tool call definitions and return cleaned text."""
     if not text:
         return text, []
     tool_calls: list[ToolCall] = []
     def _create_tool_call(name: str, raw_args: str) -> None:
-        """Helper to parse args and append to the tool_calls list."""
         if not name:
             logger.warning("Encountered tool_call without a function name.")
             return
@@ -226,8 +195,6 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
         except orjson.JSONDecodeError:
             logger.warning(f"Failed to parse tool call arguments for '{name}'. Passing raw string.")
-        # Generate a deterministic ID based on name, arguments, and its global sequence index
-        # to ensure uniqueness across multiple fenced blocks while remaining stable for storage.
         index = len(tool_calls)
         seed = f"{name}:{arguments}:{index}".encode("utf-8")
         call_id = f"call_{hashlib.sha256(seed).hexdigest()[:24]}"
@@ -245,14 +212,14 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
         if not block_content:
             return match.group(0)
-        found_in_block = False
-        for call_match in TOOL_CALL_RE.finditer(block_content):
-            found_in_block = True
-            name = (call_match.group(1) or "").strip()
-            raw_args = (call_match.group(2) or "").strip()
-            _create_tool_call(name, raw_args)
-        if found_in_block:
             return ""
         else:
             return match.group(0)
@@ -260,56 +227,26 @@ def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
     cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
     def _replace_orphan(match: re.Match[str]) -> str:
-        name = (match.group(1) or "").strip()
-        raw_args = (match.group(2) or "").strip()
-        _create_tool_call(name, raw_args)
         return ""
     cleaned = TOOL_CALL_RE.sub(_replace_orphan, cleaned)
     cleaned = strip_system_hints(cleaned)
     return cleaned, tool_calls
-def iter_stream_segments(model_output: str, chunk_size: int = 64) -> Iterator[str]:
-    """Yield stream segments while keeping <think> markers and words intact."""
-    if not model_output:
-        return
-    token_pattern = re.compile(r"\s+|\S+\s*")
-    pending = ""
-    def _flush_pending() -> Iterator[str]:
-        nonlocal pending
-        if pending:
-            yield pending
-            pending = ""
-    # Split on <think> boundaries so the markers are never fragmented.
-    parts = re.split(r"(</?think>)", model_output)
-    for part in parts:
-        if not part:
-            continue
-        if part in {"<think>", "</think>"}:
-            yield from _flush_pending()
-            yield part
-            continue
-        for match in token_pattern.finditer(part):
-            token = match.group(0)
-            if len(token) > chunk_size:
-                yield from _flush_pending()
-                for idx in range(0, len(token), chunk_size):
-                    yield token[idx : idx + chunk_size]
-                continue
-            if pending and len(pending) + len(token) > chunk_size:
-                yield from _flush_pending()
-            pending += token
-    yield from _flush_pending()
 def text_from_message(message: Message) -> str:
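
The fragments of `strip_tagged_blocks` visible in these hunks can be assembled into a self-contained sketch. This is an approximation for illustration: the loop header, the role extraction, and the final return are not shown in the diff and are assumed here, as are the constant names `START_MARKER` and `END_MARKER`.

```python
START_MARKER = "<|im_start|>"
END_MARKER = "<|im_end|>"


def strip_tagged_blocks(text: str) -> str:
    """Drop tool blocks entirely; unwrap other roles, keeping their content."""
    if not text:
        return text
    result: list[str] = []
    idx = 0
    while idx < len(text):
        start = text.find(START_MARKER, idx)
        if start == -1:
            result.append(text[idx:])  # no more blocks
            break
        result.append(text[idx:start])  # text before this block
        newline = text.find("\n", start + len(START_MARKER))
        if newline == -1:
            result.append(text[start:])  # malformed header: keep as-is
            break
        # Assumed: the role is whatever sits between the marker and the newline.
        role = text[start + len(START_MARKER):newline].strip().lower()
        end = text.find(END_MARKER, newline + 1)
        if end == -1:
            if role != "tool":
                result.append(text[newline + 1:])  # unterminated: keep to EOF
            break
        if role != "tool":
            result.append(text[newline + 1:end])
        idx = end + len(END_MARKER)
    return "".join(result)


sample = "A<|im_start|>tool\nsecret<|im_end|>B<|im_start|>assistant\nhello<|im_end|>C"
assert strip_tagged_blocks(sample) == "ABhelloC"
```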

 import struct
 import tempfile
 from pathlib import Path
 from urllib.parse import urlparse
 import httpx
     data: bytes | None = None
     suffix: str | None = None
     if url.startswith("data:image/"):
         metadata_part = url.split(",")[0]
         mime_type = metadata_part.split(":")[1].split(";")[0]
 def strip_tagged_blocks(text: str) -> str:
+    """Remove <|im_start|>role ... <|im_end|> sections.
+    - tool blocks are removed entirely (including content).
+    - other roles: remove markers and role, keep inner content.
     """
     if not text:
         return text
             result.append(text[idx:])
             break
         result.append(text[idx:start])
         role_start = start + len(start_marker)
         newline = text.find("\n", role_start)
         if newline == -1:
             result.append(text[start:])
             break
         end = text.find(end_marker, newline + 1)
         if end == -1:
             if role == "tool":
                 break
             else:
                 result.append(text[newline + 1 :])
                 break
         block_end = end + len(end_marker)
         if role == "tool":
             idx = block_end
             continue
         content = text[newline + 1 : end]
         result.append(content)
         idx = block_end
     return cleaned.strip()
+def _process_tools_internal(text: str, extract: bool = True) -> tuple[str, list[ToolCall]]:
+    """
+    Unified engine for stripping tool call blocks and extracting tool metadata.
+    If extract=True, parses JSON arguments and assigns deterministic call IDs.
+    """
     if not text:
         return text, []
     tool_calls: list[ToolCall] = []
     def _create_tool_call(name: str, raw_args: str) -> None:
+        if not extract:
+            return
         if not name:
             logger.warning("Encountered tool_call without a function name.")
             return
         except orjson.JSONDecodeError:
             logger.warning(f"Failed to parse tool call arguments for '{name}'. Passing raw string.")
         index = len(tool_calls)
         seed = f"{name}:{arguments}:{index}".encode("utf-8")
         call_id = f"call_{hashlib.sha256(seed).hexdigest()[:24]}"
         if not block_content:
             return match.group(0)
+        is_tool_block = bool(TOOL_CALL_RE.search(block_content))
+        if is_tool_block:
+            if extract:
+                for call_match in TOOL_CALL_RE.finditer(block_content):
+                    name = (call_match.group(1) or "").strip()
+                    raw_args = (call_match.group(2) or "").strip()
+                    _create_tool_call(name, raw_args)
             return ""
         else:
             return match.group(0)
     cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
     def _replace_orphan(match: re.Match[str]) -> str:
+        if extract:
+            name = (match.group(1) or "").strip()
+            raw_args = (match.group(2) or "").strip()
+            _create_tool_call(name, raw_args)
         return ""
     cleaned = TOOL_CALL_RE.sub(_replace_orphan, cleaned)
     cleaned = strip_system_hints(cleaned)
     return cleaned, tool_calls
+def remove_tool_call_blocks(text: str) -> str:
+    """Strip tool call code blocks from text."""
+    cleaned, _ = _process_tools_internal(text, extract=False)
+    return cleaned
+def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
+    """Extract tool call definitions and return cleaned text."""
+    return _process_tools_internal(text, extract=True)
 def text_from_message(message: Message) -> str:
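
The refactor above folds two near-duplicate functions into one engine controlled by an `extract` flag. A minimal sketch of that shape follows. The tag format matched by `CALL_RE` is hypothetical (the project's real `TOOL_CALL_RE` and `TOOL_BLOCK_RE` are not shown in this diff), plain dicts stand in for `ToolCall`, and the JSON re-serialization step is an assumption; only the ID seed format `name:arguments:index` and the 24-character digest prefix come from the source.

```python
import hashlib
import json
import re

# Hypothetical tag format, used only for this illustration.
CALL_RE = re.compile(r'<tool_call name="([^"]*)">(.*?)</tool_call>', re.DOTALL)


def process_tools(text: str, extract: bool = True) -> tuple[str, list[dict]]:
    """Strip tool call tags; when extract=True, also collect call metadata."""
    calls: list[dict] = []

    def _replace(match: re.Match[str]) -> str:
        if extract:
            name = (match.group(1) or "").strip()
            raw_args = (match.group(2) or "").strip()
            try:
                # Assumed: re-serialize so equivalent JSON yields one string.
                arguments = json.dumps(json.loads(raw_args), sort_keys=True)
            except json.JSONDecodeError:
                arguments = raw_args  # pass the raw string through
            # Seed with name, arguments, and position so IDs are stable.
            seed = f"{name}:{arguments}:{len(calls)}".encode("utf-8")
            call_id = f"call_{hashlib.sha256(seed).hexdigest()[:24]}"
            calls.append({"id": call_id, "name": name, "arguments": arguments})
        return ""  # the tag is removed in both modes

    return CALL_RE.sub(_replace, text).strip(), calls


def remove_tool_call_blocks(text: str) -> str:
    cleaned, _ = process_tools(text, extract=False)
    return cleaned


def extract_tool_calls(text: str) -> tuple[str, list[dict]]:
    return process_tools(text, extract=True)


sample = 'Checking. <tool_call name="get_weather">{"city": "Hanoi"}</tool_call>'
cleaned, calls = extract_tool_calls(sample)
assert cleaned == remove_tool_call_blocks(sample)
```

Deriving the ID from content and position rather than a random value means re-parsing the same assistant text produces the same IDs, which keeps stored conversation hashes reproducible.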

pyproject.toml CHANGED

@@ -6,10 +6,10 @@ readme = "README.md"
 requires-python = "==3.12.*"
 dependencies = [
     "fastapi>=0.128.0",
-    "gemini-webapi>=1.17.3",
     "lmdb>=1.7.5",
     "loguru>=0.7.3",
-    "orjson>=3.11.5",
     "pydantic-settings[yaml]>=2.12.0",
     "uvicorn>=0.40.0",
     "uvloop>=0.22.1; sys_platform != 'win32'",

 requires-python = "==3.12.*"
 dependencies = [
     "fastapi>=0.128.0",
+    "gemini-webapi>=1.18.0",
     "lmdb>=1.7.5",
     "loguru>=0.7.3",
+    "orjson>=3.11.7",
     "pydantic-settings[yaml]>=2.12.0",
     "uvicorn>=0.40.0",
     "uvloop>=0.22.1; sys_platform != 'win32'",

uv.lock CHANGED

@@ -106,10 +106,10 @@ dev = [
 [package.metadata]
 requires-dist = [
     { name = "fastapi", specifier = ">=0.128.0" },
-    { name = "gemini-webapi", specifier = ">=1.17.3" },
     { name = "lmdb", specifier = ">=1.7.5" },
     { name = "loguru", specifier = ">=0.7.3" },
-    { name = "orjson", specifier = ">=3.11.5" },
     { name = "pydantic-settings", extras = ["yaml"], specifier = ">=2.12.0" },
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.14.14" },
     { name = "uvicorn", specifier = ">=0.40.0" },
@@ -122,17 +122,17 @@ dev = [{ name = "ruff", specifier = ">=0.14.14" }]
 [[package]]
 name = "gemini-webapi"
-version = "1.17.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "httpx" },
     { name = "loguru" },
     { name = "orjson" },
     { name = "pydantic" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/aa/74/1a31f3605250eb5cbcbfb15559c43b0d71734c8d286cfa9a7833841306e3/gemini_webapi-1.17.3.tar.gz", hash = "sha256:6201f9eaf5f562c5dc589d71c0edbba9e2eb8f780febbcf35307697bf474d577", size = 259418, upload-time = "2025-12-05T22:38:44.426Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/4c/a3/a88ff45197dce68a81d92c8d40368e4c26f67faf3af3273357f3f71f5c3d/gemini_webapi-1.17.3-py3-none-any.whl", hash = "sha256:d83969b1fa3236f3010d856d191b35264c936ece81f1be4c1de53ec1cf0855c8", size = 56659, upload-time = "2025-12-05T22:38:42.93Z" },
 ]
 [[package]]
@@ -144,6 +144,28 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" },
 ]
 [[package]]
 name = "httpcore"
 version = "1.0.9"
@@ -172,6 +194,20 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" },
 ]
 [[package]]
 name = "idna"
 version = "3.11"
@@ -211,25 +247,25 @@ wheels = [
 [[package]]
 name = "orjson"
-version = "3.11.5"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/04/b8/333fdb27840f3bf04022d21b654a35f58e15407183aeb16f3b41aa053446/orjson-3.11.5.tar.gz", hash = "sha256:82393ab47b4fe44ffd0a7659fa9cfaacc717eb617c93cde83795f14af5c2e9d5", size = 5972347, upload-time = "2025-12-06T15:55:39.458Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/ef/a4/8052a029029b096a78955eadd68ab594ce2197e24ec50e6b6d2ab3f4e33b/orjson-3.11.5-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:334e5b4bff9ad101237c2d799d9fd45737752929753bf4faf4b207335a416b7d", size = 245347, upload-time = "2025-12-06T15:54:22.061Z" },
-    { url = "https://files.pythonhosted.org/packages/64/67/574a7732bd9d9d79ac620c8790b4cfe0717a3d5a6eb2b539e6e8995e24a0/orjson-3.11.5-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:ff770589960a86eae279f5d8aa536196ebda8273a2a07db2a54e82b93bc86626", size = 129435, upload-time = "2025-12-06T15:54:23.615Z" },
-    { url = "https://files.pythonhosted.org/packages/52/8d/544e77d7a29d90cf4d9eecd0ae801c688e7f3d1adfa2ebae5e1e94d38ab9/orjson-3.11.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ed24250e55efbcb0b35bed7caaec8cedf858ab2f9f2201f17b8938c618c8ca6f", size = 132074, upload-time = "2025-12-06T15:54:24.694Z" },
-    { url = "https://files.pythonhosted.org/packages/6e/57/b9f5b5b6fbff9c26f77e785baf56ae8460ef74acdb3eae4931c25b8f5ba9/orjson-3.11.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a66d7769e98a08a12a139049aac2f0ca3adae989817f8c43337455fbc7669b85", size = 130520, upload-time = "2025-12-06T15:54:26.185Z" },
-    { url = "https://files.pythonhosted.org/packages/f6/6d/d34970bf9eb33f9ec7c979a262cad86076814859e54eb9a059a52f6dc13d/orjson-3.11.5-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:86cfc555bfd5794d24c6a1903e558b50644e5e68e6471d66502ce5cb5fdef3f9", size = 136209, upload-time = "2025-12-06T15:54:27.264Z" },
-    { url = "https://files.pythonhosted.org/packages/e7/39/bc373b63cc0e117a105ea12e57280f83ae52fdee426890d57412432d63b3/orjson-3.11.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a230065027bc2a025e944f9d4714976a81e7ecfa940923283bca7bbc1f10f626", size = 139837, upload-time = "2025-12-06T15:54:28.75Z" },
-    { url = "https://files.pythonhosted.org/packages/cb/aa/7c4818c8d7d324da220f4f1af55c343956003aa4d1ce1857bdc1d396ba69/orjson-3.11.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:b29d36b60e606df01959c4b982729c8845c69d1963f88686608be9ced96dbfaa", size = 137307, upload-time = "2025-12-06T15:54:29.856Z" },
-    { url = "https://files.pythonhosted.org/packages/46/bf/0993b5a056759ba65145effe3a79dd5a939d4a070eaa5da2ee3180fbb13f/orjson-3.11.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c74099c6b230d4261fdc3169d50efc09abf38ace1a42ea2f9994b1d79153d477", size = 139020, upload-time = "2025-12-06T15:54:31.024Z" },
-    { url = "https://files.pythonhosted.org/packages/65/e8/83a6c95db3039e504eda60fc388f9faedbb4f6472f5aba7084e06552d9aa/orjson-3.11.5-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e697d06ad57dd0c7a737771d470eedc18e68dfdefcdd3b7de7f33dfda5b6212e", size = 141099, upload-time = "2025-12-06T15:54:32.196Z" },
-    { url = "https://files.pythonhosted.org/packages/b9/b4/24fdc024abfce31c2f6812973b0a693688037ece5dc64b7a60c1ce69e2f2/orjson-3.11.5-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:e08ca8a6c851e95aaecc32bc44a5aa75d0ad26af8cdac7c77e4ed93acf3d5b69", size = 413540, upload-time = "2025-12-06T15:54:33.361Z" },
-    { url = "https://files.pythonhosted.org/packages/d9/37/01c0ec95d55ed0c11e4cae3e10427e479bba40c77312b63e1f9665e0737d/orjson-3.11.5-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:e8b5f96c05fce7d0218df3fdfeb962d6b8cfff7e3e20264306b46dd8b217c0f3", size = 151530, upload-time = "2025-12-06T15:54:34.6Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/d4/f9ebc57182705bb4bbe63f5bbe14af43722a2533135e1d2fb7affa0c355d/orjson-3.11.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ddbfdb5099b3e6ba6d6ea818f61997bb66de14b411357d24c4612cf1ebad08ca", size = 141863, upload-time = "2025-12-06T15:54:35.801Z" },
-    { url = "https://files.pythonhosted.org/packages/0d/04/02102b8d19fdcb009d72d622bb5781e8f3fae1646bf3e18c53d1bc8115b5/orjson-3.11.5-cp312-cp312-win32.whl", hash = "sha256:9172578c4eb09dbfcf1657d43198de59b6cef4054de385365060ed50c458ac98", size = 135255, upload-time = "2025-12-06T15:54:37.209Z" },
-    { url = "https://files.pythonhosted.org/packages/d4/fb/f05646c43d5450492cb387de5549f6de90a71001682c17882d9f66476af5/orjson-3.11.5-cp312-cp312-win_amd64.whl", hash = "sha256:2b91126e7b470ff2e75746f6f6ee32b9ab67b7a93c8ba1d15d3a0caaf16ec875", size = 133252, upload-time = "2025-12-06T15:54:38.401Z" },
-    { url = "https://files.pythonhosted.org/packages/dc/a6/7b8c0b26ba18c793533ac1cd145e131e46fcf43952aa94c109b5b913c1f0/orjson-3.11.5-cp312-cp312-win_arm64.whl", hash = "sha256:acbc5fac7e06777555b0722b8ad5f574739e99ffe99467ed63da98f97f9ca0fe", size = 126777, upload-time = "2025-12-06T15:54:39.515Z" },
 ]
 [[package]]

 [package.metadata]
 requires-dist = [
     { name = "fastapi", specifier = ">=0.128.0" },
+    { name = "gemini-webapi", specifier = ">=1.18.0" },
     { name = "lmdb", specifier = ">=1.7.5" },
     { name = "loguru", specifier = ">=0.7.3" },
+    { name = "orjson", specifier = ">=3.11.7" },
     { name = "pydantic-settings", extras = ["yaml"], specifier = ">=2.12.0" },
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.14.14" },
     { name = "uvicorn", specifier = ">=0.40.0" },
 [[package]]
 name = "gemini-webapi"
+version = "1.18.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "httpx", extra = ["http2"] },
     { name = "loguru" },
     { name = "orjson" },
     { name = "pydantic" },
 ]
+sdist = { url = "https://files.pythonhosted.org/packages/c6/03/eb06536f287a8b7fb4808b00a60d9a9a3694f8a4079b77730325c639fbbe/gemini_webapi-1.18.0.tar.gz", hash = "sha256:0688a080fc3c95be55e723a66b2b69ec3ffcd58b07c50cf627d85d59d1181a86", size = 264630, upload-time = "2026-02-03T01:18:39.794Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/40/33/85f520f56faddd68442c7efe7086ff5593b213bd8fc3768835dbe610fd9b/gemini_webapi-1.18.0-py3-none-any.whl", hash = "sha256:2fe25b5f8185aba1ca109e1280ef3eb79e5bd8a81fba16e01fbc4a177b72362c", size = 61523, upload-time = "2026-02-03T01:18:38.322Z" },
 ]
 [[package]]
     { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" },
 ]
+[[package]]
+name = "h2"
+version = "4.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "hpack" },
+    { name = "hyperframe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/1d/17/afa56379f94ad0fe8defd37d6eb3f89a25404ffc71d4d848893d270325fc/h2-4.3.0.tar.gz", hash = "sha256:6c59efe4323fa18b47a632221a1888bd7fde6249819beda254aeca909f221bf1", size = 2152026, upload-time = "2025-08-23T18:12:19.778Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/69/b2/119f6e6dcbd96f9069ce9a2665e0146588dc9f88f29549711853645e736a/h2-4.3.0-py3-none-any.whl", hash = "sha256:c438f029a25f7945c69e0ccf0fb951dc3f73a5f6412981daee861431b70e2bdd", size = 61779, upload-time = "2025-08-23T18:12:17.779Z" },
+]
+[[package]]
+name = "hpack"
+version = "4.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/2c/48/71de9ed269fdae9c8057e5a4c0aa7402e8bb16f2c6e90b3aa53327b113f8/hpack-4.1.0.tar.gz", hash = "sha256:ec5eca154f7056aa06f196a557655c5b009b382873ac8d1e66e79e87535f1dca", size = 51276, upload-time = "2025-01-22T21:44:58.347Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/07/c6/80c95b1b2b94682a72cbdbfb85b81ae2daffa4291fbfa1b1464502ede10d/hpack-4.1.0-py3-none-any.whl", hash = "sha256:157ac792668d995c657d93111f46b4535ed114f0c9c8d672271bbec7eae1b496", size = 34357, upload-time = "2025-01-22T21:44:56.92Z" },
+]
 [[package]]
 name = "httpcore"
 version = "1.0.9"
     { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" },
 ]
+[package.optional-dependencies]
+http2 = [
+    { name = "h2" },
+]
+[[package]]
+name = "hyperframe"
+version = "6.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/02/e7/94f8232d4a74cc99514c13a9f995811485a6903d48e5d952771ef6322e30/hyperframe-6.1.0.tar.gz", hash = "sha256:f630908a00854a7adeabd6382b43923a4c4cd4b821fcb527e6ab9e15382a3b08", size = 26566, upload-time = "2025-01-22T21:41:49.302Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/48/30/47d0bf6072f7252e6521f3447ccfa40b421b6824517f82854703d0f5a98b/hyperframe-6.1.0-py3-none-any.whl", hash = "sha256:b03380493a519fce58ea5af42e4a42317bf9bd425596f7a0835ffce80f1a42e5", size = 13007, upload-time = "2025-01-22T21:41:47.295Z" },
+]
 [[package]]
 name = "idna"
 version = "3.11"
 [[package]]
 name = "orjson"
+version = "3.11.7"
 source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/53/45/b268004f745ede84e5798b48ee12b05129d19235d0e15267aa57dcdb400b/orjson-3.11.7.tar.gz", hash = "sha256:9b1a67243945819ce55d24a30b59d6a168e86220452d2c96f4d1f093e71c0c49", size = 6144992, upload-time = "2026-02-02T15:38:49.29Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/80/bf/76f4f1665f6983385938f0e2a5d7efa12a58171b8456c252f3bae8a4cf75/orjson-3.11.7-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:bd03ea7606833655048dab1a00734a2875e3e86c276e1d772b2a02556f0d895f", size = 228545, upload-time = "2026-02-02T15:37:46.376Z" },
+    { url = "https://files.pythonhosted.org/packages/79/53/6c72c002cb13b5a978a068add59b25a8bdf2800ac1c9c8ecdb26d6d97064/orjson-3.11.7-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:89e440ebc74ce8ab5c7bc4ce6757b4a6b1041becb127df818f6997b5c71aa60b", size = 125224, upload-time = "2026-02-02T15:37:47.697Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/83/10e48852865e5dd151bdfe652c06f7da484578ed02c5fca938e3632cb0b8/orjson-3.11.7-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5ede977b5fe5ac91b1dffc0a517ca4542d2ec8a6a4ff7b2652d94f640796342a", size = 128154, upload-time = "2026-02-02T15:37:48.954Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/52/a66e22a2b9abaa374b4a081d410edab6d1e30024707b87eab7c734afe28d/orjson-3.11.7-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b7b1dae39230a393df353827c855a5f176271c23434cfd2db74e0e424e693e10", size = 123548, upload-time = "2026-02-02T15:37:50.187Z" },
+    { url = "https://files.pythonhosted.org/packages/de/38/605d371417021359f4910c496f764c48ceb8997605f8c25bf1dfe58c0ebe/orjson-3.11.7-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ed46f17096e28fb28d2975834836a639af7278aa87c84f68ab08fbe5b8bd75fa", size = 129000, upload-time = "2026-02-02T15:37:51.426Z" },
+    { url = "https://files.pythonhosted.org/packages/44/98/af32e842b0ffd2335c89714d48ca4e3917b42f5d6ee5537832e069a4b3ac/orjson-3.11.7-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3726be79e36e526e3d9c1aceaadbfb4a04ee80a72ab47b3f3c17fefb9812e7b8", size = 141686, upload-time = "2026-02-02T15:37:52.607Z" },
+    { url = "https://files.pythonhosted.org/packages/96/0b/fc793858dfa54be6feee940c1463370ece34b3c39c1ca0aa3845f5ba9892/orjson-3.11.7-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0724e265bc548af1dedebd9cb3d24b4e1c1e685a343be43e87ba922a5c5fff2f", size = 130812, upload-time = "2026-02-02T15:37:53.944Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/91/98a52415059db3f374757d0b7f0f16e3b5cd5976c90d1c2b56acaea039e6/orjson-3.11.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e7745312efa9e11c17fbd3cb3097262d079da26930ae9ae7ba28fb738367cbad", size = 133440, upload-time = "2026-02-02T15:37:55.615Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/b6/cb540117bda61791f46381f8c26c8f93e802892830a6055748d3bb1925ab/orjson-3.11.7-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f904c24bdeabd4298f7a977ef14ca2a022ca921ed670b92ecd16ab6f3d01f867", size = 138386, upload-time = "2026-02-02T15:37:56.814Z" },
+    { url = "https://files.pythonhosted.org/packages/63/1a/50a3201c334a7f17c231eee5f841342190723794e3b06293f26e7cf87d31/orjson-3.11.7-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:b9fc4d0f81f394689e0814617aadc4f2ea0e8025f38c226cbf22d3b5ddbf025d", size = 408853, upload-time = "2026-02-02T15:37:58.291Z" },
+    { url = "https://files.pythonhosted.org/packages/87/cd/8de1c67d0be44fdc22701e5989c0d015a2adf391498ad42c4dc589cd3013/orjson-3.11.7-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:849e38203e5be40b776ed2718e587faf204d184fc9a008ae441f9442320c0cab", size = 144130, upload-time = "2026-02-02T15:38:00.163Z" },
+    { url = "https://files.pythonhosted.org/packages/0f/fe/d605d700c35dd55f51710d159fc54516a280923cd1b7e47508982fbb387d/orjson-3.11.7-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:4682d1db3bcebd2b64757e0ddf9e87ae5f00d29d16c5cdf3a62f561d08cc3dd2", size = 134818, upload-time = "2026-02-02T15:38:01.507Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/e4/15ecc67edb3ddb3e2f46ae04475f2d294e8b60c1825fbe28a428b93b3fbd/orjson-3.11.7-cp312-cp312-win32.whl", hash = "sha256:f4f7c956b5215d949a1f65334cf9d7612dde38f20a95f2315deef167def91a6f", size = 127923, upload-time = "2026-02-02T15:38:02.75Z" },
+    { url = "https://files.pythonhosted.org/packages/34/70/2e0855361f76198a3965273048c8e50a9695d88cd75811a5b46444895845/orjson-3.11.7-cp312-cp312-win_amd64.whl", hash = "sha256:bf742e149121dc5648ba0a08ea0871e87b660467ef168a3a5e53bc1fbd64bb74", size = 125007, upload-time = "2026-02-02T15:38:04.032Z" },
+    { url = "https://files.pythonhosted.org/packages/68/40/c2051bd19fc467610fed469dc29e43ac65891571138f476834ca192bc290/orjson-3.11.7-cp312-cp312-win_arm64.whl", hash = "sha256:26c3b9132f783b7d7903bf1efb095fed8d4a3a85ec0d334ee8beff3d7a4749d5", size = 126089, upload-time = "2026-02-02T15:38:05.297Z" },
 ]
 [[package]]