Lưu Quang Vũ committed on
Commit a583ded · unverified · 1 Parent(s): 79b9136

feat: Add support for custom Gemini models and model loading strategies (#86)


* feat: Add support for custom Gemini models and model loading strategies

- Introduced `model_strategy` configuration for "append" (default + custom models) or "overwrite" (custom models only).
- Enhanced `/v1/models` endpoint to return models based on the configured strategy.
- Improved model loading with environment variable overrides and validation.
- Refactored model handling logic for improved modularity and error handling.
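The two loading strategies can be sketched roughly as follows. This is an illustrative sketch only: `resolve_models` and `DEFAULT_MODELS` are hypothetical names, not the project's actual API, and the real implementation works on model objects rather than bare name strings.

```python
# Hypothetical sketch of "append" vs "overwrite" model loading.
DEFAULT_MODELS = ["gemini-2.5-flash", "gemini-2.5-pro"]  # placeholder defaults

def resolve_models(custom: list[str], strategy: str = "append") -> list[str]:
    """Return the model list exposed by /v1/models under the given strategy."""
    if strategy == "overwrite":
        # custom models only; defaults are hidden entirely
        return list(custom)
    if strategy == "append":
        # defaults first, then custom models that don't shadow a default
        return DEFAULT_MODELS + [m for m in custom if m not in DEFAULT_MODELS]
    raise ValueError(f"unknown model_strategy: {strategy!r}")
```

With `strategy="append"` a custom model is added alongside the defaults; with `"overwrite"` only the custom list is served.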

* feat: Improve Gemini model environment variable parsing and nested field support

- Enhanced `extract_gemini_models_env` to handle nested fields within environment variables.
- Updated type hints for more flexibility in model overrides.
- Improved `_merge_models_with_env` to better support field-level updates and appending new models.

* refactor: Consolidate utility functions and clean up unused code

- Moved utility functions like `strip_code_fence`, `extract_tool_calls`, and `iter_stream_segments` to a centralized helper module.
- Removed unused and redundant private methods from `chat.py`, including `_strip_code_fence`, `_strip_tagged_blocks`, and `_strip_system_hints`.
- Updated imports and references across modules for consistency.
- Simplified tool call and streaming logic by replacing inline implementations with shared helper functions.

* fix: Handle None input in `estimate_tokens` and return 0 for empty text

* refactor: Simplify model configuration and add JSON parsing validators

- Replaced unused model placeholder in `config.yaml` with an empty list.
- Added JSON parsing validators for `model_header` and `models` to enhance flexibility and error handling.
- Improved validation to filter out incomplete model configurations.
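A plain-Python sketch of what such validators do (the real code presumably hooks this into the config model's validation; the function names here are hypothetical):

```python
import json

def parse_json_field(value):
    """Coerce a JSON string into a structure; pass through already-parsed values."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return None  # treated as invalid and filtered out downstream
    return value

def filter_complete_models(models):
    """Drop entries that lack a model_name, mirroring the validation described above."""
    return [m for m in models if isinstance(m, dict) and m.get("model_name")]
```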

* refactor: Simplify Gemini model environment variable parsing with JSON support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.

* fix: Enhance Gemini model environment variable parsing with fallback to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
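The fallback chain can be sketched like this (function name is illustrative; the real code also logs and cleans up the environment variable):

```python
import ast
import json

def parse_models_env(raw: str):
    """Parse a CONFIG_GEMINI__MODELS value: try JSON first, then fall back to
    a Python literal (handles single-quoted dict/list strings)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None  # invalid configuration; the real code logs a warning here
```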

* fix: Improve regex patterns in helper module

- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.

* docs: Update README files to include custom model configuration and environment variable setup

* fix: Remove unused headers from HTTP client in helper module

* fix: Update README and README.zh to clarify model configuration via environment variables; enhance error logging in config validation

* Update README and README.zh to clarify model configuration via JSON string or list structure for enhanced flexibility in automated environments

README.md CHANGED
@@ -118,7 +118,7 @@ services:
       - CONFIG_GEMINI__CLIENTS__0__SECURE_1PSID=${SECURE_1PSID}
       - CONFIG_GEMINI__CLIENTS__0__SECURE_1PSIDTS=${SECURE_1PSIDTS}
       - GEMINI_COOKIE_PATH=/app/cache # must match the cache volume mount above
-    restart: on-failure:3 # Avoid retrying too many times
+    restart: on-failure:3 # Avoid retrying too many times
 ```
 
 Then run:
@@ -187,6 +187,30 @@ To use Gemini-FastAPI, you need to extract your Gemini session cookies:
 
 Each client entry can be configured with a different proxy to work around rate limits. Omit the `proxy` field or set it to `null` or an empty string to keep a direct connection.
 
+### Custom Models
+
+You can define custom models in `config/config.yaml` or via environment variables.
+
+#### YAML Configuration
+
+```yaml
+gemini:
+  model_strategy: "append" # "append" (default + custom) or "overwrite" (custom only)
+  models:
+    - model_name: "gemini-3.0-pro"
+      model_header:
+        x-goog-ext-525001261-jspb: '[1,null,null,null,"9d8ca3786ebdfbea",null,null,0,[4],null,null,1]'
+```
+
+#### Environment Variables
+
+You can supply models as a JSON string or list structure via `CONFIG_GEMINI__MODELS`. This provides a flexible way to override settings via the shell or in automated environments (e.g. Docker) without modifying the configuration file.
+
+```bash
+export CONFIG_GEMINI__MODEL_STRATEGY="overwrite"
+export CONFIG_GEMINI__MODELS='[{"model_name": "gemini-3.0-pro", "model_header": {"x-goog-ext-525001261-jspb": "[1,null,null,null,\"9d8ca3786ebdfbea\",null,null,0,[4],null,null,1]"}}]'
+```
+
 ## Acknowledgments
 
 - [HanaokaYuzu/Gemini-API](https://github.com/HanaokaYuzu/Gemini-API) - The underlying Gemini web API client
README.zh.md CHANGED
@@ -4,7 +4,6 @@
 [![FastAPI](https://img.shields.io/badge/FastAPI-0.115+-green.svg)](https://fastapi.tiangolo.com/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
-
 [ [English](README.md) | 中文 ]
 
 将 Gemini 网页端模型封装为兼容 OpenAI API 的 API Server。基于 [HanaokaYuzu/Gemini-API](https://github.com/HanaokaYuzu/Gemini-API) 实现。
@@ -50,6 +49,7 @@ pip install -e .
 ### 配置
 
 编辑 `config/config.yaml` 并提供至少一组凭证:
+
 ```yaml
 gemini:
   clients:
@@ -118,7 +118,7 @@ services:
       - CONFIG_GEMINI__CLIENTS__0__SECURE_1PSID=${SECURE_1PSID}
       - CONFIG_GEMINI__CLIENTS__0__SECURE_1PSIDTS=${SECURE_1PSIDTS}
       - GEMINI_COOKIE_PATH=/app/cache # must match the cache volume mount above
-    restart: on-failure:3 # Avoid retrying too many times
+    restart: on-failure:3 # Avoid retrying too many times
 ```
 
 然后运行:
@@ -186,6 +186,30 @@ export CONFIG_STORAGE__MAX_SIZE=268435456 # 256 MB
 
 每个客户端条目可以配置不同的代理,从而规避速率限制。省略 `proxy` 字段或将其设置为 `null` 或空字符串以保持直连。
 
+### 自定义模型
+
+你可以在 `config/config.yaml` 中或通过环境变量定义自定义模型。
+
+#### YAML 配置
+
+```yaml
+gemini:
+  model_strategy: "append" # "append" (默认 + 自定义) 或 "overwrite" (仅限自定义)
+  models:
+    - model_name: "gemini-3.0-pro"
+      model_header:
+        x-goog-ext-525001261-jspb: '[1,null,null,null,"9d8ca3786ebdfbea",null,null,0,[4],null,null,1]'
+```
+
+#### 环境变量
+
+你可以通过 `CONFIG_GEMINI__MODELS` 以 JSON 字符串或列表结构的形式提供模型。这为通过 shell 或在自动化环境(例如 Docker)中覆盖设置提供了一种灵活的方式,而无需修改配置文件。
+
+```bash
+export CONFIG_GEMINI__MODEL_STRATEGY="overwrite"
+export CONFIG_GEMINI__MODELS='[{"model_name": "gemini-3.0-pro", "model_header": {"x-goog-ext-525001261-jspb": "[1,null,null,null,\"9d8ca3786ebdfbea\",null,null,0,[4],null,null,1]"}}]'
+```
+
 ## 鸣谢
 
 - [HanaokaYuzu/Gemini-API](https://github.com/HanaokaYuzu/Gemini-API) - 底层 Gemini Web API 客户端
@@ -193,4 +217,4 @@ export CONFIG_STORAGE__MAX_SIZE=268435456 # 256 MB
 
 ## 免责声明
 
-本项目与 Google 或 OpenAI 无关,仅供学习和研究使用。本项目使用了逆向工程 API,可能不符合 Google 服务条款。使用风险自负。
+本项目与 Google 或 OpenAI 无关,仅供学习和研究使用。本项目使用了逆向工程 API,可能不符合 Google 服务条款。使用风险自负。
app/server/chat.py CHANGED
@@ -1,12 +1,11 @@
 import base64
 import json
 import re
-import struct
 import uuid
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Any, Iterator
+from typing import Any
 
 import orjson
 from fastapi import APIRouter, Depends, HTTPException, Request, status
@@ -21,7 +20,6 @@ from ..models import (
     ChatCompletionRequest,
     ContentItem,
     ConversationInStore,
-    FunctionCall,
     Message,
     ModelData,
     ModelListResponse,
@@ -37,26 +35,28 @@ from ..models import (
     ResponseToolChoice,
     ResponseUsage,
     Tool,
-    ToolCall,
     ToolChoiceFunction,
 )
 from ..services import GeminiClientPool, GeminiClientWrapper, LMDBConversationStore
-from ..services.client import CODE_BLOCK_HINT, XML_WRAP_HINT
 from ..utils import g_config
-from ..utils.helper import estimate_tokens
+from ..utils.helper import (
+    CODE_BLOCK_HINT,
+    CODE_HINT_STRIPPED,
+    XML_HINT_STRIPPED,
+    XML_WRAP_HINT,
+    estimate_tokens,
+    extract_image_dimensions,
+    extract_tool_calls,
+    iter_stream_segments,
+    remove_tool_call_blocks,
+    strip_code_fence,
+    text_from_message,
+)
 from .middleware import get_image_store_dir, get_image_token, get_temp_dir, verify_api_key
 
 # Maximum characters Gemini Web can accept in a single request (configurable)
 MAX_CHARS_PER_REQUEST = int(g_config.gemini.max_chars_per_request * 0.9)
 CONTINUATION_HINT = "\n(More messages to come, please reply with just 'ok.')"
-TOOL_BLOCK_RE = re.compile(r"```xml\s*(.*?)```", re.DOTALL | re.IGNORECASE)
-TOOL_CALL_RE = re.compile(
-    r"<tool_call\s+name=\"([^\"]+)\">(.*?)</tool_call>", re.DOTALL | re.IGNORECASE
-)
-JSON_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE)
-CONTROL_TOKEN_RE = re.compile(r"<\|im_(?:start|end)\|>")
-XML_HINT_STRIPPED = XML_WRAP_HINT.strip()
-CODE_HINT_STRIPPED = CODE_BLOCK_HINT.strip()
 
 router = APIRouter()
 
@@ -118,14 +118,6 @@ def _build_structured_requirement(
     )
 
 
-def _strip_code_fence(text: str) -> str:
-    """Remove surrounding ```json fences if present."""
-    match = JSON_FENCE_RE.match(text.strip())
-    if match:
-        return match.group(1).strip()
-    return text.strip()
-
-
 def _build_tool_prompt(
     tools: list[Tool],
     tool_choice: str | ToolChoiceFunction | None,
@@ -312,75 +304,6 @@ def _prepare_messages_for_model(
     return prepared
 
 
-def _strip_system_hints(text: str) -> str:
-    """Remove system-level hint text from a given string."""
-    if not text:
-        return text
-    cleaned = _strip_tagged_blocks(text)
-    cleaned = cleaned.replace(XML_WRAP_HINT, "").replace(XML_HINT_STRIPPED, "")
-    cleaned = cleaned.replace(CODE_BLOCK_HINT, "").replace(CODE_HINT_STRIPPED, "")
-    cleaned = CONTROL_TOKEN_RE.sub("", cleaned)
-    return cleaned.strip()
-
-
-def _strip_tagged_blocks(text: str) -> str:
-    """Remove <|im_start|>role ... <|im_end|> sections, dropping tool blocks entirely.
-    - tool blocks are removed entirely (if missing end marker, drop to EOF).
-    - other roles: remove markers and role, keep inner content (if missing end marker, keep to EOF).
-    """
-    if not text:
-        return text
-
-    result: list[str] = []
-    idx = 0
-    length = len(text)
-    start_marker = "<|im_start|>"
-    end_marker = "<|im_end|>"
-
-    while idx < length:
-        start = text.find(start_marker, idx)
-        if start == -1:
-            result.append(text[idx:])
-            break
-
-        # append any content before this block
-        result.append(text[idx:start])
-
-        role_start = start + len(start_marker)
-        newline = text.find("\n", role_start)
-        if newline == -1:
-            # malformed block; keep remainder as-is (safe behavior)
-            result.append(text[start:])
-            break
-
-        role = text[role_start:newline].strip().lower()
-
-        end = text.find(end_marker, newline + 1)
-        if end == -1:
-            # missing end marker
-            if role == "tool":
-                # drop from start marker to EOF (skip remainder)
-                break
-            else:
-                # keep inner content from after the role newline to EOF
-                result.append(text[newline + 1 :])
-                break
-
-        block_end = end + len(end_marker)
-
-        if role == "tool":
-            # drop whole block
-            idx = block_end
-            continue
-
-        # keep the content without role markers
-        content = text[newline + 1 : end]
-        result.append(content)
-        idx = block_end
-
-    return "".join(result)
-
-
 def _response_items_to_messages(
     items: str | list[ResponseInputItem],
 ) -> tuple[list[Message], str | list[ResponseInputItem]]:
@@ -509,77 +432,64 @@ def _instructions_to_messages(
     return instruction_messages
 
 
-def _remove_tool_call_blocks(text: str) -> str:
-    """Strip tool call code blocks from text."""
-    if not text:
-        return text
-    cleaned = TOOL_BLOCK_RE.sub("", text)
-    return _strip_system_hints(cleaned)
+def _get_model_by_name(name: str) -> Model:
+    """
+    Retrieve a Model instance by name, considering custom models from config
+    and the update strategy (append or overwrite).
+    """
+    strategy = g_config.gemini.model_strategy
+    custom_models = {m.model_name: m for m in g_config.gemini.models if m.model_name}
 
+    if name in custom_models:
+        return Model.from_dict(custom_models[name].model_dump())
 
+    if strategy == "overwrite":
+        raise ValueError(f"Model '{name}' not found in custom models (strategy='overwrite').")
 
-def _extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
-    """Extract tool call definitions and return cleaned text."""
-    if not text:
-        return text, []
+    return Model.from_name(name)
 
-    tool_calls: list[ToolCall] = []
 
+def _get_available_models() -> list[ModelData]:
+    """
+    Return a list of available models based on configuration strategy.
+    """
+    now = int(datetime.now(tz=timezone.utc).timestamp())
+    strategy = g_config.gemini.model_strategy
+    models_data = []
 
-    def _replace(match: re.Match[str]) -> str:
-        block_content = match.group(1)
-        if not block_content:
-            return ""
+    custom_models = [m for m in g_config.gemini.models if m.model_name]
+    for m in custom_models:
+        models_data.append(
+            ModelData(
+                id=m.model_name,
+                created=now,
+                owned_by="custom",
+            )
+        )
+
+    if strategy == "append":
+        custom_ids = {m.model_name for m in custom_models}
+        for model in Model:
+            m_name = model.model_name
+            if not m_name or m_name == "unspecified":
+                continue
+            if m_name in custom_ids:
+                continue
 
-        for call_match in TOOL_CALL_RE.finditer(block_content):
-            name = (call_match.group(1) or "").strip()
-            raw_args = (call_match.group(2) or "").strip()
-            if not name:
-                logger.warning(
-                    f"Encountered tool_call block without a function name: {block_content}"
-                )
-                continue
-
-            arguments = raw_args
-            try:
-                parsed_args = json.loads(raw_args)
-                arguments = json.dumps(parsed_args, ensure_ascii=False)
-            except json.JSONDecodeError:
-                logger.warning(
-                    f"Failed to parse tool call arguments for '{name}'. Passing raw string."
-                )
-
-            tool_calls.append(
-                ToolCall(
-                    id=f"call_{uuid.uuid4().hex}",
-                    type="function",
-                    function=FunctionCall(name=name, arguments=arguments),
+            models_data.append(
+                ModelData(
+                    id=m_name,
+                    created=now,
+                    owned_by="gemini-web",
                 )
             )
 
-        return ""
-
-    cleaned = TOOL_BLOCK_RE.sub(_replace, text)
-    cleaned = _strip_system_hints(cleaned)
-    return cleaned, tool_calls
+    return models_data
 
 
 @router.get("/v1/models", response_model=ModelListResponse)
 async def list_models(api_key: str = Depends(verify_api_key)):
-    now = int(datetime.now(tz=timezone.utc).timestamp())
-
-    models = []
-    for model in Model:
-        m_name = model.model_name
-        if not m_name or m_name == "unspecified":
-            continue
-
-        models.append(
-            ModelData(
-                id=m_name,
-                created=now,
-                owned_by="gemini-web",
-            )
-        )
-
+    models = _get_available_models()
     return ModelListResponse(data=models)
 
 
@@ -592,7 +502,11 @@ async def create_chat_completion(
 ):
     pool = GeminiClientPool()
     db = LMDBConversationStore()
-    model = Model.from_name(request.model)
+
+    try:
+        model = _get_model_by_name(request.model)
+    except ValueError as exc:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
 
     if len(request.messages) == 0:
         raise HTTPException(
@@ -698,12 +612,12 @@ async def create_chat_completion(
             detail="Gemini output parsing failed unexpectedly.",
         ) from exc
 
-    visible_output, tool_calls = _extract_tool_calls(raw_output_with_think)
-    storage_output = _remove_tool_call_blocks(raw_output_clean).strip()
+    visible_output, tool_calls = extract_tool_calls(raw_output_with_think)
+    storage_output = remove_tool_call_blocks(raw_output_clean).strip()
    tool_calls_payload = [call.model_dump(mode="json") for call in tool_calls]
 
     if structured_requirement:
-        cleaned_visible = _strip_code_fence(visible_output or "")
+        cleaned_visible = strip_code_fence(visible_output or "")
         if not cleaned_visible:
             raise HTTPException(
                 status_code=status.HTTP_502_BAD_GATEWAY,
@@ -849,7 +763,7 @@ async def create_response(
     db = LMDBConversationStore()
 
     try:
-        model = Model.from_name(request_data.model)
+        model = _get_model_by_name(request_data.model)
     except ValueError as exc:
         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(exc)) from exc
 
@@ -938,12 +852,12 @@ async def create_response(
             detail="Gemini output parsing failed unexpectedly.",
         ) from exc
 
-    visible_text, detected_tool_calls = _extract_tool_calls(text_with_think)
-    storage_output = _remove_tool_call_blocks(text_without_think).strip()
+    visible_text, detected_tool_calls = extract_tool_calls(text_with_think)
+    storage_output = remove_tool_call_blocks(text_without_think).strip()
     assistant_text = LMDBConversationStore.remove_think_tags(visible_text.strip())
 
     if structured_requirement:
-        cleaned_visible = _strip_code_fence(assistant_text or "")
+        cleaned_visible = strip_code_fence(assistant_text or "")
         if not cleaned_visible:
             raise HTTPException(
                 status_code=status.HTTP_502_BAD_GATEWAY,
@@ -1010,7 +924,7 @@ async def create_response(
 
         image_call_items.append(
             ResponseImageGenerationCall(
-                id=f"img_{uuid.uuid4().hex}",
+                id=filename.rsplit(".", 1)[0],
                 status="completed",
                 result=image_base64,
                 output_format=img_format,
@@ -1045,7 +959,7 @@ async def create_response(
     response_id = f"resp_{uuid.uuid4().hex}"
     message_id = f"msg_{uuid.uuid4().hex}"
 
-    input_tokens = sum(estimate_tokens(_text_from_message(msg)) for msg in messages)
+    input_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
     tool_arg_text = "".join(call.function.arguments or "" for call in detected_tool_calls)
     completion_basis = assistant_text or ""
     if tool_arg_text:
@@ -1108,25 +1022,6 @@ async def create_response(
     return response_payload
 
 
-def _text_from_message(message: Message) -> str:
-    """Return text content from a message for token estimation."""
-    base_text = ""
-    if isinstance(message.content, str):
-        base_text = message.content
-    elif isinstance(message.content, list):
-        base_text = "\n".join(
-            item.text or "" for item in message.content if getattr(item, "type", "") == "text"
-        )
-    elif message.content is None:
-        base_text = ""
-
-    if message.tool_calls:
-        tool_arg_text = "".join(call.function.arguments or "" for call in message.tool_calls)
-        base_text = f"{base_text}\n{tool_arg_text}" if base_text else tool_arg_text
-
-    return base_text
-
-
 async def _find_reusable_session(
     db: LMDBConversationStore,
     pool: GeminiClientPool,
@@ -1224,47 +1119,6 @@ async def _send_with_split(session: ChatSession, text: str, files: list[Path | s
     raise
 
 
-def _iter_stream_segments(model_output: str, chunk_size: int = 64):
-    """Yield stream segments while keeping <think> markers and words intact."""
-    if not model_output:
-        return
-
-    token_pattern = re.compile(r"\s+|\S+\s*")
-    pending = ""
-
-    def _flush_pending() -> Iterator[str]:
-        nonlocal pending
-        if pending:
-            yield pending
-            pending = ""
-
-    # Split on <think> boundaries so the markers are never fragmented.
-    parts = re.split(r"(</?think>)", model_output)
-    for part in parts:
-        if not part:
-            continue
-        if part in {"<think>", "</think>"}:
-            yield from _flush_pending()
-            yield part
-            continue
-
-        for match in token_pattern.finditer(part):
-            token = match.group(0)
-
-            if len(token) > chunk_size:
-                yield from _flush_pending()
-                for idx in range(0, len(token), chunk_size):
-                    yield token[idx : idx + chunk_size]
-                continue
-
-            if pending and len(pending) + len(token) > chunk_size:
-                yield from _flush_pending()
-
-            pending += token
-
-    yield from _flush_pending()
-
-
 def _create_streaming_response(
     model_output: str,
     tool_calls: list[dict],
@@ -1276,7 +1130,7 @@ def _create_streaming_response(
     """Create streaming response with `usage` calculation included in the final chunk."""
 
     # Calculate token usage
-    prompt_tokens = sum(estimate_tokens(_text_from_message(msg)) for msg in messages)
+    prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
     tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
     completion_tokens = estimate_tokens(model_output + tool_args)
     total_tokens = prompt_tokens + completion_tokens
@@ -1294,7 +1148,7 @@ def _create_streaming_response(
     yield f"data: {orjson.dumps(data).decode('utf-8')}\n\n"
 
     # Stream output text in chunks for efficiency
-    for chunk in _iter_stream_segments(model_output):
+    for chunk in iter_stream_segments(model_output):
         data = {
             "id": completion_id,
             "object": "chat.completion.chunk",
@@ -1408,7 +1262,7 @@ def _create_responses_streaming_response(
                 content_text += c.text
 
             if content_text:
-                for chunk in _iter_stream_segments(content_text):
+                for chunk in iter_stream_segments(content_text):
                     delta_event = {
                         **base_event,
                         "type": "response.output_text.delta",
@@ -1457,7 +1311,7 @@ def _create_standard_response(
 ) -> dict:
     """Create standard response"""
     # Calculate token usage
-    prompt_tokens = sum(estimate_tokens(_text_from_message(msg)) for msg in messages)
+    prompt_tokens = sum(estimate_tokens(text_from_message(msg)) for msg in messages)
     tool_args = "".join(call.get("function", {}).get("arguments", "") for call in tool_calls or [])
     completion_tokens = estimate_tokens(model_output + tool_args)
     total_tokens = prompt_tokens + completion_tokens
@@ -1490,74 +1344,16 @@ def _create_standard_response(
     return result
 
 
-def _extract_image_dimensions(data: bytes) -> tuple[int | None, int | None]:
-    """Return image dimensions (width, height) if PNG or JPEG headers are present."""
-    # PNG: dimensions stored in bytes 16..24 of the IHDR chunk
-    if len(data) >= 24 and data.startswith(b"\x89PNG\r\n\x1a\n"):
-        try:
-            width, height = struct.unpack(">II", data[16:24])
-            return int(width), int(height)
-        except struct.error:
-            return None, None
-
-    # JPEG: dimensions stored in SOF segment; iterate through markers to locate it
-    if len(data) >= 4 and data[0:2] == b"\xff\xd8":
-        idx = 2
-        length = len(data)
-        sof_markers = {
-            0xC0,
-            0xC1,
-            0xC2,
-            0xC3,
-            0xC5,
-            0xC6,
-            0xC7,
-            0xC9,
-            0xCA,
-            0xCB,
-            0xCD,
-            0xCE,
-            0xCF,
-        }
-        while idx < length:
-            # Find marker alignment (markers are prefixed with 0xFF bytes)
-            if data[idx] != 0xFF:
-                idx += 1
-                continue
-            while idx < length and data[idx] == 0xFF:
-                idx += 1
-            if idx >= length:
-                break
-            marker = data[idx]
-            idx += 1
-
-            if marker in (0xD8, 0xD9, 0x01) or 0xD0 <= marker <= 0xD7:
-                continue
-
-            if idx + 1 >= length:
-                break
-            segment_length = (data[idx] << 8) + data[idx + 1]
-            idx += 2
-            if segment_length < 2:
-                break
-
-            if marker in sof_markers:
-                if idx + 4 < length:
-                    # Skip precision byte at idx, then read height/width (big-endian)
-                    height = (data[idx + 1] << 8) + data[idx + 2]
-                    width = (data[idx + 3] << 8) + data[idx + 4]
-                    return int(width), int(height)
-                break
-
-            idx += segment_length - 2
-
-    return None, None
-
-
 async def _image_to_base64(image: Image, temp_dir: Path) -> tuple[str, int | None, int | None, str]:
     """Persist an image provided by gemini_webapi and return base64 plus dimensions and filename."""
     if isinstance(image, GeneratedImage):
-        saved_path = await image.save(path=str(temp_dir), full_size=True)
+        try:
+            saved_path = await image.save(path=str(temp_dir), full_size=True)
+        except Exception as e:
+            logger.warning(
+                f"Failed to download full-size GeneratedImage, retrying with default size: {e}"
+            )
+            saved_path = await image.save(path=str(temp_dir), full_size=False)
     else:
         saved_path = await image.save(path=str(temp_dir))
 
@@ -1571,6 +1367,6 @@ async def _image_to_base64(image: Image, temp_dir: Path) -> tuple[str, int | None, int | None, str]:
     original_path.rename(new_path)
 
     data = new_path.read_bytes()
-    width, height = _extract_image_dimensions(data)
+    width, height = extract_image_dimensions(data)
     filename = random_name
     return base64.b64encode(data).decode("ascii"), width, height, filename
app/services/client.py CHANGED
@@ -9,18 +9,12 @@ from loguru import logger
 
 from ..models import Message
 from ..utils import g_config
-from ..utils.helper import add_tag, save_file_to_tempfile, save_url_to_tempfile
-
-XML_WRAP_HINT = (
-    "\nYou MUST wrap every tool call response inside a single fenced block exactly like:\n"
-    '```xml\n<tool_call name="tool_name">{"arg": "value"}</tool_call>\n```\n'
-    "Do not surround the fence with any other text or whitespace; otherwise the call will be ignored.\n"
-)
-CODE_BLOCK_HINT = (
-    "\nWhenever you include code, markup, or shell snippets, wrap each snippet in a Markdown fenced "
-    "block and supply the correct language label (for example, ```python ... ``` or ```html ... ```).\n"
-    "Fence ONLY the actual code/markup; keep all narrative or explanatory text outside the fences.\n"
-)
+from ..utils.helper import (
+    add_tag,
+    save_file_to_tempfile,
+    save_url_to_tempfile,
+)
 
 HTML_ESCAPE_RE = re.compile(r"&(?:lt|gt|amp|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);")
 MARKDOWN_ESCAPE_RE = re.compile(r"\\(?=[-\\`*_{}\[\]()#+.!<>])")
 CODE_FENCE_RE = re.compile(r"(```.*?```|`[^`\n]+?`)", re.DOTALL)
app/utils/config.py CHANGED
@@ -1,6 +1,8 @@
+import ast
+import json
 import os
 import sys
-from typing import Literal, Optional
+from typing import Any, Literal, Optional
 
 from loguru import logger
 from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -50,12 +52,37 @@ class GeminiClientSettings(BaseModel):
         return stripped or None
 
 
+class GeminiModelConfig(BaseModel):
+    """Configuration for a custom Gemini model."""
+
+    model_name: Optional[str] = Field(default=None, description="Name of the model")
+    model_header: Optional[dict[str, Optional[str]]] = Field(
+        default=None, description="Header for the model"
+    )
+
+    @field_validator("model_header", mode="before")
+    @classmethod
+    def _parse_json_string(cls, v: Any) -> Any:
+        if isinstance(v, str) and v.strip().startswith("{"):
+            try:
+                return json.loads(v)
+            except json.JSONDecodeError:
+                # Return the original value to let Pydantic handle the error or type mismatch
+                return v
+        return v
+
+
 class GeminiConfig(BaseModel):
     """Gemini API configuration"""
 
     clients: list[GeminiClientSettings] = Field(
         ..., description="List of Gemini client credential pairs"
     )
+    models: list[GeminiModelConfig] = Field(default=[], description="List of custom Gemini models")
+    model_strategy: Literal["append", "overwrite"] = Field(
+        default="append",
+        description="Strategy for loading models: 'append' merges custom with default, 'overwrite' uses only custom",
+    )
     timeout: int = Field(default=120, ge=1, description="Init timeout")
     auto_refresh: bool = Field(True, description="Enable auto-refresh for Gemini cookies")
     refresh_interval: int = Field(
@@ -68,6 +95,36 @@ class GeminiConfig(BaseModel):
         description="Maximum characters Gemini Web can accept per request",
     )
 
+    @field_validator("models", mode="before")
+    @classmethod
+    def _parse_models_json(cls, v: Any) -> Any:
+        if isinstance(v, str) and v.strip().startswith("["):
+            try:
+                return json.loads(v)
+            except json.JSONDecodeError as e:
+                logger.warning(f"Failed to parse models JSON string: {e}")
+                return v
+        return v
+
+    @field_validator("models")
+    @classmethod
+    def _filter_valid_models(cls, v: list[GeminiModelConfig]) -> list[GeminiModelConfig]:
+        """Filter out models that don't have all required fields set."""
+        valid_models = []
+        for model in v:
+            if model.model_name and model.model_header:
+                valid_models.append(model)
+            else:
+                missing = []
+                if not model.model_name:
+                    missing.append("model_name")
+                if not model.model_header:
+                    missing.append("model_header")
+                logger.warning(
+                    f"Discarding custom model due to missing {', '.join(missing)}: {model}"
+                )
+        return valid_models
+
 
 class CORSConfig(BaseModel):
     """CORS configuration"""
@@ -207,10 +264,74 @@ def _merge_clients_with_env(
             new_client = GeminiClientSettings(**overrides)
             result_clients.append(new_client)
         else:
-            raise IndexError(f"Client index {idx} in env is out of range.")
+            raise IndexError(
+                f"Client index {idx} in env is out of range (current count: {len(result_clients)}). "
+                "Client indices must be contiguous starting from 0."
+            )
     return result_clients if result_clients else base_clients
 
 
+def extract_gemini_models_env() -> dict[int, dict[str, Any]]:
+    """Extract and remove all Gemini models related environment variables, supporting nested fields."""
+    root_key = "CONFIG_GEMINI__MODELS"
+    env_overrides: dict[int, dict[str, Any]] = {}
+
+    if root_key in os.environ:
+        val = os.environ[root_key]
+        models_list = None
+        parsed_successfully = False
+
+        try:
+            models_list = json.loads(val)
+            parsed_successfully = True
+        except json.JSONDecodeError:
+            try:
+                models_list = ast.literal_eval(val)
+                parsed_successfully = True
+            except (ValueError, SyntaxError) as e:
+                logger.warning(f"Failed to parse {root_key} as JSON or Python literal: {e}")
+
+        if parsed_successfully and isinstance(models_list, list):
+            for idx, model_data in enumerate(models_list):
+                if isinstance(model_data, dict):
+                    env_overrides[idx] = model_data
+
+        # Remove the environment variable to avoid Pydantic parsing errors
+        del os.environ[root_key]
+
+    return env_overrides
+
+
+def _merge_models_with_env(
+    base_models: list[GeminiModelConfig] | None,
+    env_overrides: dict[int, dict[str, Any]],
+) -> list[GeminiModelConfig]:
+    """Override base_models with env_overrides using standard update (replace whole fields)."""
+    if not env_overrides:
+        return base_models or []
+    result_models: list[GeminiModelConfig] = []
+    if base_models:
+        result_models = [model.model_copy() for model in base_models]
+
+    for idx in sorted(env_overrides):
+        overrides = env_overrides[idx]
+        if idx < len(result_models):
+            # Update existing model: overwrite fields found in env
+            model_dict = result_models[idx].model_dump()
+            model_dict.update(overrides)
+            result_models[idx] = GeminiModelConfig(**model_dict)
+        elif idx == len(result_models):
+            # Append a new model
+            new_model = GeminiModelConfig(**overrides)
+            result_models.append(new_model)
+        else:
+            raise IndexError(
+                f"Model index {idx} in env is out of range (current count: {len(result_models)}). "
+                "Model indices must be contiguous starting from 0."
+            )
    return result_models
+
+
 def initialize_config() -> Config:
     """
     Initialize the configuration.
@@ -221,6 +342,8 @@ def initialize_config() -> Config:
     try:
         # First, extract and remove Gemini clients related environment variables
         env_clients_overrides = extract_gemini_clients_env()
+        # Extract and remove Gemini models related environment variables
+        env_models_overrides = extract_gemini_models_env()
 
         # Then, initialize Config with pydantic_settings
         config = Config()  # type: ignore
@@ -228,7 +351,10 @@ def initialize_config() -> Config:
         # Synthesize clients
         config.gemini.clients = _merge_clients_with_env(
             config.gemini.clients, env_clients_overrides
-        )  # type: ignore
+        )
+
+        # Synthesize models
+        config.gemini.models = _merge_models_with_env(config.gemini.models, env_models_overrides)
 
         return config
     except ValidationError as e:
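The env-override flow can be sketched with plain dicts standing in for `GeminiModelConfig` (the model names and header key below are hypothetical). Parsing falls back from JSON to `ast.literal_eval` as in `extract_gemini_models_env`, and the merge mirrors `_merge_models_with_env`: existing indices get field-level updates, the next index appends, anything further raises:

```python
import ast
import json


def parse_models_env(val: str) -> dict[int, dict]:
    # JSON first; fall back to Python-literal syntax (single quotes etc.).
    try:
        models = json.loads(val)
    except json.JSONDecodeError:
        models = ast.literal_eval(val)
    return {i: m for i, m in enumerate(models) if isinstance(m, dict)}


def merge_models(base: list[dict], overrides: dict[int, dict]) -> list[dict]:
    result = [dict(m) for m in base]
    for idx in sorted(overrides):
        if idx < len(result):
            result[idx].update(overrides[idx])   # overwrite only the given fields
        elif idx == len(result):
            result.append(dict(overrides[idx]))  # contiguous append
        else:
            raise IndexError(f"Model index {idx} out of range")
    return result


# Hypothetical values for illustration only:
base = [{"model_name": "gemini-web", "model_header": {"x-goog-ext": "1"}}]
env = parse_models_env('[{"model_name": "gemini-web-pro"}, {"model_name": "extra", "model_header": {}}]')
merged = merge_models(base, env)
```

Note the field-level semantics: the first override only supplies `model_name`, so the base model's `model_header` survives the merge.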
app/utils/helper.py CHANGED
@@ -1,12 +1,38 @@
 import base64
+import json
 import mimetypes
+import re
+import struct
 import tempfile
+import uuid
 from pathlib import Path
+from typing import Iterator
+from urllib.parse import urlparse
 
 import httpx
 from loguru import logger
 
+from ..models import FunctionCall, Message, ToolCall
+
 VALID_TAG_ROLES = {"user", "assistant", "system", "tool"}
+XML_WRAP_HINT = (
+    "\nYou MUST wrap every tool call response inside a single fenced block exactly like:\n"
+    '```xml\n<tool_call name="tool_name">{"arg": "value"}</tool_call>\n```\n'
+    "Do not surround the fence with any other text or whitespace; otherwise the call will be ignored.\n"
+)
+CODE_BLOCK_HINT = (
+    "\nWhenever you include code, markup, or shell snippets, wrap each snippet in a Markdown fenced "
+    "block and supply the correct language label (for example, ```python ... ``` or ```html ... ```).\n"
+    "Fence ONLY the actual code/markup; keep all narrative or explanatory text outside the fences.\n"
+)
+TOOL_BLOCK_RE = re.compile(r"```xml\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)
+TOOL_CALL_RE = re.compile(
+    r"<tool_call\s+name=\"([^\"]+)\"\s*>(.*?)</tool_call>", re.DOTALL | re.IGNORECASE
+)
+JSON_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE)
+CONTROL_TOKEN_RE = re.compile(r"<\|im_(?:start|end)\|>")
+XML_HINT_STRIPPED = XML_WRAP_HINT.strip()
+CODE_HINT_STRIPPED = CODE_BLOCK_HINT.strip()
 
 
 def add_tag(role: str, content: str, unclose: bool = False) -> str:
@@ -18,8 +44,10 @@ def add_tag(role: str, content: str, unclose: bool = False) -> str:
     return f"<|im_start|>{role}\n{content}" + ("\n<|im_end|>" if not unclose else "")
 
 
-def estimate_tokens(text: str) -> int:
+def estimate_tokens(text: str | None) -> int:
     """Estimate the number of tokens heuristically based on character count"""
+    if not text:
+        return 0
     return int(len(text) / 3)
 
 
@@ -36,7 +64,7 @@ async def save_file_to_tempfile(
     return path
 
 
-async def save_url_to_tempfile(url: str, tempdir: Path | None = None):
+async def save_url_to_tempfile(url: str, tempdir: Path | None = None) -> Path:
     data: bytes | None = None
     suffix: str | None = None
     if url.startswith("data:image/"):
@@ -47,20 +75,315 @@ async def save_url_to_tempfile(url: str, tempdir: Path | None = None):
         base64_data = url.split(",")[1]
         data = base64.b64decode(base64_data)
 
-        # Guess extension from mime type, default to the subtype if not found
         suffix = mimetypes.guess_extension(mime_type)
         if not suffix:
             suffix = f".{mime_type.split('/')[1]}"
     else:
-        # http files
-        async with httpx.AsyncClient() as client:
+        async with httpx.AsyncClient(follow_redirects=True) as client:
             resp = await client.get(url)
             resp.raise_for_status()
             data = resp.content
-            suffix = Path(url).suffix or ".bin"
+            content_type = resp.headers.get("content-type")
+
+            if content_type:
+                mime_type = content_type.split(";")[0].strip()
+                suffix = mimetypes.guess_extension(mime_type)
+
+            if not suffix:
+                path_url = urlparse(url).path
+                suffix = Path(path_url).suffix
+
+            if not suffix:
+                suffix = ".bin"
 
     with tempfile.NamedTemporaryFile(delete=False, suffix=suffix, dir=tempdir) as tmp:
         tmp.write(data)
         path = Path(tmp.name)
 
     return path
+
+
+def strip_code_fence(text: str) -> str:
+    """Remove surrounding ```json fences if present."""
+    match = JSON_FENCE_RE.match(text.strip())
+    if match:
+        return match.group(1).strip()
+    return text.strip()
+
+
+def strip_tagged_blocks(text: str) -> str:
+    """Remove <|im_start|>role ... <|im_end|> sections, dropping tool blocks entirely.
+
+    - tool blocks are removed entirely (if missing end marker, drop to EOF).
+    - other roles: remove markers and role, keep inner content (if missing end marker, keep to EOF).
+    """
+    if not text:
+        return text
+
+    result: list[str] = []
+    idx = 0
+    length = len(text)
+    start_marker = "<|im_start|>"
+    end_marker = "<|im_end|>"
+
+    while idx < length:
+        start = text.find(start_marker, idx)
+        if start == -1:
+            result.append(text[idx:])
+            break
+
+        # append any content before this block
+        result.append(text[idx:start])
+
+        role_start = start + len(start_marker)
+        newline = text.find("\n", role_start)
+        if newline == -1:
+            # malformed block; keep the remainder as-is (safe behavior)
+            result.append(text[start:])
+            break
+
+        role = text[role_start:newline].strip().lower()
+
+        end = text.find(end_marker, newline + 1)
+        if end == -1:
+            # missing end marker
+            if role == "tool":
+                # drop from the start marker to EOF (skip the remainder)
+                break
+            else:
+                # keep inner content from after the role newline to EOF
+                result.append(text[newline + 1 :])
+                break
+
+        block_end = end + len(end_marker)
+
+        if role == "tool":
+            # drop the whole block
+            idx = block_end
+            continue
+
+        # keep the content without role markers
+        content = text[newline + 1 : end]
+        result.append(content)
+        idx = block_end
+
+    return "".join(result)
+
+
+def strip_system_hints(text: str) -> str:
+    """Remove system-level hint text from a given string."""
+    if not text:
+        return text
+    cleaned = strip_tagged_blocks(text)
+    cleaned = cleaned.replace(XML_WRAP_HINT, "").replace(XML_HINT_STRIPPED, "")
+    cleaned = cleaned.replace(CODE_BLOCK_HINT, "").replace(CODE_HINT_STRIPPED, "")
+    cleaned = CONTROL_TOKEN_RE.sub("", cleaned)
+    return cleaned.strip()
+
+
+def remove_tool_call_blocks(text: str) -> str:
+    """Strip tool call code blocks from text."""
+    if not text:
+        return text
+
+    # 1. Remove fenced blocks ONLY if they contain tool calls
+    def _replace_block(match: re.Match[str]) -> str:
+        block_content = match.group(1)
+        if not block_content:
+            return match.group(0)
+
+        # Check if the block contains any tool call tag
+        if TOOL_CALL_RE.search(block_content):
+            return ""
+
+        # Preserve the block if no tool call found
+        return match.group(0)
+
+    cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
+
+    # 2. Remove orphaned tool calls
+    cleaned = TOOL_CALL_RE.sub("", cleaned)
+
+    return strip_system_hints(cleaned)
+
+
+def extract_tool_calls(text: str) -> tuple[str, list[ToolCall]]:
+    """Extract tool call definitions and return cleaned text."""
+    if not text:
+        return text, []
+
+    tool_calls: list[ToolCall] = []
+
+    def _create_tool_call(name: str, raw_args: str) -> None:
+        """Helper to parse args and append to the tool_calls list."""
+        if not name:
+            logger.warning("Encountered tool_call without a function name.")
+            return
+
+        arguments = raw_args
+        try:
+            parsed_args = json.loads(raw_args)
+            arguments = json.dumps(parsed_args, ensure_ascii=False)
+        except json.JSONDecodeError:
+            logger.warning(f"Failed to parse tool call arguments for '{name}'. Passing raw string.")
+
+        tool_calls.append(
+            ToolCall(
+                id=f"call_{uuid.uuid4().hex}",
+                type="function",
+                function=FunctionCall(name=name, arguments=arguments),
+            )
+        )
+
+    def _replace_block(match: re.Match[str]) -> str:
+        block_content = match.group(1)
+        if not block_content:
+            return match.group(0)
+
+        found_in_block = False
+        for call_match in TOOL_CALL_RE.finditer(block_content):
+            found_in_block = True
+            name = (call_match.group(1) or "").strip()
+            raw_args = (call_match.group(2) or "").strip()
+            _create_tool_call(name, raw_args)
+
+        if found_in_block:
+            return ""
+        else:
+            return match.group(0)
+
+    cleaned = TOOL_BLOCK_RE.sub(_replace_block, text)
+
+    def _replace_orphan(match: re.Match[str]) -> str:
+        name = (match.group(1) or "").strip()
+        raw_args = (match.group(2) or "").strip()
+        _create_tool_call(name, raw_args)
+        return ""
+
+    cleaned = TOOL_CALL_RE.sub(_replace_orphan, cleaned)
+
+    cleaned = strip_system_hints(cleaned)
+    return cleaned, tool_calls
+
+
+def iter_stream_segments(model_output: str, chunk_size: int = 64) -> Iterator[str]:
+    """Yield stream segments while keeping <think> markers and words intact."""
+    if not model_output:
+        return
+
+    token_pattern = re.compile(r"\s+|\S+\s*")
+    pending = ""
+
+    def _flush_pending() -> Iterator[str]:
+        nonlocal pending
+        if pending:
+            yield pending
+            pending = ""
+
+    # Split on <think> boundaries so the markers are never fragmented.
+    parts = re.split(r"(</?think>)", model_output)
+    for part in parts:
+        if not part:
+            continue
+        if part in {"<think>", "</think>"}:
+            yield from _flush_pending()
+            yield part
+            continue
+
+        for match in token_pattern.finditer(part):
+            token = match.group(0)
+
+            if len(token) > chunk_size:
+                yield from _flush_pending()
+                for idx in range(0, len(token), chunk_size):
+                    yield token[idx : idx + chunk_size]
+                continue
+
+            if pending and len(pending) + len(token) > chunk_size:
+                yield from _flush_pending()
+
+            pending += token
+
+    yield from _flush_pending()
+
+
+def text_from_message(message: Message) -> str:
+    """Return text content from a message for token estimation."""
+    base_text = ""
+    if isinstance(message.content, str):
+        base_text = message.content
+    elif isinstance(message.content, list):
+        base_text = "\n".join(
+            item.text or "" for item in message.content if getattr(item, "type", "") == "text"
+        )
+    elif message.content is None:
+        base_text = ""
+
+    if message.tool_calls:
+        tool_arg_text = "".join(call.function.arguments or "" for call in message.tool_calls)
+        base_text = f"{base_text}\n{tool_arg_text}" if base_text else tool_arg_text
+
+    return base_text
+
+
+def extract_image_dimensions(data: bytes) -> tuple[int | None, int | None]:
+    """Return image dimensions (width, height) if PNG or JPEG headers are present."""
+    # PNG: dimensions stored in bytes 16..24 of the IHDR chunk
+    if len(data) >= 24 and data.startswith(b"\x89PNG\r\n\x1a\n"):
+        try:
+            width, height = struct.unpack(">II", data[16:24])
+            return int(width), int(height)
+        except struct.error:
+            return None, None
+
+    # JPEG: dimensions stored in SOF segment; iterate through markers to locate it
+    if len(data) >= 4 and data[0:2] == b"\xff\xd8":
+        idx = 2
+        length = len(data)
+        sof_markers = {
+            0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
+            0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF,
+        }
+        while idx < length:
+            # Find marker alignment (markers are prefixed with 0xFF bytes)
+            if data[idx] != 0xFF:
+                idx += 1
+                continue
+            while idx < length and data[idx] == 0xFF:
+                idx += 1
+            if idx >= length:
+                break
+            marker = data[idx]
+            idx += 1
+
+            if marker in (0xD8, 0xD9, 0x01) or 0xD0 <= marker <= 0xD7:
+                continue
+
+            if idx + 1 >= length:
+                break
+            segment_length = (data[idx] << 8) + data[idx + 1]
+            idx += 2
+            if segment_length < 2:
+                break
+
+            if marker in sof_markers:
+                if idx + 4 < length:
+                    # Skip precision byte at idx, then read height/width (big-endian)
+                    height = (data[idx + 1] << 8) + data[idx + 2]
+                    width = (data[idx + 3] << 8) + data[idx + 4]
+                    return int(width), int(height)
+                break
+
+            idx += segment_length - 2
+
+    return None, None
config/config.yaml CHANGED
@@ -27,6 +27,8 @@ gemini:
   refresh_interval: 540 # Refresh interval in seconds
   verbose: false # Enable verbose logging for Gemini requests
   max_chars_per_request: 1000000 # Maximum characters Gemini Web accepts per request. Non-pro users might have a lower limit
+  model_strategy: "append" # Strategy: 'append' (default + custom) or 'overwrite' (custom only)
+  models: []
 
 storage:
   path: "data/lmdb" # Database storage path
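For reference, a populated custom-model list might look like the following (the model name and header key here are purely illustrative, not values the project ships; `model_header` accepts any string-to-string mapping per `GeminiModelConfig`):

```yaml
gemini:
  model_strategy: "overwrite" # serve only the models listed below
  models:
    - model_name: "gemini-custom"
      model_header:
        x-custom-header: "example-value"
```

With `model_strategy: "append"` the same list would be served alongside the built-in defaults instead of replacing them; entries missing either `model_name` or `model_header` are discarded by `_filter_valid_models` with a warning.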