# z.ai provider + MCP proxy (implemented) This document describes the z.ai integration that is implemented on the `feat/zai-passthrough-mcp` branch: what was added, how it works internally, and how to validate it. Related deep dives: - [`docs/zai/provider.md`](provider.md) - [`docs/zai/mcp.md`](mcp.md) - [`docs/zai/vision-mcp.md`](vision-mcp.md) - [`docs/proxy/auth.md`](../proxy/auth.md) - [`docs/proxy/accounts.md`](../proxy/accounts.md) ## Scope (current) - z.ai is integrated as an **optional upstream** for **Anthropic/Claude protocol only** (`/v1/messages`, `/v1/messages/count_tokens`). - OpenAI and Gemini protocol handlers are unchanged and continue to use the existing Google-backed pool. - z.ai MCP (Search + Reader) is exposed via local proxy endpoints (reverse proxy) and injects the z.ai API key upstream. - Vision MCP is exposed via a **built-in MCP server** (local endpoint) and uses the stored z.ai API key to call the z.ai vision API. ## Configuration All settings are persisted in the existing data directory (same place as Google accounts and `gui_config.json`). ### Proxy auth - `proxy.auth_mode` (`off` | `strict` | `all_except_health` | `auto`) - `off`: no auth required - `strict`: auth required for all routes - `all_except_health`: auth required for all routes except `GET /healthz` - `auto`: if `allow_lan_access=true` -> `all_except_health`, else `off` - `proxy.api_key`: required when auth is enabled Implementation: - Backend enum: [`src-tauri/src/proxy/config.rs`](../../src-tauri/src/proxy/config.rs) (`ProxyAuthMode`) - Effective policy resolver: [`src-tauri/src/proxy/security.rs`](../../src-tauri/src/proxy/security.rs) - Middleware enforcement: [`src-tauri/src/proxy/middleware/auth.rs`](../../src-tauri/src/proxy/middleware/auth.rs) ### z.ai provider Config lives under `proxy.zai` (`src-tauri/src/proxy/config.rs`): - `enabled: bool` - `base_url: string` (default `https://api.z.ai/api/anthropic`) - `api_key: string` - `dispatch_mode: off | exclusive | pooled | fallback` - `off`: never use z.ai - `exclusive`: all Claude protocol requests go to z.ai - `pooled`: z.ai is treated as **one additional slot** in the shared pool (no priority, no strict guarantee) - `fallback`: z.ai is used only when the Google pool has 0 accounts - `models`: defaults used when the incoming Anthropic request uses `claude-*` model ids - `opus` default `glm-4.7` - `sonnet` default `glm-4.7` - `haiku` default `glm-4.5-air` - `model_mapping`: optional exact-match overrides (`{ "": "" }`) - When a key matches the incoming `model` string, it is replaced with the mapped z.ai model id before forwarding upstream. - `mcp` toggles: - `enabled` - `web_search_enabled` - `web_reader_enabled` - `vision_enabled` Runtime hot update: - `save_config` hot-updates `auth`, `upstream_proxy`, `model mappings`, and `z.ai` without restart. - `src-tauri/src/commands/mod.rs` calls `axum_server.update_security(...)` and `axum_server.update_zai(...)`. ## Request routing ### `/v1/messages` (Anthropic messages) Handler: `src-tauri/src/proxy/handlers/claude.rs` (`handle_messages`) Flow: 1. The handler receives `HeaderMap` + raw JSON `Value`. 2. It decides whether to use z.ai or the existing Google flow: - If z.ai is disabled -> use Google flow. - If `dispatch_mode=exclusive` -> use z.ai. - If `dispatch_mode=fallback` -> use z.ai only if Google pool size is 0. - If `dispatch_mode=pooled` -> use round-robin across `(google_accounts + 1)` slots; slot `0` is z.ai, others are Google. 3. If z.ai is selected: - The raw JSON is forwarded to z.ai as-is (streaming is supported by byte passthrough). - The request `model` may be rewritten: - if `proxy.zai.model_mapping` contains an exact match, that mapping wins - `glm-*` stays unchanged - `claude-*` becomes one of `proxy.zai.models.{opus,sonnet,haiku}` based on name match 4. Otherwise: - The existing Claude→Gemini transform and Google-backed execution path runs as before. ### `/v1/messages/count_tokens` Handler: `src-tauri/src/proxy/handlers/claude.rs` (`handle_count_tokens`) - If z.ai is enabled (mode != off), this request is forwarded to z.ai. - Otherwise it returns the existing placeholder `{input_tokens: 0, output_tokens: 0}`. ## Upstream forwarding details (z.ai Anthropic) Provider: `src-tauri/src/proxy/providers/zai_anthropic.rs` Security / header handling: - The local proxy API key must **never** be forwarded upstream. - Only a conservative set of incoming headers is forwarded (e.g. `content-type`, `accept`, `anthropic-version`, `user-agent`). - z.ai auth is injected: - If the client used `x-api-key`, it is replaced with z.ai key. - If the client used `Authorization`, it is replaced with `Bearer `. - If neither is present, `x-api-key: ` is used. - Responses are streamed back to the client without parsing SSE. Networking: - Respects the global upstream proxy config (`proxy.upstream_proxy`) for outbound HTTP calls. ## MCP reverse proxy (Search + Reader) Handlers: `src-tauri/src/proxy/handlers/mcp.rs` Routes: `src-tauri/src/proxy/server.rs` Local endpoints: - `/mcp/web_search_prime/mcp` → `https://api.z.ai/api/mcp/web_search_prime/mcp` - `/mcp/web_reader/mcp` → `https://api.z.ai/api/mcp/web_reader/mcp` Behavior: - Controlled by `proxy.zai.mcp.*` flags: - If `mcp.enabled=false` -> endpoints return 404. - If per-server flag is false -> returns 404 for that endpoint. - z.ai key is injected upstream as `Authorization: Bearer `. - Response body is streamed back to the client. Note: - These endpoints are still subject to the proxy’s auth middleware depending on `proxy.auth_mode`. ## Vision MCP (built-in server) Handlers: - [`src-tauri/src/proxy/handlers/mcp.rs`](../../src-tauri/src/proxy/handlers/mcp.rs) (`handle_zai_mcp_server`) - [`src-tauri/src/proxy/zai_vision_tools.rs`](../../src-tauri/src/proxy/zai_vision_tools.rs) (tool registry + z.ai vision API client) Local endpoint: - `/mcp/zai-mcp-server/mcp` Behavior: - Controlled by `proxy.zai.mcp.enabled` and `proxy.zai.mcp.vision_enabled`. - If `mcp.enabled=false` -> returns 404. - If `vision_enabled=false` -> returns 404. - No z.ai key is required from MCP clients: - the proxy injects the stored `proxy.zai.api_key` when calling the z.ai vision API. - Implements a minimal Streamable HTTP MCP flow: - `POST /mcp` supports `initialize`, `tools/list`, `tools/call` - `GET /mcp` returns an SSE stream with keep-alive events for an initialized session - `DELETE /mcp` terminates a session Upstream calls: - z.ai vision endpoint: `https://api.z.ai/api/paas/v4/chat/completions` - Uses `Authorization: Bearer ` - Default model: `glm-4.6v` (hardcoded for now) Tool input and limits: - Images: `.png`, `.jpg`, `.jpeg` up to 5 MB (local files are encoded as `data:;base64,...`). - Videos: `.mp4`, `.mov`, `.m4v` up to 8 MB. - Supported tools: - `ui_to_artifact` - `extract_text_from_screenshot` - `diagnose_error_screenshot` - `understand_technical_diagram` - `analyze_data_visualization` - `ui_diff_check` - `analyze_image` - `analyze_video` ## UI Page: `src/pages/ApiProxy.tsx` Added controls: - Authorization toggle + mode selector (`off/strict/all_except_health/auto`) - z.ai block: - enable toggle - base_url - dispatch mode - api key input (stored locally) - model mapping UI: - fetch available model ids from the z.ai upstream (`GET /v1/models`) - configure default `opus/sonnet/haiku` mapping - configure optional exact-match overrides - MCP toggles + display of local MCP endpoints Translations: - `src/locales/en.json` - `src/locales/zh.json` ## Validation checklist Build: - Frontend: `npm run build` - Backend: `cd src-tauri && cargo build` Manual (example): 1) Enable proxy auth (strict or all-except-health) and note `proxy.api_key`. 2) Enable z.ai and set: - `dispatch_mode=exclusive` - `api_key=` 3) Start proxy and call: - `GET http://127.0.0.1:/healthz` (should work without auth in all-except-health; always works in off) - `POST http://127.0.0.1:/v1/messages` with `Authorization: Bearer ` and a normal Anthropic request body. 4) Enable MCP Search and call local `/mcp/web_search_prime/mcp` via an MCP client (the proxy injects z.ai auth upstream). 5) Enable Vision MCP and verify the tool list: - `POST http://127.0.0.1:/mcp/zai-mcp-server/mcp` with a JSON-RPC `initialize` - then `POST ...` with `tools/list` using the returned `Mcp-Session-Id` header. ## Known limitations / follow-ups - Vision MCP currently implements the core methods needed for tool calls but is not yet a full feature-complete MCP server (prompts/resources, resumability, streaming tool output). - z.ai usage/budget (monitor endpoints) is not implemented yet. - Claude model list endpoint remains a static stub (`/v1/models/claude`) and is not yet provider-aware.