Spaces:

Yash030
/

claude-code-proxy

Running

Yash030 Claude Opus 4.7 commited on 5 days ago

Commit

d64f2a2

1 Parent(s): 04fcbd7

docs: update CLAUDE.md with model list and health tracking

- Add Zen free models (big-pickle, ring-2.6-1t-free, nemotron-3-super-free)
- Add NVIDIA NIM model descriptions
- Document auto-routing with health tracking
- Simplify key modules section
- Update environment configuration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1) hide show

CLAUDE.md +40 -26

CLAUDE.md CHANGED Viewed

@@ -6,6 +6,23 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.
 ## Commands
 ```bash
@@ -31,30 +48,26 @@ Run format → lint → type check in that order before pushing. CI enforces the
 ```
 Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                     ↓
-                                            core/chain_engine.py (sticky sessions)
 ```
 ### Key Modules
-- **api/routes.py** — FastAPI route handlers for `/v1/messages`, `/v1/models`, `/v1/messages/counttokens`
-- **api/admin.py** — Admin dashboard endpoints: `GET /admin` (HTML), `GET /api/admin/sessions` (JSON)
-- **api/services.py** — Sticky session logic: once a model yields first event (including thinking blocks), it stays committed for that turn
-- **api/model_router.py** — Resolves Claude model names to provider/model pairs using MODEL_OPUS/MODEL_SONNET/MODEL_HAIKU/MODEL env vars
-- **api/gateway_model_ids.py** — Gateway model ID mapping for picker integration and "(no thinking)" variants
-- **api/detection.py** — Request type detection for optimization routing
-- **api/optimization_handlers.py** — Handles local optimization for trivial requests (health probes, capability checks)
-- **api/dependencies.py** — Dependency injection for providers, settings, and request validation
-- **api/web_server_tools.py** — Web fetch/search tool implementations for Claude Code server tools
-- **providers/registry.py** — Provider factory and caching; builds ProviderConfig from settings, creates provider instances
-- **providers/base.py** — Abstract `BaseProvider` interface; `stream_response()` yields Anthropic SSE
-- **providers/openai_compat.py** — OpenAI chat → Anthropic SSE translation for NVIDIA NIM
-- **providers/nvidia_nim/** — NVIDIA NIM-specific transport implementation
-- **providers/zen/** — Zen (opencode.ai) native transport implementation
-- **core/chain_engine.py** — Orchestrates multi-provider fallback chains
-- **core/task_detector.py** — Detects trivial requests for local optimization
-- **core/session_tracker.py** — Tracks active sessions and request state
-- **core/model_capabilities.py** — Model capability detection and feature routing
-- **messaging/** — Discord/Telegram bot wrappers for remote Claude Code sessions
 ### Provider Model Format
 Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
@@ -64,12 +77,13 @@ Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7`
 ## Python 3.14 Notes
-The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). PyUpgrade is configured for py314, so don't modernize this syntax away.
 ## Environment Configuration
-Settings are loaded via pydantic-settings from `.env`. Key variables:
-- `MODEL` — Comma-separated fallback model list (e.g., `zen/minimax-m2.5-free,nvidia_nim/z-ai/glm4.7`)
-- `MODEL_OPUS`, `MODEL_SONNET`, `MODEL_HAIKU` — Per-tier overrides
-- `ANTHROPIC_AUTH_TOKEN` — Must match value sent by Claude Code client
-- Provider API keys: `NVIDIA_NIM_API_KEY`, `ZEN_API_KEY`

 Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.
+## Free Models
+### Zen/OpenCode (Free Tier)
+- `zen/minimax-m2.5-free` - Default, Claude Code capable
+- `zen/big-pickle` - Free tier
+- `zen/ring-2.6-1t-free` - Free tier
+- `zen/nemotron-3-super-free` - Free tier
+### NVIDIA NIM (7 Models)
+- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
+- `nvidia_nim/z-ai/glm4.7` - General purpose
+- `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
+- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
+- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
+- `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
+- `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks
 ## Commands
 ```bash
 ```
 Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                     ↓
+                                            core/chain_engine.py (fallback)
 ```
+### Auto-Routing with Health Tracking
+The proxy includes intelligent model selection:
+1. Pre-flight health check (recent failures in 30s window)
+2. Skip unhealthy models (3+ failures = unhealthy for 30s)
+3. Automatic failover on timeout/rate-limit
+4. 40 req/min rate limit respected
 ### Key Modules
+- **api/routes.py** — FastAPI routes + REQUESTED_PROVIDER_MODELS list
+- **api/services.py** — Request handling, fallback logic, failure recording
+- **api/model_router.py** — Model resolution with health-aware candidate selection
+- **api/optimization_handlers.py** — Fast-path for trivial requests
+- **providers/rate_limit.py** — GlobalRateLimiter + ModelHealthTracker
+- **providers/nvidia_nim/client.py** — NIM provider with fast timeouts
+- **providers/zen/client.py** — Zen/OpenCode provider
+- **providers/openai_compat.py** — OpenAI chat → Anthropic SSE translation
 ### Provider Model Format
 Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
 ## Python 3.14 Notes
+The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not modernize this syntax away.
 ## Environment Configuration
+Key variables in `.env`:
+- `MODEL` — Primary model (e.g., `zen/minimax-m2.5-free`)
+- `AUTO_MODEL_PRIORITY` — Comma-separated fallback order
+- `NVIDIA_NIM_API_KEY` — NVIDIA API key
+- `ANTHROPIC_AUTH_TOKEN` — Auth token (any secret)
+- `ENABLE_MODEL_THINKING` — Enable reasoning blocks