Yash030 Claude Opus 4.7 committed on
Commit d64f2a2 · 1 Parent(s): 04fcbd7

docs: update CLAUDE.md with model list and health tracking

- Add Zen free models (big-pickle, ring-2.6-1t-free, nemotron-3-super-free)
- Add NVIDIA NIM model descriptions
- Document auto-routing with health tracking
- Simplify key modules section
- Update environment configuration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1)
  1. CLAUDE.md +40 -26
CLAUDE.md CHANGED
@@ -6,6 +6,23 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

  Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.

  ## Commands

  ```bash
@@ -31,30 +48,26 @@ Run format → lint → type check in that order before pushing. CI enforces the
  ```
  Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
  ↓
- core/chain_engine.py (sticky sessions)
  ```

  ### Key Modules

- - **api/routes.py** — FastAPI route handlers for `/v1/messages`, `/v1/models`, `/v1/messages/counttokens`
- - **api/admin.py** — Admin dashboard endpoints: `GET /admin` (HTML), `GET /api/admin/sessions` (JSON)
- - **api/services.py** — Sticky session logic: once a model yields first event (including thinking blocks), it stays committed for that turn
- - **api/model_router.py** — Resolves Claude model names to provider/model pairs using MODEL_OPUS/MODEL_SONNET/MODEL_HAIKU/MODEL env vars
- - **api/gateway_model_ids.py** — Gateway model ID mapping for picker integration and "(no thinking)" variants
- - **api/detection.py** — Request type detection for optimization routing
- - **api/optimization_handlers.py** — Handles local optimization for trivial requests (health probes, capability checks)
- - **api/dependencies.py** — Dependency injection for providers, settings, and request validation
- - **api/web_server_tools.py** — Web fetch/search tool implementations for Claude Code server tools
- - **providers/registry.py** — Provider factory and caching; builds ProviderConfig from settings, creates provider instances
- - **providers/base.py** — Abstract `BaseProvider` interface; `stream_response()` yields Anthropic SSE
- - **providers/openai_compat.py** — OpenAI chat → Anthropic SSE translation for NVIDIA NIM
- - **providers/nvidia_nim/** — NVIDIA NIM-specific transport implementation
- - **providers/zen/** — Zen (opencode.ai) native transport implementation
- - **core/chain_engine.py** — Orchestrates multi-provider fallback chains
- - **core/task_detector.py** — Detects trivial requests for local optimization
- - **core/session_tracker.py** — Tracks active sessions and request state
- - **core/model_capabilities.py** — Model capability detection and feature routing
- - **messaging/** — Discord/Telegram bot wrappers for remote Claude Code sessions

  ### Provider Model Format
  Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
@@ -64,12 +77,13 @@ Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7`

  ## Python 3.14 Notes

- The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). PyUpgrade is configured for py314, so don't modernize this syntax away.

  ## Environment Configuration

- Settings are loaded via pydantic-settings from `.env`. Key variables:
- - `MODEL` — Comma-separated fallback model list (e.g., `zen/minimax-m2.5-free,nvidia_nim/z-ai/glm4.7`)
- - `MODEL_OPUS`, `MODEL_SONNET`, `MODEL_HAIKU` — Per-tier overrides
- - `ANTHROPIC_AUTH_TOKEN` — Must match value sent by Claude Code client
- - Provider API keys: `NVIDIA_NIM_API_KEY`, `ZEN_API_KEY`
 
 
  Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.

+ ## Free Models
+
+ ### Zen/OpenCode (Free Tier)
+ - `zen/minimax-m2.5-free` - Default, Claude Code capable
+ - `zen/big-pickle` - Free tier
+ - `zen/ring-2.6-1t-free` - Free tier
+ - `zen/nemotron-3-super-free` - Free tier
+
+ ### NVIDIA NIM (7 Models)
+ - `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
+ - `nvidia_nim/z-ai/glm4.7` - General purpose
+ - `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
+ - `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
+ - `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
+ - `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
+ - `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks
+
  ## Commands

  ```bash
 
  ```
  Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
  ↓
+ core/chain_engine.py (fallback)
  ```

+ ### Auto-Routing with Health Tracking
+ The proxy includes intelligent model selection:
+ 1. Pre-flight health check (recent failures in 30s window)
+ 2. Skip unhealthy models (3+ failures = unhealthy for 30s)
+ 3. Automatic failover on timeout/rate-limit
+ 4. 40 req/min rate limit respected
+
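
Reviewer note: the health-tracking rules added in this hunk can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in `providers/rate_limit.py`: the class name `HealthTrackerSketch`, its methods, and the injectable clock are all hypothetical, and the sketch uses a simple sliding window (a model recovers once its failures age out of the 30-second window), which may differ from how the project times the unhealthy period.

```python
import time
from collections import defaultdict, deque

FAILURE_THRESHOLD = 3   # "3+ failures" from the list above
WINDOW_SECONDS = 30.0   # "30s window" from the list above


class HealthTrackerSketch:
    """Hypothetical sliding-window failure tracker (not the project's class)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                  # injectable so tests can fake time
        self._failures = defaultdict(deque)  # model id -> failure timestamps

    def record_failure(self, model):
        """Remember that a request to this model just failed."""
        self._failures[model].append(self._clock())

    def is_healthy(self, model):
        """A model is healthy while it has fewer than 3 failures in the window."""
        window = self._failures[model]
        cutoff = self._clock() - WINDOW_SECONDS
        # Drop failures that have aged out of the window.
        while window and window[0] <= cutoff:
            window.popleft()
        return len(window) < FAILURE_THRESHOLD

    def pick(self, candidates):
        """Return the first healthy candidate, or None if all are unhealthy."""
        for model in candidates:
            if self.is_healthy(model):
                return model
        return None
```

Passing the clock in keeps the sketch testable without sleeping; a caller would record a failure on timeout or rate-limit errors and call `pick()` with the configured priority order before each request.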
  ### Key Modules

+ - **api/routes.py** — FastAPI routes + REQUESTED_PROVIDER_MODELS list
+ - **api/services.py** — Request handling, fallback logic, failure recording
+ - **api/model_router.py** — Model resolution with health-aware candidate selection
+ - **api/optimization_handlers.py** — Fast-path for trivial requests
+ - **providers/rate_limit.py** — GlobalRateLimiter + ModelHealthTracker
+ - **providers/nvidia_nim/client.py** — NIM provider with fast timeouts
+ - **providers/zen/client.py** — Zen/OpenCode provider
+ - **providers/openai_compat.py** — OpenAI chat → Anthropic SSE translation

  ### Provider Model Format
  Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
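
Reviewer note: because the model name itself may contain slashes, a value in this format is presumably split on the first `/` only. The helper below is a minimal sketch of that reading; `split_model_value` is a hypothetical name and is not taken from `api/model_router.py`.

```python
def split_model_value(value: str) -> tuple[str, str]:
    """Split 'provider_id/model/name' into (provider_id, 'model/name').

    Hypothetical helper: only the first slash separates the provider,
    so everything after it is kept intact as the upstream model name.
    """
    provider_id, sep, model_name = value.strip().partition("/")
    if not sep or not provider_id or not model_name:
        raise ValueError(f"expected 'provider_id/model/name', got {value!r}")
    return provider_id, model_name
```

With the two examples from the diff, `nvidia_nim/z-ai/glm4.7` yields the provider `nvidia_nim` and the model `z-ai/glm4.7`, while `zen/minimax-m2.5-free` yields `zen` and `minimax-m2.5-free`.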
 

  ## Python 3.14 Notes

+ The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not modernize this syntax away.

  ## Environment Configuration

+ Key variables in `.env`:
+ - `MODEL` — Primary model (e.g., `zen/minimax-m2.5-free`)
+ - `AUTO_MODEL_PRIORITY` — Comma-separated fallback order
+ - `NVIDIA_NIM_API_KEY` — NVIDIA API key
+ - `ANTHROPIC_AUTH_TOKEN` — Auth token (any secret)
+ - `ENABLE_MODEL_THINKING` — Enable reasoning blocks
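
Reviewer note: the new `MODEL` and `AUTO_MODEL_PRIORITY` variables suggest a candidate list with the primary model first and the fallbacks after it. The sketch below shows one way such a list could be built from an environment mapping. It is an assumption about the intended semantics, not the project's settings code: `candidate_models` is a hypothetical function, and putting `MODEL` first and dropping duplicates are guesses.

```python
from collections.abc import Mapping


def candidate_models(env: Mapping[str, str]) -> list[str]:
    """Build an ordered candidate list from MODEL and AUTO_MODEL_PRIORITY.

    Hypothetical helper: the primary model comes first, followed by the
    comma-separated fallbacks, with blanks and duplicates removed.
    """
    primary = env.get("MODEL", "").strip()
    raw_fallbacks = env.get("AUTO_MODEL_PRIORITY", "")
    fallbacks = [item.strip() for item in raw_fallbacks.split(",") if item.strip()]

    ordered: list[str] = []
    for model in [primary, *fallbacks]:
        if model and model not in ordered:
            ordered.append(model)
    return ordered
```

In a running process the mapping would typically be `os.environ`; accepting any mapping keeps the function easy to exercise with a plain dict.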