docs: update CLAUDE.md with model list and health tracking
- Add Zen free models (big-pickle, ring-2.6-1t-free, nemotron-3-super-free)
- Add NVIDIA NIM model descriptions
- Document auto-routing with health tracking
- Simplify key modules section
- Update environment configuration
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md
CHANGED
@@ -6,6 +6,23 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between the client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.
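As a concrete (hypothetical) illustration of that translation boundary: the client sends an Anthropic Messages payload, and the proxy re-targets the `model` field at a provider-prefixed model before forwarding. The field names follow the public Anthropic Messages API; the endpoint behavior and the chosen routing target are assumptions for illustration, not code from this repo.

```python
import json

# Hypothetical sketch (not this repo's code): an Anthropic Messages API
# request as Claude Code would send it...
anthropic_request = {
    "model": "claude-sonnet-4-5",   # whatever the client asked for
    "max_tokens": 1024,
    "stream": True,                 # response comes back as Anthropic SSE
    "messages": [{"role": "user", "content": "Say hi"}],
}

# ...and the proxy re-targets it at a backend provider model before
# forwarding over that provider's transport (OpenAI chat or native API).
routed = dict(anthropic_request, model="nvidia_nim/z-ai/glm4.7")
body = json.dumps(routed)
```

The response path does the reverse: provider output is normalized back into Anthropic SSE events, thinking blocks, tool calls, and usage metadata.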
## Free Models

### Zen/OpenCode (Free Tier)

- `zen/minimax-m2.5-free` - Default, Claude Code capable
- `zen/big-pickle` - Free tier
- `zen/ring-2.6-1t-free` - Free tier
- `zen/nemotron-3-super-free` - Free tier

### NVIDIA NIM (7 Models)

- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
- `nvidia_nim/z-ai/glm4.7` - General purpose
- `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
- `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
- `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks
## Commands

@@ -31,30 +48,26 @@ Run format → lint → type check in that order before pushing. CI enforces the

```
Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                   ↓
                                     core/chain_engine.py (fallback)
```

### Auto-Routing with Health Tracking

The proxy includes intelligent model selection:

1. Pre-flight health check (recent failures in 30s window)
2. Skip unhealthy models (3+ failures = unhealthy for 30s)
3. Automatic failover on timeout/rate-limit
4. 40 req/min rate limit respected
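Steps 1–2 can be sketched as a sliding-window failure tracker. Class and method names here are illustrative only, not the actual `providers/rate_limit.py` API:

```python
import time
from collections import defaultdict, deque

class HealthSketch:
    """Illustrative model-health policy: 3+ failures within a 30 s
    window marks a model unhealthy until old failures age out."""

    def __init__(self, window_s=30.0, threshold=3):
        self.window_s = window_s
        self.threshold = threshold
        self._failures = defaultdict(deque)  # model -> failure timestamps

    def record_failure(self, model, now=None):
        self._failures[model].append(time.monotonic() if now is None else now)

    def is_healthy(self, model, now=None):
        now = time.monotonic() if now is None else now
        q = self._failures[model]
        while q and now - q[0] > self.window_s:  # age out stale failures
            q.popleft()
        return len(q) < self.threshold

    def pick(self, candidates, now=None):
        """Pre-flight check: first healthy candidate, else first overall."""
        healthy = [m for m in candidates if self.is_healthy(m, now=now)]
        return healthy[0] if healthy else candidates[0]
```

A router implementing step 3 would call `record_failure()` on timeout or rate-limit errors and re-run `pick()` over the remaining candidates.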

### Key Modules

- **api/routes.py** → FastAPI routes + REQUESTED_PROVIDER_MODELS list
- **api/services.py** → Request handling, fallback logic, failure recording
- **api/model_router.py** → Model resolution with health-aware candidate selection
- **api/optimization_handlers.py** → Fast-path for trivial requests
- **providers/rate_limit.py** → GlobalRateLimiter + ModelHealthTracker
- **providers/nvidia_nim/client.py** → NIM provider with fast timeouts
- **providers/zen/client.py** → Zen/OpenCode provider
- **providers/openai_compat.py** → OpenAI chat → Anthropic SSE translation
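On the GlobalRateLimiter side, the 40 req/min budget mentioned under auto-routing could be enforced with a sliding window like the following. This is a minimal sketch under that assumption, not the repo's actual implementation:

```python
import time
from collections import deque

class RateLimitSketch:
    """Illustrative sliding-window limiter for a 40 req/min budget."""

    def __init__(self, max_requests=40, per_seconds=60.0):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self._sent = deque()  # timestamps of requests inside the window

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        while self._sent and now - self._sent[0] > self.per_seconds:
            self._sent.popleft()              # forget expired requests
        if len(self._sent) >= self.max_requests:
            return False                      # caller should wait or fail over
        self._sent.append(now)
        return True
```

A rejected `try_acquire()` is exactly the point where the failover logic above would try the next candidate model instead.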

### Provider Model Format

Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
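Note that only the first `/` separates the provider id; the remainder is the provider-side model name, which may itself contain `/`. A hypothetical helper (the repo's actual resolution lives in `api/model_router.py`) makes this concrete:

```python
def split_provider_model(value):
    """Split 'provider_id/model/name' into (provider, model).

    Only the first '/' is the provider separator; the rest is the
    provider-side model id, which may itself contain '/'.
    Hypothetical helper for illustration.
    """
    provider, sep, model = value.partition("/")
    if not sep or not model:
        raise ValueError(f"expected provider_id/model, got {value!r}")
    return provider, model

split_provider_model("nvidia_nim/z-ai/glm4.7")  # ('nvidia_nim', 'z-ai/glm4.7')
split_provider_model("zen/minimax-m2.5-free")   # ('zen', 'minimax-m2.5-free')
```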
@@ -64,12 +77,13 @@ Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7`

## Python 3.14 Notes

The `except X, Y:` syntax (with no `as` clause) is valid in Python 3.14, where PEP 758 reintroduced unparenthesized exception lists. Do not modernize this syntax away.
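On earlier interpreters that line is a SyntaxError, so the illustration below compiles it from a string rather than writing it inline; the pre-3.14 parenthesized form is shown alongside:

```python
import sys

# PEP 758 syntax (Python 3.14+): multiple exception types, no parentheses,
# no `as` binding. Held in a string so this file parses on older versions.
SRC = """
try:
    raise TimeoutError("slow upstream")
except TimeoutError, ConnectionError:
    pass
"""

def parses_on_this_interpreter():
    try:
        compile(SRC, "<pep758>", "exec")
        return True
    except SyntaxError:
        return False

# Equivalent that every supported Python version accepts:
try:
    raise TimeoutError("slow upstream")
except (TimeoutError, ConnectionError):
    handled = True
```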
## Environment Configuration

Key variables in `.env`:

- `MODEL` → Primary model (e.g., `zen/minimax-m2.5-free`)
- `AUTO_MODEL_PRIORITY` → Comma-separated fallback order
- `NVIDIA_NIM_API_KEY` → NVIDIA API key
- `ANTHROPIC_AUTH_TOKEN` → Auth token (any secret)
- `ENABLE_MODEL_THINKING` → Enable reasoning blocks
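A plausible `.env` combining these variables; every value below is a placeholder, and the fallback order shown is just one option, not prescribed by this document:

```
MODEL=zen/minimax-m2.5-free
AUTO_MODEL_PRIORITY=zen/minimax-m2.5-free,nvidia_nim/z-ai/glm4.7
NVIDIA_NIM_API_KEY=changeme
ANTHROPIC_AUTH_TOKEN=any-secret-string
ENABLE_MODEL_THINKING=true
```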