Spaces:
Running
Running
| # CLAUDE.md | |
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | |
| ## Overview | |
| Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization. | |
| ## Free Models | |
| ### Zen/OpenCode (Free Tier) | |
| - `zen/minimax-m2.5-free` - Default, Claude Code capable | |
| - `zen/big-pickle` - Free tier | |
| - `zen/ring-2.6-1t-free` - Free tier | |
| - `zen/nemotron-3-super-free` - Free tier | |
| ### NVIDIA NIM (7 Models) | |
| - `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation | |
| - `nvidia_nim/z-ai/glm4.7` - General purpose | |
| - `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses | |
| - `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning | |
| - `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks | |
| - `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced | |
| - `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks | |
| ## Commands | |
| ```bash | |
| uv run ruff format # Format code | |
| uv run ruff check # Lint code | |
| uv run ty check # Type check | |
| uv run pytest # Run tests (use -n auto for parallel) | |
| uv run pytest tests/path/test.py::test_name # Run single test | |
| # Run the proxy | |
| uv run uvicorn server:app --host 0.0.0.0 --port 8082 | |
| # Or via installed scripts (after uv tool install) | |
| free-claude-code # Start proxy with configured host/port | |
| fcc-init # Create user config template at ~/.config/free-claude-code/.env | |
| ``` | |
| Run format β lint β type check in that order before pushing. CI enforces the same sequence. | |
| ## Architecture | |
| ### Request Flow | |
| ``` | |
| Claude Code CLI β api/routes.py (FastAPI) β api/model_router.py β providers/* β upstream | |
| β | |
| core/chain_engine.py (fallback) | |
| ``` | |
| ### Auto-Routing with Health Tracking | |
| The proxy includes intelligent model selection with per-provider health windows: | |
| 1. Pre-flight health check (recent failures per model, window varies by provider) | |
| 2. Skip unhealthy models (NIM: 2+ failures in 15s = unhealthy; Zen: 5+ failures in 60s = unhealthy) | |
| 3. Automatic failover on timeout/rate-limit | |
| 4. Zen provider is unlimited (9999 req/min scoped limiter) β never blocked by rate limits | |
| 5. Blocked NIM providers skipped silently (no failure penalty) | |
| 6. Load-based ordering β least-loaded providers tried first | |
| 7. Stale sessions cleaned up every 60s on the admin dashboard | |
| ### Key Modules | |
| - **api/routes.py** β FastAPI routes + REQUESTED_PROVIDER_MODELS list | |
| - **api/services.py** β Request handling, fallback logic, failure recording | |
| - **api/model_router.py** β Model resolution with health-aware candidate selection | |
| - **api/optimization_handlers.py** β Fast-path for trivial requests | |
| - **api/admin.py** β Admin dashboard (sessions, models, health) | |
| - **core/session_tracker.py** β Session load tracking + automatic stale session cleanup | |
| - **providers/rate_limit.py** β GlobalRateLimiter + ModelHealthTracker with per-provider health params | |
| - **providers/nvidia_nim/client.py** β NIM provider with fast timeouts | |
| - **providers/zen/client.py** β Zen/OpenCode provider | |
| - **providers/openai_compat.py** β OpenAI chat β Anthropic SSE translation | |
| ### Provider Model Format | |
| Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`). | |
| ### Multi-Model Advertisement | |
| `MODEL` env var accepts comma-separated list to force the Claude CLI to display all models. Each registered model appears in the `/model` picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model while disabling thinking blocks. | |
| ## Python 3.14 Notes | |
| The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not modernize this syntax away. | |
| ## Environment Configuration | |
| Key variables in `.env`: | |
| - `MODEL` β Primary model (e.g., `zen/minimax-m2.5-free`) | |
| - `AUTO_MODEL_ORDER` β Comma-separated fallback order for auto routing | |
| - `NVIDIA_NIM_API_KEY` β NVIDIA API key | |
| - `ANTHROPIC_AUTH_TOKEN` β Auth token (any secret) | |
| - `ENABLE_MODEL_THINKING` β Enable reasoning blocks | |
| ### Session Tracking | |
| Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification. | |
| ### Admin Dashboard | |
| Sessions in the admin dashboard expire automatically β closed sessions are cleaned up every 60s based on activity. Stale sessions (no requests for 2x the window period) are removed automatically. |