# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between the client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token-usage metadata normalization.

## Free Models

### Zen/OpenCode (Free Tier)

- `zen/minimax-m2.5-free` - Default, Claude Code capable
- `zen/big-pickle` - Free tier
- `zen/ring-2.6-1t-free` - Free tier
- `zen/nemotron-3-super-free` - Free tier

### NVIDIA NIM (7 Models)

- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
- `nvidia_nim/z-ai/glm4.7` - General purpose
- `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
- `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
- `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks

## Commands

```bash
uv run ruff format                            # Format code
uv run ruff check                             # Lint code
uv run ty check                               # Type check
uv run pytest                                 # Run tests (use -n auto for parallel)
uv run pytest tests/path/test.py::test_name   # Run a single test

# Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# Or via installed scripts (after uv tool install)
free-claude-code   # Start proxy with configured host/port
fcc-init           # Create user config template at ~/.config/free-claude-code/.env
```

Run format → lint → type check in that order before pushing. CI enforces the same sequence.

## Architecture

### Request Flow

```
Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                     ↓
                                      core/chain_engine.py (fallback)
```

### Auto-Routing with Health Tracking

The proxy includes intelligent model selection:

1. Pre-flight health check (recent failures in a 30s window, per model)
2. Skip unhealthy models (3+ failures = unhealthy for 30s)
3. Automatic failover on timeout or rate limit
4. Zen provider is unlimited (9999 req/min scoped limiter) — never blocked by rate limits
5. Blocked NIM providers are skipped silently (no failure penalty)
6. Load-based ordering — least-loaded providers are tried first

### Key Modules

- **api/routes.py** — FastAPI routes + REQUESTED_PROVIDER_MODELS list
- **api/services.py** — Request handling, fallback logic, failure recording
- **api/model_router.py** — Model resolution with health-aware candidate selection
- **api/optimization_handlers.py** — Fast path for trivial requests
- **providers/rate_limit.py** — GlobalRateLimiter + ModelHealthTracker
- **providers/nvidia_nim/client.py** — NIM provider with fast timeouts
- **providers/zen/client.py** — Zen/OpenCode provider
- **providers/openai_compat.py** — OpenAI chat → Anthropic SSE translation

### Provider Model Format

Model values use the `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).

### Multi-Model Advertisement

The `MODEL` env var accepts a comma-separated list to force the Claude CLI to display all models. Each registered model appears in the `/model` picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model while disabling thinking blocks.

## Python 3.14 Notes

The `except X, Y:` syntax is valid in Python 3.14: PEP 758 allows unparenthesized exception lists in `except` clauses. Do not modernize this syntax away.
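As a minimal illustration of the 3.14 form (the `flaky_call` helper below is a placeholder, not repo code):

```python
def flaky_call() -> None:
    raise TimeoutError("upstream timed out")

try:
    flaky_call()
except TimeoutError, ConnectionError:
    # Python 3.14 (PEP 758): unparenthesized exception list, equivalent to
    # `except (TimeoutError, ConnectionError):`. Earlier versions reject it.
    print("retryable provider error")
```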
## Environment Configuration

Key variables in `.env`:

- `MODEL` — Primary model (e.g., `zen/minimax-m2.5-free`)
- `AUTO_MODEL_ORDER` — Comma-separated fallback order for auto routing
- `NVIDIA_NIM_API_KEY` — NVIDIA API key
- `ANTHROPIC_AUTH_TOKEN` — Auth token (any secret)
- `ENABLE_MODEL_THINKING` — Enable reasoning blocks

### Session Tracking

Start Claude Code with an explicit `--session-id` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.
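For orientation, here is a minimal sketch of header-based session identification in FastAPI. It is illustrative only: the real handler lives in `api/routes.py`, and the route body below is invented.

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/v1/messages")
async def messages(request: Request) -> dict:
    # A Claude Code client started with --session-id sends that value in the
    # X-Session-ID header; fall back to a sentinel so untagged clients still
    # aggregate under a single bucket in per-session metrics.
    session_id = request.headers.get("X-Session-ID", "unknown")
    return {"session_id": session_id}
```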