Spaces:

Yash030
/

claude-code-proxy

Running

App Files Files Community

claude-code-proxy / CLAUDE.md

Yash030

NIM speed optimization — adaptive rate limiting and increased throughput

aa9c0b0 about 3 hours ago

preview code

raw

history blame contribute delete

4.81 kB

	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Overview

	Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token usage metadata normalization.

	## Free Models

	### Zen/OpenCode (Free Tier)
	- `zen/minimax-m2.5-free` - Default, Claude Code capable
	- `zen/big-pickle` - Free tier
	- `zen/ring-2.6-1t-free` - Free tier
	- `zen/nemotron-3-super-free` - Free tier

	### NVIDIA NIM (7 Models)
	- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
	- `nvidia_nim/z-ai/glm4.7` - General purpose
	- `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
	- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
	- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
	- `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
	- `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks

	## Commands

	```bash
	uv run ruff format # Format code
	uv run ruff check # Lint code
	uv run ty check # Type check
	uv run pytest # Run tests (use -n auto for parallel)
	uv run pytest tests/path/test.py::test_name # Run single test

	# Run the proxy
	uv run uvicorn server:app --host 0.0.0.0 --port 8082

	# Or via installed scripts (after uv tool install)
	free-claude-code # Start proxy with configured host/port
	fcc-init # Create user config template at ~/.config/free-claude-code/.env
	```

	Run format → lint → type check in that order before pushing. CI enforces the same sequence.

	## Architecture

	### Request Flow
	```
	Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
	↓
	core/chain_engine.py (fallback)
	```

	### Auto-Routing with Health Tracking
	The proxy includes intelligent model selection with per-provider health windows:
	1. Pre-flight health check (recent failures per model, window varies by provider)
	2. Skip unhealthy models (NIM: 2+ failures in 15s = unhealthy; Zen: 5+ failures in 60s = unhealthy)
	3. Automatic failover on timeout/rate-limit
	4. Zen provider is unlimited (9999 req/min scoped limiter) — never blocked by rate limits
	5. Blocked NIM providers skipped silently (no failure penalty)
	6. Load-based ordering — least-loaded providers tried first
	7. Stale sessions cleaned up every 60s on the admin dashboard

	### Key Modules

	- api/routes.py — FastAPI routes + REQUESTED_PROVIDER_MODELS list
	- api/services.py — Request handling, fallback logic, failure recording
	- api/model_router.py — Model resolution with health-aware candidate selection
	- api/optimization_handlers.py — Fast-path for trivial requests
	- api/admin.py — Admin dashboard (sessions, models, health)
	- core/session_tracker.py — Session load tracking + automatic stale session cleanup
	- providers/rate_limit.py — GlobalRateLimiter + ModelHealthTracker with per-provider health params
	- providers/nvidia_nim/client.py — NIM provider with fast timeouts
	- providers/zen/client.py — Zen/OpenCode provider
	- providers/openai_compat.py — OpenAI chat → Anthropic SSE translation

	### Provider Model Format
	Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).

	### Multi-Model Advertisement
	`MODEL` env var accepts comma-separated list to force the Claude CLI to display all models. Each registered model appears in the `/model` picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model while disabling thinking blocks.

	## Python 3.14 Notes

	The `except X, Y:` syntax is valid in Python 3.14 (reintroduced). Do not modernize this syntax away.

	## Environment Configuration

	Key variables in `.env`:
	- `MODEL` — Primary model (e.g., `zen/minimax-m2.5-free`)
	- `AUTO_MODEL_ORDER` — Comma-separated fallback order for auto routing
	- `NVIDIA_NIM_API_KEY` — NVIDIA API key
	- `ANTHROPIC_AUTH_TOKEN` — Auth token (any secret)
	- `ENABLE_MODEL_THINKING` — Enable reasoning blocks

	### Session Tracking
	Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.

	### Admin Dashboard
	Sessions in the admin dashboard expire automatically — closed sessions are cleaned up every 60s based on activity. Stale sessions (no requests for 2x the window period) are removed automatically.