# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Free Claude Code is a FastAPI proxy that routes Claude Code's Anthropic Messages API calls to backend providers (NVIDIA NIM, Zen). It translates between the client-side Anthropic protocol and provider-specific transports (OpenAI chat format, native APIs), handling SSE streaming, thinking blocks, tool calls, and token-usage metadata normalization.

## Free Models

### Zen/OpenCode (Free Tier)
- `zen/minimax-m2.5-free` - Default, Claude Code capable
- `zen/big-pickle` - Free tier
- `zen/ring-2.6-1t-free` - Free tier
- `zen/nemotron-3-super-free` - Free tier

### NVIDIA NIM (7 Models)
- `nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct` - Code generation
- `nvidia_nim/z-ai/glm4.7` - General purpose
- `nvidia_nim/stepfun-ai/step-3.5-flash` - Fast responses
- `nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512` - Reasoning
- `nvidia_nim/abacusai/dracarys-llama-3.1-70b-instruct` - Complex tasks
- `nvidia_nim/bytedance/seed-oss-36b-instruct` - Balanced
- `nvidia_nim/mistralai/mistral-nemotron` - Thinking tasks

## Commands

```bash
uv run ruff format          # Format code
uv run ruff check           # Lint code
uv run ty check             # Type check
uv run pytest               # Run tests (use -n auto for parallel)
uv run pytest tests/path/test.py::test_name  # Run single test

# Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# Or via installed scripts (after uv tool install)
free-claude-code            # Start proxy with configured host/port
fcc-init                    # Create user config template at ~/.config/free-claude-code/.env
```

Run format → lint → type check, in that order, before pushing. CI enforces the same sequence.

## Architecture

### Request Flow
```
Claude Code CLI → api/routes.py (FastAPI) → api/model_router.py → providers/* → upstream
                                                    ↓
                                            core/chain_engine.py (fallback)
```

### Auto-Routing with Health Tracking
The proxy includes intelligent model selection with per-provider health windows:
1. Pre-flight health check (recent failures per model, window varies by provider)
2. Skip unhealthy models (NIM: 2+ failures in 15s = unhealthy; Zen: 5+ failures in 60s = unhealthy)
3. Automatic failover on timeout/rate-limit
4. Zen provider is unlimited (9999 req/min scoped limiter) and is never blocked by rate limits
5. Blocked NIM providers skipped silently (no failure penalty)
6. Load-based ordering: least-loaded providers are tried first
7. Stale sessions cleaned up every 60s on the admin dashboard
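The sliding health window in steps 1–2 can be sketched as follows. This is a hypothetical illustration, not the real implementation (which lives in `providers/rate_limit.py` as `ModelHealthTracker`); only the per-provider thresholds mirror the numbers documented above.

```python
import time
from collections import deque

# Illustrative per-provider thresholds, matching the numbers above.
HEALTH_PARAMS = {
    "nvidia_nim": (2, 15.0),  # 2+ failures within 15s => unhealthy
    "zen": (5, 60.0),         # 5+ failures within 60s => unhealthy
}

class HealthWindow:
    def __init__(self, provider):
        self.max_failures, self.window = HEALTH_PARAMS[provider]
        self.failures = deque()  # monotonic timestamps of recent failures

    def record_failure(self, now=None):
        self.failures.append(time.monotonic() if now is None else now)

    def is_healthy(self, now=None):
        now = time.monotonic() if now is None else now
        # Slide the window: forget failures older than the provider's window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        return len(self.failures) < self.max_failures
```

A model that fails twice within 15 seconds on NIM is skipped during pre-flight selection until the failures age out of the window.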

### Key Modules

- **api/routes.py**: FastAPI routes plus the REQUESTED_PROVIDER_MODELS list
- **api/services.py**: request handling, fallback logic, failure recording
- **api/model_router.py**: model resolution with health-aware candidate selection
- **api/optimization_handlers.py**: fast path for trivial requests
- **api/admin.py**: admin dashboard (sessions, models, health)
- **core/session_tracker.py**: session load tracking plus automatic stale-session cleanup
- **providers/rate_limit.py**: GlobalRateLimiter and ModelHealthTracker with per-provider health parameters
- **providers/nvidia_nim/client.py**: NIM provider with fast timeouts
- **providers/zen/client.py**: Zen/OpenCode provider
- **providers/openai_compat.py**: OpenAI chat → Anthropic SSE translation
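The OpenAI-chat-to-Anthropic-SSE direction handled by `providers/openai_compat.py` can be sketched as below. The helper name is hypothetical and the sketch covers only plain text deltas (the real translator also handles thinking blocks and tool calls); field names follow the public shapes of both APIs.

```python
import json

def openai_delta_to_anthropic_sse(chunk, block_index=0):
    """Turn one OpenAI streaming chunk into an Anthropic SSE frame.

    Hypothetical sketch: an OpenAI `choices[0].delta.content` fragment
    becomes an Anthropic `content_block_delta` event with a `text_delta`.
    """
    text = chunk["choices"][0]["delta"].get("content", "")
    event = {
        "type": "content_block_delta",
        "index": block_index,
        "delta": {"type": "text_delta", "text": text},
    }
    return f"event: content_block_delta\ndata: {json.dumps(event)}\n\n"
```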

### Provider Model Format
Model values use `provider_id/model/name` format (e.g., `nvidia_nim/z-ai/glm4.7` or `zen/minimax-m2.5-free`).
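Since only the first `/` separates the provider from the upstream model name, splitting must not break on later slashes (NIM names contain them). A minimal sketch, assuming a hypothetical helper name (actual resolution lives in `api/model_router.py`):

```python
def split_provider_model(model_id):
    """Split 'provider_id/model/name' into (provider_id, upstream model name).

    Only the first '/' is the separator; the remainder may itself contain
    slashes, e.g. 'nvidia_nim/z-ai/glm4.7' -> ('nvidia_nim', 'z-ai/glm4.7').
    """
    provider, _, model = model_id.partition("/")
    if not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model
```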

### Multi-Model Advertisement
The `MODEL` env var accepts a comma-separated list so that the Claude CLI displays every registered model in the `/model` picker. Picker-safe IDs include "(no thinking)" variants that route to the same upstream model with thinking blocks disabled.
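Expansion of a comma-separated `MODEL` value could look like the sketch below. Both the helper name and the exact "(no thinking)" ID format are illustrative assumptions; the proxy's real advertisement logic lives in `api/routes.py`.

```python
def advertised_models(env):
    """Hypothetical sketch: expand MODEL into picker entries.

    Each registered model is advertised twice: once as-is and once as a
    '(no thinking)' variant (illustrative naming) that disables thinking
    blocks while routing to the same upstream model.
    """
    raw = env.get("MODEL", "zen/minimax-m2.5-free")
    models = [m.strip() for m in raw.split(",") if m.strip()]
    return [entry for base in models for entry in (base, f"{base} (no thinking)")]
```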

## Python 3.14 Notes

Unparenthesized exception tuples (`except X, Y:`) are valid again in Python 3.14 via PEP 758. Do not modernize this syntax away.

## Environment Configuration

Key variables in `.env`:
- `MODEL`: primary model (e.g., `zen/minimax-m2.5-free`)
- `AUTO_MODEL_ORDER`: comma-separated fallback order for auto routing
- `NVIDIA_NIM_API_KEY`: NVIDIA API key
- `ANTHROPIC_AUTH_TOKEN`: auth token (any secret value)
- `ENABLE_MODEL_THINKING`: enable reasoning blocks
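A minimal `.env` using the variables above might look like this; all values are illustrative placeholders, not working credentials.

```
MODEL=zen/minimax-m2.5-free
AUTO_MODEL_ORDER=zen/minimax-m2.5-free,nvidia_nim/z-ai/glm4.7
NVIDIA_NIM_API_KEY=nvapi-xxxxxxxx
ANTHROPIC_AUTH_TOKEN=change-me
ENABLE_MODEL_THINKING=true
```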

### Session Tracking
Start Claude Code with `--session-id <uuid>` so the admin dashboard shows accurate per-session metrics. The proxy reads the `X-Session-ID` header for session identification.

### Admin Dashboard
Sessions in the admin dashboard expire automatically: a cleanup sweep runs every 60s, and stale sessions (no requests for 2x the window period) are removed.
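The stale-session sweep can be sketched as follows. This is a hypothetical illustration (the real logic lives in `core/session_tracker.py`); the 60s window and the 2x-idle cutoff are the values documented above.

```python
WINDOW = 60.0  # seconds between cleanup sweeps, per the docs above

def sweep_stale(last_seen, now):
    """Drop sessions idle for more than 2x the window period.

    `last_seen` maps session IDs to the timestamp of their last request;
    the surviving sessions are returned as a new mapping.
    """
    return {sid: ts for sid, ts in last_seen.items() if now - ts <= 2 * WINDOW}
```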