---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: '3.14'
python_version: '3.14'
app_file: server.py
pinned: false
---

🤖 Free Claude Code

Use Claude Code with free NVIDIA NIM models through a lightweight proxy.

License: MIT · Python 3.14 · uv · Code style: Ruff

The Problem

Claude Code normally requires paid Anthropic API access, which can easily run $100+/month. This project lets you run it against free NVIDIA NIM models instead.

The Solution

A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
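
To make the translation concrete, here is a minimal, illustrative sketch of the Anthropic-to-OpenAI request mapping. The Anthropic-side field names follow the public Messages API; the function name and the simplifications (text-only content blocks, no tool calls) are assumptions for illustration, and the proxy's real mapping in core/anthropic/ is more complete.

```python
def anthropic_to_openai(payload: dict, nim_model: str) -> dict:
    """Convert an Anthropic Messages request into an OpenAI chat request."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible endpoints expect it as the first chat message.
    if system := payload.get("system"):
        messages.append({"role": "system", "content": system})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": nim_model,
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }
```

Because the mapping is purely structural, Claude Code itself never needs to know it is talking to NIM.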

┌─────────────────┐      Anthropic API      ┌──────────────────┐
│   Claude Code   │ ──────────────────────▶ │   Free Claude    │
│   (Official)    │                         │      Code        │
│                 │ ◀────────────────────── │     Proxy        │
└─────────────────┘     SSE Streaming       │     (:8082)      │
                                            └────────┬─────────┘
                                                     │
                                             OpenAI Chat API
                                                     │
                                                     ▼
                                            ┌──────────────────┐
                                            │    NVIDIA NIM    │
                                            │  (Free Models)   │
                                            └──────────────────┘

Features

  • Drop-in replacement for Claude Code's Anthropic API
  • 7 free NVIDIA NIM models available via auto-routing
  • Automatic failover - switches to next model if one hits rate limit
  • Multi-model support - use different models for different tasks
  • Local optimizations - fast-path for common probes (saves API calls)
  • Streaming - real-time response with SSE
  • Tool support - Claude Code tools work with NIM models
  • Thinking blocks - reasoning support where models support it
  • Discord/Telegram bots - remote Claude Code sessions
  • Voice notes - transcribe voice messages with Whisper
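
Several of the features above hinge on SSE streaming. As a rough illustration of what a consumer of the proxy's stream deals with, here is a minimal parser for the `event:`/`data:` line format, assuming Anthropic-style event names; the function is hypothetical and not part of this codebase.

```python
import json


def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Split a raw SSE body into (event_name, data_dict) pairs.

    SSE events are blank-line-separated blocks, each holding an
    'event:' line and a 'data:' line whose payload is JSON.
    """
    events = []
    for chunk in raw.strip().split("\n\n"):
        name, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if name and data is not None:
            events.append((name, data))
    return events
```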

Quick Start (Cloud - No Setup)

The easiest way to use this project is on HuggingFace Spaces (free tier available).

1. Deploy to HuggingFace Spaces


Or manually:

  1. Go to huggingface.co/spaces/Yash030/claude-code-proxy
  2. Duplicate the space
  3. Set your secrets in the Space settings:
    • NVIDIA_NIM_API_KEY - Your NVIDIA API key
    • ANTHROPIC_AUTH_TOKEN - Your auth token (any secret)

2. Get NVIDIA API Key

Get a free key at build.nvidia.com/settings/api-keys.

3. Connect Claude Code

# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude

That's it! Claude Code will use free NVIDIA NIM models.

Quick Start (Local)

1. Install Requirements

# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh

# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14

2. Clone and Configure

git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env

Edit .env:

NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"

3. Start Proxy

uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082

4. Run Claude Code

export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude

Available Models

The proxy automatically routes to these models in order:

| Model                  | Best For        | Speed     |
|------------------------|-----------------|-----------|
| qwen3-coder-480b       | Code generation | Fast      |
| glm4.7                 | General purpose | Fast      |
| step-3.5-flash         | Fast responses  | Very Fast |
| mistral-large-3        | Reasoning       | Medium    |
| dracarys-llama-3.1-70b | Complex tasks   | Medium    |
| seed-oss-36b           | Balanced        | Fast      |
| mistral-nemotron       | Thinking tasks  | Medium    |

How Auto-Routing Works

When you use the `auto` model, the proxy:

  1. Tries models in order of speed/reliability
  2. Skips rate-limited models - pre-flight check before each request
  3. Fast failover - if one model times out, immediately tries next
  4. No API waste - common probes handled locally

Request: "Write a function"
    ↓
Check if model 1 is rate-limited? → Yes → Skip
Check if model 2 is rate-limited? → No → Try
    ↓
Model 2 responds? → Yes → Stream response
Model 2 timeout? → Try model 3 → Success!
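
The flow above can be sketched in a few lines of Python. This is an illustrative simplification with made-up names (the real logic lives in api/services.py and providers/rate_limit.py): skip models still in a rate-limit cooldown, try the rest in priority order, and back a model off when it fails.

```python
import time

# model name -> unix timestamp when the model becomes usable again
rate_limited_until: dict[str, float] = {}


def pick_and_call(models: list[str], call) -> str:
    """Try each model in priority order; return the first successful response."""
    now = time.time()
    for model in models:
        if rate_limited_until.get(model, 0) > now:
            continue  # pre-flight check: still cooling down, skip without an API call
        try:
            return call(model)
        except Exception:
            # Timeout or rate limit: back this model off and try the next one.
            rate_limited_until[model] = now + 60
    raise RuntimeError("all models unavailable")
```

Because the rate-limit check happens before the request is sent, a known-limited model costs nothing: the failover moves straight to the next candidate.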

Environment Variables

Required

NVIDIA_NIM_API_KEY="nvapi-your-key"     # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret"     # Any secret you choose

Optional

MODEL="nvidia_nim/z-ai/glm4.7"          # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..."  # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7"    # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"    # Model for Haiku requests

# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."

# Thinking support
ENABLE_MODEL_THINKING=true              # Enable reasoning blocks
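
As a sketch of how the comma-separated AUTO_MODEL_PRIORITY value can be turned into an ordered list (falling back to MODEL when unset), assuming the variable names in this README; the helper itself is hypothetical:

```python
def load_priority(env: dict[str, str]) -> list[str]:
    """Parse AUTO_MODEL_PRIORITY into an ordered model list."""
    raw = env.get("AUTO_MODEL_PRIORITY", "")
    # Split on commas, trimming whitespace and dropping empty entries.
    models = [m.strip() for m in raw.split(",") if m.strip()]
    # Fall back to the single default model when no priority list is set.
    return models or [env.get("MODEL", "nvidia_nim/z-ai/glm4.7")]
```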

IDE Integration

VS Code Extension

Add to .vscode/settings.json:

{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}

JetBrains ACP

Edit ~/.jetbrains/acp.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}

Remote / SSH

For remote development, deploy to HuggingFace Spaces and use:

export ANTHROPIC_BASE_URL="https://your-space.hf.space"

Deployment Options

HuggingFace Spaces (Recommended for Cloud)

Free tier includes:

  • 2 vCPU
  • Community support
  • Automatic HTTPS
  • Git-based deployment

Setup:

  1. Fork the space
  2. Add NVIDIA_NIM_API_KEY to Space secrets
  3. Access at https://your-space.hf.space

Railway (Easy Deploy)

  1. Connect GitHub repo
  2. Set environment variables
  3. Deploy with auto-scaling

Render (Free Tier)

  1. Create Web Service
  2. Connect GitHub
  3. Set build command: uv sync
  4. Set start command: uv run uvicorn server:app --host 0.0.0.0 --port $PORT

Fly.io (Global Edge)

fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy

Local/Docker

docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code

Architecture

api/
├── routes.py                 # FastAPI endpoints
├── services.py               # Request handling & failover
├── model_router.py           # Model resolution
├── detection.py              # Request type detection
└── optimization_handlers.py  # Fast-path responses

core/
├── anthropic/                # SSE, token counting, tool parsing
└── task_detector.py          # Task capability detection

providers/
├── openai_compat.py          # Base OpenAI transport
├── nvidia_nim/               # NVIDIA NIM provider
└── rate_limit.py             # Rate limiting

messaging/
├── discord.py                # Discord bot wrapper
└── telegram.py               # Telegram bot wrapper

Troubleshooting

"undefined ... input_tokens" error

  • Update to latest version: git pull
  • Check ANTHROPIC_BASE_URL doesn't end with /v1

Provider disconnects during streaming

  • Reduce PROVIDER_MAX_CONCURRENCY
  • Increase HTTP_READ_TIMEOUT
  • Check NVIDIA NIM status at status.nvidia.com

Model not responding

  • Check your NVIDIA API key is valid
  • Verify rate limits haven't been hit
  • Try a different model

VS Code extension shows login

  • Reload the extension after setting env vars
  • Confirm environment variables are set correctly

Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Run checks: uv run ruff format && uv run ruff check && uv run ty check
  4. Submit PR

License

MIT License - See LICENSE

Links