---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: '3.14'
python_version: '3.14'
app_file: server.py
pinned: false
---

🤖 Free Claude Code

Use Claude Code with free NVIDIA NIM models through a lightweight proxy.

License: MIT · Python 3.14 · uv · Code style: Ruff

The Problem

Claude Code normally requires paid Anthropic API access, which can easily run $100+/month. This project lets you run it against free NVIDIA NIM models instead.

The Solution

A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
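
To make the translation concrete, here is a minimal, illustrative sketch of the Anthropic-to-OpenAI request mapping. The Anthropic-side field names follow the public Messages API; the function name and the simplifications (text-only content blocks, no tool calls) are assumptions for illustration, and the proxy's real mapping in core/anthropic/ is more complete.

```python
def anthropic_to_openai(payload: dict, nim_model: str) -> dict:
    """Convert an Anthropic Messages request into an OpenAI chat request."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible endpoints expect it as the first chat message.
    if system := payload.get("system"):
        messages.append({"role": "system", "content": system})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": nim_model,
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }
```

Because the mapping is purely structural, Claude Code itself never needs to know it is talking to NIM.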

┌─────────────────┐      Anthropic API      ┌──────────────────┐
│   Claude Code   │ ──────────────────────▶ │   Free Claude    │
│   (Official)    │                         │      Code        │
│                 │ ◀────────────────────── │     Proxy        │
└─────────────────┘     SSE Streaming       │     (:8082)      │
                                            └────────┬─────────┘
                                                     │
                                             OpenAI Chat API
                                                     │
                                                     ▼
                                            ┌──────────────────┐
                                            │    NVIDIA NIM    │
                                            │  (Free Models)   │
                                            └──────────────────┘

Features

  • Drop-in replacement for Claude Code's Anthropic API
  • 7 free NVIDIA NIM models available via auto-routing
  • Automatic failover - switches to next model if one hits rate limit
  • Multi-model support - use different models for different tasks
  • Local optimizations - fast-path for common probes (saves API calls)
  • Streaming - real-time response with SSE
  • Tool support - Claude Code tools work with NIM models
  • Thinking blocks - reasoning support where models support it
  • Discord/Telegram bots - remote Claude Code sessions
  • Voice notes - transcribe voice messages with Whisper
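
Several of the features above hinge on SSE streaming. As a rough illustration of what a consumer of the proxy's stream deals with, here is a minimal parser for the `event:`/`data:` line format, assuming Anthropic-style event names; the function is hypothetical and not part of this codebase.

```python
import json


def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Split a raw SSE body into (event_name, data_dict) pairs.

    SSE events are blank-line-separated blocks, each holding an
    'event:' line and a 'data:' line whose payload is JSON.
    """
    events = []
    for chunk in raw.strip().split("\n\n"):
        name, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if name and data is not None:
            events.append((name, data))
    return events
```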

Quick Start (Cloud - No Setup)

The easiest way to use this project is on HuggingFace Spaces (free tier available).

1. Deploy to HuggingFace Spaces


Or manually:

  1. Go to huggingface.co/spaces/Yash030/claude-code-proxy
  2. Duplicate the space
  3. Set your secrets in the Space settings:
    • NVIDIA_NIM_API_KEY - Your NVIDIA API key
    • ANTHROPIC_AUTH_TOKEN - Your auth token (any secret)

2. Get NVIDIA API Key

Get a free key at build.nvidia.com/settings/api-keys.

3. Connect Claude Code

# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude

That's it! Claude Code will use free NVIDIA NIM models.

Quick Start (Local)

1. Install Requirements

# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh

# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14

2. Clone and Configure

git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env

Edit .env:

NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"

3. Start Proxy

uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082

4. Run Claude Code

export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude

Available Models

The proxy automatically routes to these models in order:

| Model                  | Best For        | Speed     |
|------------------------|-----------------|-----------|
| qwen3-coder-480b       | Code generation | Fast      |
| glm4.7                 | General purpose | Fast      |
| step-3.5-flash         | Fast responses  | Very Fast |
| mistral-large-3        | Reasoning       | Medium    |
| dracarys-llama-3.1-70b | Complex tasks   | Medium    |
| seed-oss-36b           | Balanced        | Fast      |
| mistral-nemotron       | Thinking tasks  | Medium    |

How Auto-Routing Works

When you use the `auto` model, the proxy:

  1. Tries models in order of speed/reliability
  2. Skips rate-limited models - pre-flight check before each request
  3. Fast failover - if one model times out, immediately tries next
  4. No API waste - common probes handled locally

Request: "Write a function"
    ↓
Check if model 1 is rate-limited? → Yes → Skip
Check if model 2 is rate-limited? → No → Try
    ↓
Model 2 responds? → Yes → Stream response
Model 2 timeout? → Try model 3 → Success!
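
The flow above can be sketched in a few lines of Python. This is an illustrative simplification with made-up names (the real logic lives in api/services.py and providers/rate_limit.py): skip models still in a rate-limit cooldown, try the rest in priority order, and back a model off when it fails.

```python
import time

# model name -> unix timestamp when the model becomes usable again
rate_limited_until: dict[str, float] = {}


def pick_and_call(models: list[str], call) -> str:
    """Try each model in priority order; return the first successful response."""
    now = time.time()
    for model in models:
        if rate_limited_until.get(model, 0) > now:
            continue  # pre-flight check: still cooling down, skip without an API call
        try:
            return call(model)
        except Exception:
            # Timeout or rate limit: back this model off and try the next one.
            rate_limited_until[model] = now + 60
    raise RuntimeError("all models unavailable")
```

Because the rate-limit check happens before the request is sent, a known-limited model costs nothing: the failover moves straight to the next candidate.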

Environment Variables

Required

NVIDIA_NIM_API_KEY="nvapi-your-key"     # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret"     # Any secret you choose

Optional

MODEL="nvidia_nim/z-ai/glm4.7"          # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..."  # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7"    # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"    # Model for Haiku requests

# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."

# Thinking support
ENABLE_MODEL_THINKING=true              # Enable reasoning blocks
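
As a sketch of how the comma-separated AUTO_MODEL_PRIORITY value can be turned into an ordered list (falling back to MODEL when unset), assuming the variable names in this README; the helper itself is hypothetical:

```python
def load_priority(env: dict[str, str]) -> list[str]:
    """Parse AUTO_MODEL_PRIORITY into an ordered model list."""
    raw = env.get("AUTO_MODEL_PRIORITY", "")
    # Split on commas, trimming whitespace and dropping empty entries.
    models = [m.strip() for m in raw.split(",") if m.strip()]
    # Fall back to the single default model when no priority list is set.
    return models or [env.get("MODEL", "nvidia_nim/z-ai/glm4.7")]
```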

IDE Integration

VS Code Extension

Add to .vscode/settings.json:

{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}

JetBrains ACP

Edit ~/.jetbrains/acp.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}

Remote / SSH

For remote development, deploy to HuggingFace Spaces and use:

export ANTHROPIC_BASE_URL="https://your-space.hf.space"

Deployment Options

HuggingFace Spaces (Recommended for Cloud)

Free tier includes:

  • 2 vCPU
  • Community support
  • Automatic HTTPS
  • Git-based deployment

Setup:

  1. Fork the space
  2. Add NVIDIA_NIM_API_KEY to Space secrets
  3. Access at https://your-space.hf.space

Railway (Easy Deploy)

  1. Connect GitHub repo
  2. Set environment variables
  3. Deploy with auto-scaling

Render (Free Tier)

  1. Create Web Service
  2. Connect GitHub
  3. Set build command: uv sync
  4. Set start command: uv run uvicorn server:app --host 0.0.0.0 --port $PORT

Fly.io (Global Edge)

fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy

Local/Docker

docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code

Architecture

api/
├── routes.py                 # FastAPI endpoints
├── services.py               # Request handling & failover
├── model_router.py           # Model resolution
├── detection.py              # Request type detection
└── optimization_handlers.py  # Fast-path responses

core/
├── anthropic/                # SSE, token counting, tool parsing
└── task_detector.py          # Task capability detection

providers/
├── openai_compat.py          # Base OpenAI transport
├── nvidia_nim/               # NVIDIA NIM provider
└── rate_limit.py             # Rate limiting

messaging/
├── discord.py                # Discord bot wrapper
└── telegram.py               # Telegram bot wrapper

Troubleshooting

"undefined ... input_tokens" error

  • Update to latest version: git pull
  • Check ANTHROPIC_BASE_URL doesn't end with /v1

Provider disconnects during streaming

  • Reduce PROVIDER_MAX_CONCURRENCY
  • Increase HTTP_READ_TIMEOUT
  • Check NVIDIA NIM status at status.nvidia.com

Model not responding

  • Check your NVIDIA API key is valid
  • Verify rate limits haven't been hit
  • Try a different model

VS Code extension shows login

  • Reload the extension after setting env vars
  • Confirm environment variables are set correctly

Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Run checks: uv run ruff format && uv run ruff check && uv run ty check
  4. Submit PR

License

MIT License - See LICENSE

Links