---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: '3.14'
python_version: '3.14'
app_file: server.py
pinned: false
---
## The Problem
Claude Code costs $100+/month for API access. This project lets you run it using free NVIDIA NIM models instead.
## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
```
┌──────────────────┐    Anthropic API     ┌──────────────────┐
│   Claude Code    │ ───────────────────▶ │   Free Claude    │
│   (Official)     │                      │   Code           │
│                  │ ◀─────────────────── │   Proxy          │
└──────────────────┘    SSE Streaming     │   (:8082)        │
                                          └────────┬─────────┘
                                                   │
                                           OpenAI Chat API
                                                   │
                                                   ▼
                                          ┌──────────────────┐
                                          │   NVIDIA NIM     │
                                          │  (Free Models)   │
                                          └──────────────────┘
```
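At its core, the proxy's job is the translation step in the diagram above: accept an Anthropic Messages request and reshape it into an OpenAI-style chat completion request. A minimal sketch of that mapping (illustrative only; the real implementation lives in `server.py` and `api/`, and the function name and defaults here are assumptions):

```python
def anthropic_to_openai(payload: dict) -> dict:
    """Illustrative translation of an Anthropic Messages request
    into an OpenAI-style chat completion request."""
    messages = []
    # Anthropic carries the system prompt in a top-level field;
    # OpenAI expects it as the first chat message.
    if payload.get("system"):
        messages.append({"role": "system", "content": payload["system"]})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": payload.get("model", "auto"),
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }
```

Tool calls, thinking blocks, and streaming need more machinery than this, but every request follows the same reshaping.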
## Features
- Drop-in replacement for Claude Code's Anthropic API
- 7 free NVIDIA NIM models available via auto-routing
- Automatic failover - switches to next model if one hits rate limit
- Multi-model support - use different models for different tasks
- Local optimizations - fast-path for common probes (saves API calls)
- Streaming - real-time response with SSE
- Tool support - Claude Code tools work with NIM models
- Thinking blocks - reasoning support where models support it
- Discord/Telegram bots - remote Claude Code sessions
- Voice notes - transcribe voice messages with Whisper
## Quick Start (Cloud - No Setup)
The easiest way to use this project is on HuggingFace Spaces (free tier available).
### 1. Deploy to HuggingFace Spaces
Or manually:
- Go to huggingface.co/spaces/Yash030/claude-code-proxy
- Duplicate the space
- Set your secrets in the Space settings:
  - `NVIDIA_NIM_API_KEY` - Your NVIDIA API key
  - `ANTHROPIC_AUTH_TOKEN` - Your auth token (any secret)
### 2. Get NVIDIA API Key
Get a free key at build.nvidia.com/settings/api-keys.
### 3. Connect Claude Code
```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```
That's it! Claude Code will use free NVIDIA NIM models.
## Quick Start (Local)

### 1. Install Requirements
```bash
# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh

# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```
### 2. Clone and Configure
```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```
Edit `.env`:

```bash
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start the Proxy

```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
### 4. Run Claude Code

```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models
The proxy automatically routes to these models in order:
| Model | Best For | Speed |
|---|---|---|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
## How Auto-Routing Works

When you use the `auto` model, the proxy:
- Tries models in order of speed/reliability
- Skips rate-limited models - pre-flight check before each request
- Fast failover - if one model times out, immediately tries next
- No API waste - common probes handled locally
```
Request: "Write a function"
        ↓
Check if model 1 is rate-limited? → Yes → Skip
Check if model 2 is rate-limited? → No  → Try
        ↓
Model 2 responds? → Yes → Stream response
Model 2 timeout?  → Try model 3 → Success!
```
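The flow above can be sketched as a simple loop (a minimal illustration, not the actual `api/services.py` code; the function names, the 60-second cooldown, and the error types are assumptions):

```python
import time

def call_with_failover(models, send_request, rate_limited_until):
    """Illustrative auto-routing loop: skip rate-limited models and
    fall through to the next model on timeouts or 429 errors."""
    for model in models:
        # Pre-flight check: skip models still inside a rate-limit window.
        if rate_limited_until.get(model, 0) > time.time():
            continue
        try:
            return model, send_request(model)
        except TimeoutError:
            continue  # fast failover: immediately try the next model
        except RuntimeError as exc:
            if "429" in str(exc):
                # Remember the rate limit so later requests skip this model.
                rate_limited_until[model] = time.time() + 60
            continue
    raise RuntimeError("all models unavailable")
```

Because the rate-limit map persists across requests, a model that returned 429 once is skipped cheaply on subsequent requests instead of wasting an API call.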
## Environment Variables

### Required

```bash
NVIDIA_NIM_API_KEY="nvapi-your-key"   # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret"    # Any secret you choose
```
### Optional

```bash
MODEL="nvidia_nim/z-ai/glm4.7"          # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..."  # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7"   # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"    # Model for Haiku requests

# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."

# Thinking support
ENABLE_MODEL_THINKING=true              # Enable reasoning blocks
```
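For reference, `AUTO_MODEL_PRIORITY` is a plain comma-separated list; parsing it amounts to something like the following (a sketch with an assumed helper name and default, not the proxy's actual code):

```python
import os

def load_auto_priority(default=("nvidia_nim/z-ai/glm4.7",)):
    """Parse the comma-separated AUTO_MODEL_PRIORITY variable,
    falling back to a default list when it is unset or empty."""
    raw = os.environ.get("AUTO_MODEL_PRIORITY", "")
    models = [m.strip() for m in raw.split(",") if m.strip()]
    return models or list(default)
```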
## IDE Integration

### VS Code Extension

Add to `.vscode/settings.json`:
```json
{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}
```
### JetBrains ACP

Edit `~/.jetbrains/acp.json`:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}
```
### Remote/SSH

For remote development, deploy to HuggingFace Spaces and use:

```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options

### HuggingFace Spaces (Recommended for Cloud)
Free tier includes:
- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment
Setup:
- Fork the space
- Add `NVIDIA_NIM_API_KEY` to Space secrets
- Access at `https://your-space.hf.space`
### Railway (Easy Deploy)
- Connect GitHub repo
- Set environment variables
- Deploy with auto-scaling
### Render (Free Tier)
- Create Web Service
- Connect GitHub
- Set build command: `uv sync`
- Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
### Fly.io (Global Edge)

```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker

```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code
```
## Architecture

```
api/
├── routes.py                 # FastAPI endpoints
├── services.py               # Request handling & failover
├── model_router.py           # Model resolution
├── detection.py              # Request type detection
└── optimization_handlers.py  # Fast-path responses
core/
├── anthropic/                # SSE, token counting, tool parsing
└── task_detector.py          # Task capability detection
providers/
├── openai_compat.py          # Base OpenAI transport
├── nvidia_nim/               # NVIDIA NIM provider
└── rate_limit.py             # Rate limiting
messaging/
├── discord.py                # Discord bot wrapper
└── telegram.py               # Telegram bot wrapper
```
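The `core/anthropic/` layer is responsible for re-emitting upstream output in the server-sent-event framing Claude Code expects. A simplified sketch of that framing (the event names match Anthropic's streaming format, but the function names and event payloads here are trimmed-down illustrations, not the proxy's actual code):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one server-sent event: an event line, a data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_text_as_anthropic(chunks):
    """Wrap plain text chunks from the upstream model in
    Anthropic-style streaming events."""
    yield sse_event("message_start", {"type": "message_start"})
    yield sse_event("content_block_start", {"type": "content_block_start", "index": 0})
    for chunk in chunks:
        yield sse_event("content_block_delta", {
            "type": "content_block_delta",
            "index": 0,
            "delta": {"type": "text_delta", "text": chunk},
        })
    yield sse_event("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse_event("message_stop", {"type": "message_stop"})
```

In the real proxy, these events also carry token counts, stop reasons, and tool-use blocks parsed out of the NIM response.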
## Troubleshooting

### "undefined ... input_tokens" error
- Update to the latest version: `git pull`
- Check that `ANTHROPIC_BASE_URL` doesn't end with `/v1`
### Provider disconnects during streaming

- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at status.nvidia.com
### Model not responding
- Check your NVIDIA API key is valid
- Verify rate limits haven't been hit
- Try a different model
### VS Code extension shows a login prompt
- Reload the extension after setting env vars
- Confirm environment variables are set correctly
## Contributing
- Fork the repo
- Create a feature branch
- Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
- Submit a PR
## License
MIT License - See LICENSE