---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "3.14"
python_version: "3.14"
app_file: server.py
pinned: false
---
# 🤖 Free Claude Code
**Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.14](https://img.shields.io/badge/python-3.14-blue.svg)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
## The Problem
Claude Code normally requires a paid Anthropic subscription or API credits, which can run to $100+/month. This project lets you run it against **free NVIDIA NIM models** instead.
## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
```
┌─────────────────┐      Anthropic API      ┌──────────────────┐
│   Claude Code   │ ──────────────────────▶ │   Free Claude    │
│   (Official)    │                         │    Code Proxy    │
│                 │ ◀────────────────────── │     (:8082)      │
└─────────────────┘      SSE Streaming      └────────┬─────────┘
                                                     │
                                              OpenAI Chat API
                                                     │
                                                     ▼
                                            ┌──────────────────┐
                                            │    NVIDIA NIM    │
                                            │  (Free Models)   │
                                            └──────────────────┘
```
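Conceptually, the translation is a reshaping of request bodies between the two API formats. The sketch below shows the core idea for a non-streaming request; it is an illustrative subset, not the proxy's actual implementation (the real `server.py` also handles streaming, tool calls, and thinking blocks):

```python
def anthropic_to_openai(body: dict) -> dict:
    """Reshape an Anthropic /v1/messages request body into an
    OpenAI /v1/chat/completions request body (illustrative subset)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible endpoints expect it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body.get("messages", []))
    return {
        "model": body.get("model", "auto"),
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```

The response travels the opposite path: the proxy re-emits the NIM completion as Anthropic-style SSE events so Claude Code never notices the difference.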
## Features
- **Drop-in replacement** for Claude Code's Anthropic API
- **7 free NVIDIA NIM models** available via auto-routing
- **Automatic failover** - switches to the next model when one hits a rate limit
- **Multi-model support** - use different models for different tasks
- **Local optimizations** - fast-path for common probes (saves API calls)
- **Streaming** - real-time responses over SSE
- **Tool support** - Claude Code tools work with NIM models
- **Thinking blocks** - reasoning support where models support it
- **Discord/Telegram bots** - remote Claude Code sessions
- **Voice notes** - transcribe voice messages with Whisper
## Quick Start (Cloud - No Setup)
The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
### 1. Deploy to HuggingFace Spaces
1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Duplicate the space
3. Set your secrets in the Space settings:
   - `NVIDIA_NIM_API_KEY` - Your NVIDIA API key
   - `ANTHROPIC_AUTH_TOKEN` - Your auth token (any secret)
### 2. Get NVIDIA API Key
Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
### 3. Connect Claude Code
```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```
That's it! Claude Code will use free NVIDIA NIM models.
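If you want to sanity-check the deployment before launching Claude Code, you can send a raw Anthropic-style request yourself. A sketch using only the standard library (the URL and token are placeholders; the auth header shown assumes the proxy accepts the same `Bearer` token Claude Code sends for `ANTHROPIC_AUTH_TOKEN`):

```python
import json
import urllib.request

def build_probe(base_url: str, token: str) -> urllib.request.Request:
    """Build a minimal Anthropic-style /v1/messages request for
    checking that the proxy is reachable and accepts your token."""
    payload = {
        "model": "auto",
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "ping"}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "content-type": "application/json",
            "authorization": f"Bearer {token}",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

req = build_probe("https://your-space-name.hf.space", "your-secret-token")
# urllib.request.urlopen(req) would perform the actual call
```

A 200 response with a JSON body means the proxy and your token are working.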
## Quick Start (Local)
### 1. Install Requirements
```bash
# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash
# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```
### 2. Clone and Configure
```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```
Edit `.env`:
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start Proxy
```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
### 4. Run Claude Code
```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models
The proxy automatically routes to these models in order:
| Model | Best For | Speed |
|-------|----------|-------|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
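The per-task column in the table can be pictured as a simple lookup. This is an illustrative sketch, not the proxy's actual routing table (see `api/model_router.py` for the real logic):

```python
# Illustrative task → model lookup, mirroring the table above.
TASK_MODELS = {
    "code": "qwen3-coder-480b",
    "general": "glm4.7",
    "fast": "step-3.5-flash",
    "reasoning": "mistral-large-3",
}

def pick_model(task: str) -> str:
    """Return the preferred model for a task, falling back to
    the general-purpose model for unknown task types."""
    return TASK_MODELS.get(task, TASK_MODELS["general"])
```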
## How Auto-Routing Works
When the model is set to `auto`, the proxy:
1. **Tries models in order** of speed/reliability
2. **Skips rate-limited models** - pre-flight check before each request
3. **Fast failover** - if one model times out, immediately tries next
4. **No API waste** - common probes handled locally
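The steps above boil down to iterating a priority list, skipping models already known to be rate-limited, and falling through on errors. A simplified sketch (the real `api/services.py` also handles streaming and per-error timeouts; the names here are illustrative):

```python
from typing import Callable, Iterable

class AllModelsExhausted(Exception):
    """Raised when every model in the priority list fails."""

def call_with_failover(
    models: Iterable[str],
    is_rate_limited: Callable[[str], bool],
    call: Callable[[str], str],
) -> str:
    """Try each model in priority order; skip rate-limited ones
    and fall through to the next model on any provider error."""
    for model in models:
        if is_rate_limited(model):  # pre-flight check: no API call wasted
            continue
        try:
            return call(model)
        except Exception:  # timeout / 429 / disconnect → try the next model
            continue
    raise AllModelsExhausted("no model produced a response")
```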
```
Request: "Write a function"
            ↓
Is model 1 rate-limited? → Yes → skip
Is model 2 rate-limited? → No  → try it
            ↓
Model 2 responds?        → Yes → stream the response
Model 2 times out?       → try model 3 → success
```
## Environment Variables
### Required
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
```
### Optional
```dotenv
MODEL="nvidia_nim/z-ai/glm4.7" # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
# Thinking support
ENABLE_MODEL_THINKING=true # Enable reasoning blocks
```
## IDE Integration
### VS Code Extension
Add to `.vscode/settings.json`:
```json
{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}
```
### JetBrains ACP
Edit `~/.jetbrains/acp.json`:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}
```
### Remote / SSH
For remote development, deploy to HuggingFace Spaces and use:
```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options
### HuggingFace Spaces (Recommended for Cloud)
**Free tier includes:**
- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment
**Setup:**
1. Fork [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Add `NVIDIA_NIM_API_KEY` to Space secrets
3. Access at `https://your-space.hf.space`
### Railway (Easy Deploy)
1. Connect GitHub repo
2. Set environment variables
3. Deploy with auto-scaling
### Render (Free Tier)
1. Create Web Service
2. Connect GitHub
3. Set build command: `uv sync`
4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
### Fly.io (Global Edge)
```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker
```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code
```
## Architecture
```
api/
├── routes.py # FastAPI endpoints
├── services.py # Request handling & failover
├── model_router.py # Model resolution
├── detection.py # Request type detection
└── optimization_handlers.py # Fast-path responses
core/
├── anthropic/ # SSE, token counting, tool parsing
└── task_detector.py # Task capability detection
providers/
├── openai_compat.py # Base OpenAI transport
├── nvidia_nim/ # NVIDIA NIM provider
└── rate_limit.py # Rate limiting
messaging/
├── discord.py # Discord bot wrapper
└── telegram.py # Telegram bot wrapper
```
## Troubleshooting
### "undefined ... input_tokens" error
- Update to latest version: `git pull`
- Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
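The trailing-path mistake is easy to check for mechanically. A hypothetical helper (not part of the proxy) showing what a correct base URL looks like:

```python
def normalize_base_url(url: str) -> str:
    """Strip a trailing '/v1' (and trailing slash) from the base URL.
    Claude Code appends '/v1/messages' itself, so a base URL ending in
    '/v1' would produce requests to '/v1/v1/messages'."""
    url = url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url
```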
### Provider disconnects during streaming
- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
### Model not responding
- Check your NVIDIA API key is valid
- Verify rate limits haven't been hit
- Try a different model
### VS Code extension shows login
- Reload the extension after setting env vars
- Confirm environment variables are set correctly
## Contributing
1. Fork the repo
2. Create a feature branch
3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
4. Submit PR
## License
MIT License - See [LICENSE](LICENSE)
## Links
- [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
- [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
- [NVIDIA NIM](https://build.nvidia.com)
- [Claude Code](https://github.com/anthropics/claude-code)