---
title: Claude Code Proxy
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "3.14"
python_version: "3.14"
app_file: server.py
pinned: false
---
<div align="center">

# πŸ€– Free Claude Code

**Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/badge/uv-spawn-ffc21c.svg?style=for-the-badge)](https://github.com/astral-sh/uv)
[![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)

</div>
## The Problem
Running Claude Code against the Anthropic API can easily cost $100+/month. This project lets you run it on **free NVIDIA NIM models** instead.
## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” Anthropic API β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Claude Code β”‚ ──────────────────────▢ β”‚ Free Claude β”‚
β”‚ (Official) β”‚ β”‚ Code β”‚
β”‚ β”‚ ◀──────────────────────── β”‚ Proxy β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ SSE Streaming β”‚ (:8082) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
OpenAI Chat API
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NVIDIA NIM β”‚
β”‚ (Free Models) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
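The translation step in the diagram above can be sketched as a small pure function. This is an illustrative example only: the function and field names are hypothetical, not this project's actual API, and it covers just the common case of text messages.

```python
# Illustrative sketch of the translation the proxy performs: an Anthropic
# Messages API payload is reshaped into an OpenAI Chat Completions payload.

def anthropic_to_openai(payload: dict) -> dict:
    """Map an Anthropic /v1/messages body to an OpenAI /chat/completions body."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": payload.get("model", "auto"),
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }
```

The real proxy also has to translate tool calls, thinking blocks, and the streaming event format in the reverse direction; see the Architecture section for where that code lives.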
## Features
- **Drop-in replacement** for Claude Code's Anthropic API
- **7 free NVIDIA NIM models** available via auto-routing
- **Automatic failover** - switches to next model if one hits rate limit
- **Multi-model support** - use different models for different tasks
- **Local optimizations** - fast-path for common probes (saves API calls)
- **Streaming** - real-time response with SSE
- **Tool support** - Claude Code tools work with NIM models
- **Thinking blocks** - reasoning support where models support it
- **Discord/Telegram bots** - remote Claude Code sessions
- **Voice notes** - transcribe voice messages with Whisper
## Quick Start (Cloud - No Setup)
The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
### 1. Deploy to HuggingFace Spaces
<a target="_blank" href="https://huggingface.co/new-space?template=Yash030/claude-code-proxy">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg" alt="Deploy to HuggingFace Spaces"/>
</a>
Or manually:
1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Duplicate the space
3. Set your secrets in the Space settings:
   - `NVIDIA_NIM_API_KEY` - your NVIDIA API key
   - `ANTHROPIC_AUTH_TOKEN` - your auth token (any secret string)
### 2. Get NVIDIA API Key
Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
### 3. Connect Claude Code
```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```
That's it! Claude Code will use free NVIDIA NIM models.
## Quick Start (Local)
### 1. Install Requirements
```bash
# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh
# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```
### 2. Clone and Configure
```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```
Edit `.env`:
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start Proxy
```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
### 4. Run Claude Code
```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models
The proxy automatically routes to these models in order:
| Model | Best For | Speed |
|-------|----------|-------|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
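Claude Code still asks for Anthropic model names (Opus, Sonnet, Haiku), so the proxy maps each tier to a configured NIM model. The sketch below shows one way that resolution could work using the `MODEL_*` environment variables documented later; the project's actual router lives in `api/model_router.py` and may differ.

```python
import os

def resolve_model(requested: str) -> str:
    """Pick the backing NIM model for a requested Claude model name.

    Hypothetical sketch: falls back to MODEL when no tier-specific
    variable is set.
    """
    default = os.environ.get("MODEL", "nvidia_nim/z-ai/glm4.7")
    for tier in ("opus", "sonnet", "haiku"):
        if tier in requested.lower():
            return os.environ.get(f"MODEL_{tier.upper()}", default)
    return default
```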
## How Auto-Routing Works
When you use the `auto` model, the proxy:
1. **Tries models in order** of speed/reliability
2. **Skips rate-limited models** - pre-flight check before each request
3. **Fast failover** - if one model times out, immediately tries next
4. **No API waste** - common probes handled locally
```
Request: "Write a function"
↓
Check if model 1 is rate-limited? β†’ Yes β†’ Skip
Check if model 2 is rate-limited? β†’ No β†’ Try
↓
Model 2 responds? β†’ Yes β†’ Stream response
Model 2 timeout? β†’ Try model 3 β†’ Success!
```
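The flow above boils down to a loop over a priority list. Here is a minimal sketch of that failover logic; the real implementation lives in `api/services.py`, and the names and exception types here are illustrative.

```python
class RateLimited(Exception):
    """Raised by a model call when the provider returns a rate-limit error."""

def call_with_failover(priority, call_model, is_rate_limited):
    """Try each model in priority order, skipping known rate-limited ones."""
    last_error = None
    for model in priority:
        if is_rate_limited(model):        # pre-flight check: skip without an API call
            continue
        try:
            return call_model(model)      # first successful response wins
        except (RateLimited, TimeoutError) as exc:
            last_error = exc              # fast failover: move on to the next model
    raise RuntimeError("all models failed or were rate-limited") from last_error
```

The pre-flight check is what keeps failover cheap: a model already known to be rate-limited is skipped without spending a request on it.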
## Environment Variables
### Required
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
```
### Optional
```dotenv
MODEL="nvidia_nim/z-ai/glm4.7" # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
# Thinking support
ENABLE_MODEL_THINKING=true # Enable reasoning blocks
```
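As a hedged sketch of how the optional variables above might be parsed (the project's real config loading may differ): `AUTO_MODEL_PRIORITY` is a comma-separated list, and `ENABLE_MODEL_THINKING` is a boolean flag.

```python
import os

def load_routing_config() -> dict:
    """Parse routing-related environment variables into a config dict."""
    # Split the comma-separated priority list, dropping blanks and whitespace.
    priority = [
        m.strip()
        for m in os.environ.get("AUTO_MODEL_PRIORITY", "").split(",")
        if m.strip()
    ]
    # Treat anything other than the literal string "true" as disabled.
    thinking = os.environ.get("ENABLE_MODEL_THINKING", "false").lower() == "true"
    return {"priority": priority, "thinking": thinking}
```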
## IDE Integration
### VS Code Extension
Add to `.vscode/settings.json`:
```json
{
"claudeCode.environmentVariables": [
{ "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
{ "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
]
}
```
### JetBrains ACP
Edit `~/.jetbrains/acp.json`:
```json
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8082",
"ANTHROPIC_AUTH_TOKEN": "freecc"
}
}
```
### Remote / SSH
For remote development, deploy to HuggingFace Spaces and use:
```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options
### HuggingFace Spaces (Recommended for Cloud)
**Free tier includes:**
- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment
**Setup:**
1. Duplicate [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Add `NVIDIA_NIM_API_KEY` to Space secrets
3. Access at `https://your-space.hf.space`
### Railway (Easy Deploy)
1. Connect GitHub repo
2. Set environment variables
3. Deploy with auto-scaling
### Render (Free Tier)
1. Create Web Service
2. Connect GitHub
3. Set build command: `uv sync`
4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
### Fly.io (Global Edge)
```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker
```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
-e NVIDIA_NIM_API_KEY="nvapi-..." \
-e ANTHROPIC_AUTH_TOKEN="freecc" \
free-claude-code
```
## Architecture
```
api/
β”œβ”€β”€ routes.py # FastAPI endpoints
β”œβ”€β”€ services.py # Request handling & failover
β”œβ”€β”€ model_router.py # Model resolution
β”œβ”€β”€ detection.py # Request type detection
└── optimization_handlers.py # Fast-path responses
core/
β”œβ”€β”€ anthropic/ # SSE, token counting, tool parsing
└── task_detector.py # Task capability detection
providers/
β”œβ”€β”€ openai_compat.py # Base OpenAI transport
β”œβ”€β”€ nvidia_nim/ # NVIDIA NIM provider
└── rate_limit.py # Rate limiting
messaging/
β”œβ”€β”€ discord.py # Discord bot wrapper
└── telegram.py # Telegram bot wrapper
```
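The `core/anthropic/` layer streams responses back in Anthropic's server-sent-events format. As an illustration of that framing (event names follow Anthropic's public streaming API; this helper is a sketch, not the project's actual serializer):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one server-sent event: an `event:` line, a `data:` line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

Each chunk the proxy receives from NVIDIA NIM gets re-emitted as events like `message_start`, `content_block_delta`, and `message_stop` so that Claude Code sees a byte stream indistinguishable from the real Anthropic API.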
## Troubleshooting
### "undefined ... input_tokens" error
- Update to latest version: `git pull`
- Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
### Provider disconnects during streaming
- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
### Model not responding
- Check your NVIDIA API key is valid
- Verify rate limits haven't been hit
- Try a different model
### VS Code extension shows login
- Reload the extension after setting env vars
- Confirm environment variables are set correctly
## Contributing
1. Fork the repo
2. Create a feature branch
3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
4. Submit PR
## License
MIT License - See [LICENSE](LICENSE)
## Links
- [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
- [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
- [NVIDIA NIM](https://build.nvidia.com)
- [Claude Code](https://github.com/anthropics/claude-code)