---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "3.14"
python_version: "3.14"
app_file: server.py
pinned: false
---
# 🤖 Free Claude Code
**Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.14](https://img.shields.io/badge/python-3.14-blue.svg)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
## The Problem
Claude Code normally requires a paid Anthropic subscription or API credits, which can run to $100+/month. This project lets you run it against **free NVIDIA NIM models** instead.
## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
```
┌─────────────────┐      Anthropic API      ┌──────────────────┐
│   Claude Code   │ ──────────────────────▶ │   Free Claude    │
│   (Official)    │                         │    Code Proxy    │
│                 │ ◀────────────────────── │     (:8082)      │
└─────────────────┘      SSE Streaming      └────────┬─────────┘
                                                     │
                                              OpenAI Chat API
                                                     │
                                                     ▼
                                            ┌──────────────────┐
                                            │    NVIDIA NIM    │
                                            │  (Free Models)   │
                                            └──────────────────┘
```
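Conceptually, the translation is a reshaping of request bodies between the two API formats. The sketch below shows the core idea for a non-streaming request; it is an illustrative subset, not the proxy's actual implementation (the real `server.py` also handles streaming, tool calls, and thinking blocks):

```python
def anthropic_to_openai(body: dict) -> dict:
    """Reshape an Anthropic /v1/messages request body into an
    OpenAI /v1/chat/completions request body (illustrative subset)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible endpoints expect it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body.get("messages", []))
    return {
        "model": body.get("model", "auto"),
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }
```

The response travels the opposite path: the proxy re-emits the NIM completion as Anthropic-style SSE events so Claude Code never notices the difference.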
## Features
- **Drop-in replacement** for Claude Code's Anthropic API
- **7 free NVIDIA NIM models** available via auto-routing
- **Automatic failover** - switches to the next model when one hits a rate limit
- **Multi-model support** - use different models for different tasks
- **Local optimizations** - fast-path for common probes (saves API calls)
- **Streaming** - real-time responses over SSE
- **Tool support** - Claude Code tools work with NIM models
- **Thinking blocks** - reasoning support where models support it
- **Discord/Telegram bots** - remote Claude Code sessions
- **Voice notes** - transcribe voice messages with Whisper
## Quick Start (Cloud - No Setup)
The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
### 1. Deploy to HuggingFace Spaces
1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Duplicate the space
3. Set your secrets in the Space settings:
   - `NVIDIA_NIM_API_KEY` - Your NVIDIA API key
   - `ANTHROPIC_AUTH_TOKEN` - Your auth token (any secret)
### 2. Get NVIDIA API Key
Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
### 3. Connect Claude Code
```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```
That's it! Claude Code will use free NVIDIA NIM models.
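If you want to sanity-check the deployment before launching Claude Code, you can send a raw Anthropic-style request yourself. A sketch using only the standard library (the URL and token are placeholders; the auth header shown assumes the proxy accepts the same `Bearer` token Claude Code sends for `ANTHROPIC_AUTH_TOKEN`):

```python
import json
import urllib.request

def build_probe(base_url: str, token: str) -> urllib.request.Request:
    """Build a minimal Anthropic-style /v1/messages request for
    checking that the proxy is reachable and accepts your token."""
    payload = {
        "model": "auto",
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "ping"}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "content-type": "application/json",
            "authorization": f"Bearer {token}",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

req = build_probe("https://your-space-name.hf.space", "your-secret-token")
# urllib.request.urlopen(req) would perform the actual call
```

A 200 response with a JSON body means the proxy and your token are working.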
## Quick Start (Local)
### 1. Install Requirements
```bash
# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash
# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```
### 2. Clone and Configure
```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```
Edit `.env`:
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start Proxy
```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
### 4. Run Claude Code
```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models
The proxy automatically routes to these models in order:
| Model | Best For | Speed |
|-------|----------|-------|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
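The per-task column in the table can be pictured as a simple lookup. This is an illustrative sketch, not the proxy's actual routing table (see `api/model_router.py` for the real logic):

```python
# Illustrative task → model lookup, mirroring the table above.
TASK_MODELS = {
    "code": "qwen3-coder-480b",
    "general": "glm4.7",
    "fast": "step-3.5-flash",
    "reasoning": "mistral-large-3",
}

def pick_model(task: str) -> str:
    """Return the preferred model for a task, falling back to
    the general-purpose model for unknown task types."""
    return TASK_MODELS.get(task, TASK_MODELS["general"])
```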
## How Auto-Routing Works
When the model is set to `auto`, the proxy:
1. **Tries models in order** of speed/reliability
2. **Skips rate-limited models** - pre-flight check before each request
3. **Fast failover** - if one model times out, immediately tries next
4. **No API waste** - common probes handled locally
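The steps above boil down to iterating a priority list, skipping models already known to be rate-limited, and falling through on errors. A simplified sketch (the real `api/services.py` also handles streaming and per-error timeouts; the names here are illustrative):

```python
from typing import Callable, Iterable

class AllModelsExhausted(Exception):
    """Raised when every model in the priority list fails."""

def call_with_failover(
    models: Iterable[str],
    is_rate_limited: Callable[[str], bool],
    call: Callable[[str], str],
) -> str:
    """Try each model in priority order; skip rate-limited ones
    and fall through to the next model on any provider error."""
    for model in models:
        if is_rate_limited(model):  # pre-flight check: no API call wasted
            continue
        try:
            return call(model)
        except Exception:  # timeout / 429 / disconnect → try the next model
            continue
    raise AllModelsExhausted("no model produced a response")
```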
```
Request: "Write a function"
            ↓
Is model 1 rate-limited? → Yes → skip
Is model 2 rate-limited? → No  → try it
            ↓
Model 2 responds?        → Yes → stream the response
Model 2 times out?       → try model 3 → success
```
## Environment Variables
### Required
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
```
### Optional
```dotenv
MODEL="nvidia_nim/z-ai/glm4.7" # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
# Thinking support
ENABLE_MODEL_THINKING=true # Enable reasoning blocks
```
## IDE Integration
### VS Code Extension
Add to `.vscode/settings.json`:
```json
{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}
```
### JetBrains ACP
Edit `~/.jetbrains/acp.json`:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}
```
### Remote / SSH
For remote development, deploy to HuggingFace Spaces and use:
```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options
### HuggingFace Spaces (Recommended for Cloud)
**Free tier includes:**
- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment
**Setup:**
1. Fork [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Add `NVIDIA_NIM_API_KEY` to Space secrets
3. Access at `https://your-space.hf.space`
### Railway (Easy Deploy)
1. Connect GitHub repo
2. Set environment variables
3. Deploy with auto-scaling
### Render (Free Tier)
1. Create Web Service
2. Connect GitHub
3. Set build command: `uv sync`
4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
### Fly.io (Global Edge)
```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker
```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code
```
## Architecture
```
api/
├── routes.py # FastAPI endpoints
├── services.py # Request handling & failover
├── model_router.py # Model resolution
├── detection.py # Request type detection
└── optimization_handlers.py # Fast-path responses
core/
├── anthropic/ # SSE, token counting, tool parsing
└── task_detector.py # Task capability detection
providers/
├── openai_compat.py # Base OpenAI transport
├── nvidia_nim/ # NVIDIA NIM provider
└── rate_limit.py # Rate limiting
messaging/
├── discord.py # Discord bot wrapper
└── telegram.py # Telegram bot wrapper
```
## Troubleshooting
### "undefined ... input_tokens" error
- Update to latest version: `git pull`
- Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
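The trailing-path mistake is easy to check for mechanically. A hypothetical helper (not part of the proxy) showing what a correct base URL looks like:

```python
def normalize_base_url(url: str) -> str:
    """Strip a trailing '/v1' (and trailing slash) from the base URL.
    Claude Code appends '/v1/messages' itself, so a base URL ending in
    '/v1' would produce requests to '/v1/v1/messages'."""
    url = url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url
```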
### Provider disconnects during streaming
- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
### Model not responding
- Check your NVIDIA API key is valid
- Verify rate limits haven't been hit
- Try a different model
### VS Code extension shows login
- Reload the extension after setting env vars
- Confirm environment variables are set correctly
## Contributing
1. Fork the repo
2. Create a feature branch
3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
4. Submit PR
## License
MIT License - See [LICENSE](LICENSE)
## Links
- [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
- [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
- [NVIDIA NIM](https://build.nvidia.com)
- [Claude Code](https://github.com/anthropics/claude-code)