---
title: Claude Code Proxy
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "3.14"
python_version: "3.14"
app_file: server.py
pinned: false
---
<div align="center">

# 🤖 Free Claude Code

**Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**

[License: MIT](https://opensource.org/licenses/MIT) · [Python](https://www.python.org/downloads/) · [uv](https://github.com/astral-sh/uv) · [Ruff](https://github.com/astral-sh/ruff)

</div>
## The Problem

Claude Code costs $100+/month for API access. This project lets you run it using **free NVIDIA NIM models** instead.

## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.

```
┌─────────────────┐      Anthropic API      ┌──────────────────┐
│   Claude Code   │ ──────────────────────▶ │   Free Claude    │
│   (Official)    │                         │   Code           │
│                 │ ◀────────────────────── │   Proxy          │
└─────────────────┘      SSE Streaming      │   (:8082)        │
                                            └────────┬─────────┘
                                                     │
                                             OpenAI Chat API
                                                     │
                                                     ▼
                                            ┌──────────────────┐
                                            │   NVIDIA NIM     │
                                            │  (Free Models)   │
                                            └──────────────────┘
```
## Features

- **Drop-in replacement** for Claude Code's Anthropic API
- **7 free NVIDIA NIM models** available via auto-routing
- **Automatic failover** - switches to the next model if one hits a rate limit
- **Multi-model support** - use different models for different tasks
- **Local optimizations** - fast-path for common probes (saves API calls)
- **Streaming** - real-time responses with SSE
- **Tool support** - Claude Code tools work with NIM models
- **Thinking blocks** - reasoning support where models support it
- **Discord/Telegram bots** - remote Claude Code sessions
- **Voice notes** - transcribe voice messages with Whisper
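The **Streaming** feature depends on emitting Anthropic's SSE event sequence. A simplified illustration of the framing a client like Claude Code parses (event names come from Anthropic's streaming format; the payloads here are trimmed to the essentials):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one server-sent event the way Anthropic streaming clients expect."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_text(chunks):
    """Yield a minimal Anthropic-style SSE sequence for a plain text reply."""
    yield sse_event("message_start", {"type": "message_start"})
    yield sse_event("content_block_start", {"type": "content_block_start", "index": 0})
    for chunk in chunks:
        yield sse_event("content_block_delta",
                        {"type": "content_block_delta", "index": 0,
                         "delta": {"type": "text_delta", "text": chunk}})
    yield sse_event("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse_event("message_stop", {"type": "message_stop"})
```

The proxy's job is to re-emit the upstream OpenAI deltas in this envelope so Claude Code renders them incrementally.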
## Quick Start (Cloud - No Setup)

The easiest way to use this project is on **HuggingFace Spaces** (free tier available).

### 1. Deploy to HuggingFace Spaces

<a target="_blank" href="https://huggingface.co/new-space?template=Yash030/claude-code-proxy">
  <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg" alt="Deploy to HuggingFace Spaces"/>
</a>

Or manually:

1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Duplicate the Space
3. Set your secrets in the Space settings:
   - `NVIDIA_NIM_API_KEY` - your NVIDIA API key
   - `ANTHROPIC_AUTH_TOKEN` - your auth token (any secret)

### 2. Get NVIDIA API Key

Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
### 3. Connect Claude Code

```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```

That's it! Claude Code will use free NVIDIA NIM models.
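If you want to sanity-check the endpoint before pointing Claude Code at it, you can build the same kind of `/v1/messages` request Claude Code sends, using only the standard library. This is a sketch: the URL and token are placeholders, and the `model` value is just an example alias for the proxy to remap.

```python
import json
import urllib.request

def build_probe(base_url: str, token: str) -> urllib.request.Request:
    """Build an Anthropic-style /v1/messages POST against the proxy."""
    payload = {
        "model": "claude-sonnet",  # example alias; the proxy maps it to a NIM model
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "ping"}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json", "x-api-key": token},
        method="POST",
    )

# To actually send it (requires the Space to be up):
# with urllib.request.urlopen(build_probe("https://your-space-name.hf.space", "your-secret-token")) as resp:
#     print(json.load(resp))
```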
## Quick Start (Local)

### 1. Install Requirements

```bash
# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh

# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```

### 2. Clone and Configure

```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```

Edit `.env`:

```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start Proxy

```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

### 4. Run Claude Code

```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models

The proxy automatically routes to these models in order:

| Model | Best For | Speed |
|-------|----------|-------|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
## How Auto-Routing Works

When you use the `auto` model, the proxy:

1. **Tries models in order** of speed and reliability
2. **Skips rate-limited models** - a pre-flight check runs before each request
3. **Fails over fast** - if one model times out, it immediately tries the next
4. **Wastes no API calls** - common probes are handled locally

```
Request: "Write a function"
            ↓
Check if model 1 is rate-limited? → Yes → Skip
Check if model 2 is rate-limited? → No → Try
            ↓
Model 2 responds? → Yes → Stream response
Model 2 timeout? → Try model 3 → Success!
```
## Environment Variables

### Required

```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"   # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret"    # Any secret you choose
```

### Optional

```dotenv
MODEL="nvidia_nim/z-ai/glm4.7"          # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..."  # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7"   # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"    # Model for Haiku requests

# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."

# Thinking support
ENABLE_MODEL_THINKING=true              # Enable reasoning blocks
```
## IDE Integration

### VS Code Extension

Add to `.vscode/settings.json`:

```json
{
  "claudeCode.environmentVariables": [
    { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
    { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
  ]
}
```
### JetBrains ACP

Edit `~/.jetbrains/acp.json`:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc"
  }
}
```
### Remote / SSH

For remote development, deploy to HuggingFace Spaces and use:

```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options

### HuggingFace Spaces (Recommended for Cloud)

**Free tier includes:**

- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment

**Setup:**

1. Fork [the Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Add `NVIDIA_NIM_API_KEY` to the Space secrets
3. Access it at `https://your-space.hf.space`

### Railway (Easy Deploy)

1. Connect your GitHub repo
2. Set the environment variables
3. Deploy with auto-scaling

### Render (Free Tier)

1. Create a Web Service
2. Connect GitHub
3. Set the build command: `uv sync`
4. Set the start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`

### Fly.io (Global Edge)

```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker

```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
  -e NVIDIA_NIM_API_KEY="nvapi-..." \
  -e ANTHROPIC_AUTH_TOKEN="freecc" \
  free-claude-code
```
## Architecture

```
api/
├── routes.py                  # FastAPI endpoints
├── services.py                # Request handling & failover
├── model_router.py            # Model resolution
├── detection.py               # Request type detection
└── optimization_handlers.py   # Fast-path responses
core/
├── anthropic/                 # SSE, token counting, tool parsing
└── task_detector.py           # Task capability detection
providers/
├── openai_compat.py           # Base OpenAI transport
├── nvidia_nim/                # NVIDIA NIM provider
└── rate_limit.py              # Rate limiting
messaging/
├── discord.py                 # Discord bot wrapper
└── telegram.py                # Telegram bot wrapper
```
## Troubleshooting

### "undefined ... input_tokens" error

- Update to the latest version: `git pull`
- Check that `ANTHROPIC_BASE_URL` doesn't end with `/v1`

### Provider disconnects during streaming

- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)

### Model not responding

- Check that your NVIDIA API key is valid
- Verify you haven't hit rate limits
- Try a different model

### VS Code extension shows login

- Reload the extension after setting the env vars
- Confirm the environment variables are set correctly
## Contributing

1. Fork the repo
2. Create a feature branch
3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
4. Submit a PR

## License

MIT License - see [LICENSE](LICENSE)

## Links

- [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
- [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
- [NVIDIA NIM](https://build.nvidia.com)
- [Claude Code](https://github.com/anthropics/claude-code)