---
title: Claude Code Proxy
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "3.14"
python_version: "3.14"
app_file: server.py
pinned: false
---
<div align="center">

# πŸ€– Free Claude Code

**Use Claude Code with free NVIDIA NIM models through a lightweight proxy.**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![Python 3.14](https://img.shields.io/badge/python-3.14-3776ab.svg?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/badge/uv-spawn-ffc21c.svg?style=for-the-badge)](https://github.com/astral-sh/uv)
[![Code style: Ruff](https://img.shields.io/badge/code%20formatting-ruff-f5a623.svg?style=for-the-badge)](https://github.com/astral-sh/ruff)

</div>
## The Problem
Running Claude Code against the Anthropic API can easily cost $100+/month. This project lets you run it on **free NVIDIA NIM models** instead.
## The Solution
A FastAPI proxy that translates Claude Code's Anthropic API calls to NVIDIA NIM's OpenAI-compatible endpoint. Zero code changes needed in Claude Code.
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” Anthropic API β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Claude Code β”‚ ──────────────────────▢ β”‚ Free Claude β”‚
β”‚ (Official) β”‚ β”‚ Code β”‚
β”‚ β”‚ ◀──────────────────────── β”‚ Proxy β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ SSE Streaming β”‚ (:8082) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
OpenAI Chat API
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NVIDIA NIM β”‚
β”‚ (Free Models) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
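The translation step in the diagram above can be sketched as a small pure function. This is an illustrative example only: the function and field names are hypothetical, not this project's actual API, and it covers just the common case of text messages.

```python
# Illustrative sketch of the translation the proxy performs: an Anthropic
# Messages API payload is reshaped into an OpenAI Chat Completions payload.

def anthropic_to_openai(payload: dict) -> dict:
    """Map an Anthropic /v1/messages body to an OpenAI /chat/completions body."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": payload.get("model", "auto"),
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }
```

The real proxy also has to translate tool calls, thinking blocks, and the streaming event format in the reverse direction; see the Architecture section for where that code lives.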
## Features
- **Drop-in replacement** for Claude Code's Anthropic API
- **7 free NVIDIA NIM models** available via auto-routing
- **Automatic failover** - switches to next model if one hits rate limit
- **Multi-model support** - use different models for different tasks
- **Local optimizations** - fast-path for common probes (saves API calls)
- **Streaming** - real-time response with SSE
- **Tool support** - Claude Code tools work with NIM models
- **Thinking blocks** - reasoning support where models support it
- **Discord/Telegram bots** - remote Claude Code sessions
- **Voice notes** - transcribe voice messages with Whisper
## Quick Start (Cloud - No Setup)
The easiest way to use this project is on **HuggingFace Spaces** (free tier available).
### 1. Deploy to HuggingFace Spaces
<a target="_blank" href="https://huggingface.co/new-space?template=Yash030/claude-code-proxy">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-lg.svg" alt="Deploy to HuggingFace Spaces"/>
</a>
Or manually:
1. Go to [huggingface.co/spaces/Yash030/claude-code-proxy](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Duplicate the space
3. Set your secrets in the Space settings:
   - `NVIDIA_NIM_API_KEY` - your NVIDIA API key
   - `ANTHROPIC_AUTH_TOKEN` - your auth token (any secret string)
### 2. Get NVIDIA API Key
Get a free key at [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys).
### 3. Connect Claude Code
```bash
# Use your HuggingFace Space URL (ends with .hf.space)
export ANTHROPIC_AUTH_TOKEN="your-secret-token"
export ANTHROPIC_BASE_URL="https://your-space-name.hf.space"
claude
```
That's it! Claude Code will use free NVIDIA NIM models.
## Quick Start (Local)
### 1. Install Requirements
```bash
# Install Claude Code
curl -LsSf https://download.anthropic.com/install.sh | sh
# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14
```
### 2. Clone and Configure
```bash
git clone https://github.com/Yashwant00CR7/claude-code-nvidia.git
cd claude-code-nvidia
cp .env.example .env
```
Edit `.env`:
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key"
ANTHROPIC_AUTH_TOKEN="freecc"
MODEL="nvidia_nim/z-ai/glm4.7"
```
### 3. Start Proxy
```bash
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
### 4. Run Claude Code
```bash
export ANTHROPIC_AUTH_TOKEN="freecc"
export ANTHROPIC_BASE_URL="http://localhost:8082"
claude
```
## Available Models
The proxy automatically routes to these models in order:
| Model | Best For | Speed |
|-------|----------|-------|
| `qwen3-coder-480b` | Code generation | Fast |
| `glm4.7` | General purpose | Fast |
| `step-3.5-flash` | Fast responses | Very Fast |
| `mistral-large-3` | Reasoning | Medium |
| `dracarys-llama-3.1-70b` | Complex tasks | Medium |
| `seed-oss-36b` | Balanced | Fast |
| `mistral-nemotron` | Thinking tasks | Medium |
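Claude Code still asks for Anthropic model names (Opus, Sonnet, Haiku), so the proxy maps each tier to a configured NIM model. The sketch below shows one way that resolution could work using the `MODEL_*` environment variables documented later; the project's actual router lives in `api/model_router.py` and may differ.

```python
import os

def resolve_model(requested: str) -> str:
    """Pick the backing NIM model for a requested Claude model name.

    Hypothetical sketch: falls back to MODEL when no tier-specific
    variable is set.
    """
    default = os.environ.get("MODEL", "nvidia_nim/z-ai/glm4.7")
    for tier in ("opus", "sonnet", "haiku"):
        if tier in requested.lower():
            return os.environ.get(f"MODEL_{tier.upper()}", default)
    return default
```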
## How Auto-Routing Works
When you use the `auto` model, the proxy:
1. **Tries models in order** of speed/reliability
2. **Skips rate-limited models** - pre-flight check before each request
3. **Fast failover** - if one model times out, immediately tries next
4. **No API waste** - common probes handled locally
```
Request: "Write a function"
↓
Check if model 1 is rate-limited? β†’ Yes β†’ Skip
Check if model 2 is rate-limited? β†’ No β†’ Try
↓
Model 2 responds? β†’ Yes β†’ Stream response
Model 2 timeout? β†’ Try model 3 β†’ Success!
```
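The flow above boils down to a loop over a priority list. Here is a minimal sketch of that failover logic; the real implementation lives in `api/services.py`, and the names and exception types here are illustrative.

```python
class RateLimited(Exception):
    """Raised by a model call when the provider returns a rate-limit error."""

def call_with_failover(priority, call_model, is_rate_limited):
    """Try each model in priority order, skipping known rate-limited ones."""
    last_error = None
    for model in priority:
        if is_rate_limited(model):        # pre-flight check: skip without an API call
            continue
        try:
            return call_model(model)      # first successful response wins
        except (RateLimited, TimeoutError) as exc:
            last_error = exc              # fast failover: move on to the next model
    raise RuntimeError("all models failed or were rate-limited") from last_error
```

The pre-flight check is what keeps failover cheap: a model already known to be rate-limited is skipped without spending a request on it.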
## Environment Variables
### Required
```dotenv
NVIDIA_NIM_API_KEY="nvapi-your-key" # From build.nvidia.com
ANTHROPIC_AUTH_TOKEN="your-secret" # Any secret you choose
```
### Optional
```dotenv
MODEL="nvidia_nim/z-ai/glm4.7" # Default model
MODEL_OPUS="nvidia_nim/qwen/qwen3-..." # Model for Opus requests
MODEL_SONNET="nvidia_nim/z-ai/glm4.7" # Model for Sonnet requests
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7" # Model for Haiku requests
# Auto-routing order (comma-separated)
AUTO_MODEL_PRIORITY="nvidia_nim/qwen/...,nvidia_nim/z-ai/..."
# Thinking support
ENABLE_MODEL_THINKING=true # Enable reasoning blocks
```
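As a hedged sketch of how the optional variables above might be parsed (the project's real config loading may differ): `AUTO_MODEL_PRIORITY` is a comma-separated list, and `ENABLE_MODEL_THINKING` is a boolean flag.

```python
import os

def load_routing_config() -> dict:
    """Parse routing-related environment variables into a config dict."""
    # Split the comma-separated priority list, dropping blanks and whitespace.
    priority = [
        m.strip()
        for m in os.environ.get("AUTO_MODEL_PRIORITY", "").split(",")
        if m.strip()
    ]
    # Treat anything other than the literal string "true" as disabled.
    thinking = os.environ.get("ENABLE_MODEL_THINKING", "false").lower() == "true"
    return {"priority": priority, "thinking": thinking}
```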
## IDE Integration
### VS Code Extension
Add to `.vscode/settings.json`:
```json
{
"claudeCode.environmentVariables": [
{ "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
{ "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
]
}
```
### JetBrains ACP
Edit `~/.jetbrains/acp.json`:
```json
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8082",
"ANTHROPIC_AUTH_TOKEN": "freecc"
}
}
```
### Remote / SSH
For remote development, deploy to HuggingFace Spaces and use:
```bash
export ANTHROPIC_BASE_URL="https://your-space.hf.space"
```
## Deployment Options
### HuggingFace Spaces (Recommended for Cloud)
**Free tier includes:**
- 2 vCPU
- Community support
- Automatic HTTPS
- Git-based deployment
**Setup:**
1. Duplicate [the space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
2. Add `NVIDIA_NIM_API_KEY` to Space secrets
3. Access at `https://your-space.hf.space`
### Railway (Easy Deploy)
1. Connect GitHub repo
2. Set environment variables
3. Deploy with auto-scaling
### Render (Free Tier)
1. Create Web Service
2. Connect GitHub
3. Set build command: `uv sync`
4. Set start command: `uv run uvicorn server:app --host 0.0.0.0 --port $PORT`
### Fly.io (Global Edge)
```bash
fly launch
fly secrets set NVIDIA_NIM_API_KEY="nvapi-..."
fly deploy
```
### Local/Docker
```bash
docker build -t free-claude-code .
docker run -p 8082:8082 \
-e NVIDIA_NIM_API_KEY="nvapi-..." \
-e ANTHROPIC_AUTH_TOKEN="freecc" \
free-claude-code
```
## Architecture
```
api/
β”œβ”€β”€ routes.py # FastAPI endpoints
β”œβ”€β”€ services.py # Request handling & failover
β”œβ”€β”€ model_router.py # Model resolution
β”œβ”€β”€ detection.py # Request type detection
└── optimization_handlers.py # Fast-path responses
core/
β”œβ”€β”€ anthropic/ # SSE, token counting, tool parsing
└── task_detector.py # Task capability detection
providers/
β”œβ”€β”€ openai_compat.py # Base OpenAI transport
β”œβ”€β”€ nvidia_nim/ # NVIDIA NIM provider
└── rate_limit.py # Rate limiting
messaging/
β”œβ”€β”€ discord.py # Discord bot wrapper
└── telegram.py # Telegram bot wrapper
```
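The `core/anthropic/` layer streams responses back in Anthropic's server-sent-events format. As an illustration of that framing (event names follow Anthropic's public streaming API; this helper is a sketch, not the project's actual serializer):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one server-sent event: an `event:` line, a `data:` line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

Each chunk the proxy receives from NVIDIA NIM gets re-emitted as events like `message_start`, `content_block_delta`, and `message_stop` so that Claude Code sees a byte stream indistinguishable from the real Anthropic API.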
## Troubleshooting
### "undefined ... input_tokens" error
- Update to latest version: `git pull`
- Check `ANTHROPIC_BASE_URL` doesn't end with `/v1`
### Provider disconnects during streaming
- Reduce `PROVIDER_MAX_CONCURRENCY`
- Increase `HTTP_READ_TIMEOUT`
- Check NVIDIA NIM status at [status.nvidia.com](https://status.nvidia.com)
### Model not responding
- Check your NVIDIA API key is valid
- Verify rate limits haven't been hit
- Try a different model
### VS Code extension shows login
- Reload the extension after setting env vars
- Confirm environment variables are set correctly
## Contributing
1. Fork the repo
2. Create a feature branch
3. Run checks: `uv run ruff format && uv run ruff check && uv run ty check`
4. Submit PR
## License
MIT License - See [LICENSE](LICENSE)
## Links
- [GitHub](https://github.com/Yashwant00CR7/claude-code-nvidia)
- [HuggingFace Space](https://huggingface.co/spaces/Yash030/claude-code-proxy)
- [NVIDIA NIM](https://build.nvidia.com)
- [Claude Code](https://github.com/anthropics/claude-code)