---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# AI Proxy – LLM Gateway
A transparent relay proxy that forwards API requests from **Claude Code** and **Gemini CLI** in environments where the upstream APIs are blocked (e.g. by a corporate firewall). The proxy swaps the client's shared token for the real API key, sanitizes headers, and supports SSE streaming.
```
Claude Code → AI Proxy → api.anthropic.com
Gemini CLI → AI Proxy → generativelanguage.googleapis.com
```
## Features
- **Multi-provider relay** – Anthropic and Google Gemini via a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – client authenticates with a shared token; server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, authorization, and client-sent API keys
- **Rate limiting** – per-IP, configurable window and max
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
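The auth-swap and header-hygiene steps combined can be sketched roughly as follows. This is a minimal illustration with assumed names (`sanitizeHeaders` and the exact header lists are not taken from the codebase): hop-by-hop headers and any client-sent credentials are dropped, then the real upstream key is injected.

```typescript
// Hypothetical sketch of the header-hygiene + auth-swap step.
// Hop-by-hop headers (RFC 9110 §7.6.1) never belong on a forwarded request;
// client credentials are stripped so the shared token never reaches upstream.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade",
]);
const CLIENT_CREDENTIALS = new Set(["authorization", "x-api-key", "x-goog-api-key"]);

function sanitizeHeaders(
  incoming: Record<string, string>,
  upstreamAuth: { header: string; value: string },
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    if (HOP_BY_HOP.has(lower) || CLIENT_CREDENTIALS.has(lower)) continue;
    out[lower] = value;
  }
  out[upstreamAuth.header.toLowerCase()] = upstreamAuth.value; // inject the real key
  return out;
}
```

Because the relay is 1:1, everything else (body, query string, remaining headers) passes through untouched.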
## Quick Start (Local)
```bash
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install
# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)
# 3. Run
npm run dev
```
Health check: `curl http://localhost:7860/health`
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | – | – | Anthropic API key (enables Anthropic relay when set) |
| `PORT` | – | `7860` | Server port |
| `HOST` | – | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | – | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | – | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | – | `60000` | Rate limit window (ms) |
| `BODY_LIMIT` | – | `5242880` | Max request body size (bytes, 5 MB) |
| `CORS_ORIGIN` | – | *(disabled)* | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | – | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | – | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | – | – | Gemini API key (enables Gemini relay when set) |
| `GEMINI_BASE_URL` | – | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |
## API Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check → `{"status":"ok"}` |
| `POST` | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| `POST` | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |
All other routes return `404`. Non-POST methods on API routes return `405`.
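The "Auth: Yes" check could look like the following hypothetical sketch (the proxy's real handlers are not shown here): the shared token is accepted from whichever credential header the client's tooling sends.

```typescript
// Hypothetical sketch: validate the shared PROXY_AUTH_TOKEN from any of
// the credential headers the supported clients are known to send.
function isAuthorized(
  headers: Record<string, string | undefined>,
  proxyToken: string,
): boolean {
  if (headers["authorization"] === `Bearer ${proxyToken}`) return true; // Claude Code
  if (headers["x-api-key"] === proxyToken) return true;                 // Anthropic SDK style
  if (headers["x-goog-api-key"] === proxyToken) return true;            // Gemini CLI
  return false;
}
```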
## Docker
### Local (docker compose)
```bash
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```
### Hugging Face Spaces
1. Create a new Space on [huggingface.co/new-space](https://huggingface.co/new-space):
- **SDK**: Docker
- **Visibility**: Private (recommended, since this Space holds your API keys)
2. Push this repository to the Space:
```bash
git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
git push hf main
```
3. Configure **Secrets** in Space Settings → Repository secrets:
- `PROXY_AUTH_TOKEN` = your chosen shared secret
- `ANTHROPIC_API_KEY` = your Anthropic key *(at least one provider required)*
- `GEMINI_API_KEY` = your Gemini API key *(at least one provider required)*
4. The Space will build and deploy automatically. Your proxy URL will be:
```
https://<YOUR_USER>-<SPACE_NAME>.hf.space
```
> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000 as required by the platform.
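For reference, a Spaces-compatible Dockerfile along these lines would satisfy those constraints (a sketch only; the repository's actual Dockerfile may differ in paths and build steps):

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# HF Spaces expects a non-root user (uid 1000) and the app listening on 7860
USER 1000
ENV PORT=7860 HOST=0.0.0.0
EXPOSE 7860
CMD ["node", "dist/server.js"]
```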
## Claude Code Client Configuration
### Option 1: Environment Variables
```bash
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```
### Option 2: Persistent (settings.json)
```json
// ~/.claude/settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "https://your-server.example.com",
"ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
}
}
```
### Option 3: Managed Settings (Enterprise)
```json
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "https://your-server.example.com"
}
}
```
## Gemini CLI Client Configuration
Configure Gemini CLI to use the proxy by setting the base URL and API key:
```bash
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```
> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
**Important:** Authenticate Gemini CLI via API key (not Google login). If you have a cached Google session, run `gemini --clear-credentials` first, otherwise the CLI may ignore the base URL override.
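The swap described in the note above can be sketched as follows (`swapGeminiKey` is an illustrative name, not the proxy's actual function):

```typescript
// Hypothetical sketch of the Gemini auth swap: the client's x-goog-api-key
// must equal the shared proxy token; it is then replaced with the real key
// before the request is forwarded upstream.
function swapGeminiKey(
  headers: Record<string, string>,
  proxyToken: string,
  realGeminiKey: string,
): Record<string, string> | null {
  if (headers["x-goog-api-key"] !== proxyToken) return null; // rejected (401) in practice
  return { ...headers, "x-goog-api-key": realGeminiKey };
}
```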
### Test the Connection
```bash
# Health check
curl https://your-server.example.com/health
# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
-H "Authorization: Bearer your-proxy-auth-token" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Hi"}]
}'
# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
-H "Authorization: Bearer your-proxy-auth-token" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "Hi"}]}]
}'
```
## Tech Stack
- **Runtime:** Node.js >= 20
- **Framework:** [Fastify](https://fastify.dev/) 5
- **HTTP Client:** [undici](https://undici.nodejs.org/)
- **Language:** TypeScript (strict mode)
## License
MIT