---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI Proxy – LLM Gateway

A transparent relay proxy that forwards API requests from **Claude Code** and **Gemini CLI** out of environments where the upstream APIs are blocked (e.g. behind corporate firewalls). The proxy performs auth swapping and header hygiene, and supports SSE streaming.

```
Claude Code  →  AI Proxy  →  api.anthropic.com
Gemini CLI   →  AI Proxy  →  generativelanguage.googleapis.com
```

## Features

- **Multi-provider relay** – Anthropic and Google Gemini via a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – client authenticates with a shared token; server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, authorization, and client-sent API keys
- **Rate limiting** – per-IP, configurable window and max
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
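The auth-swap and header-hygiene steps above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the header list and function name are assumptions, and the real implementation also handles Gemini's `x-goog-api-key`.

```typescript
// Hop-by-hop headers (RFC 9110 §7.6.1) plus client credentials that the
// proxy must not forward upstream. (Illustrative list, an assumption.)
const STRIP_HEADERS = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade",
  "host", "authorization", "x-api-key", "x-goog-api-key",
]);

// Build the upstream header set: drop stripped headers, then inject the
// real provider key in place of the client's shared proxy token.
function buildUpstreamHeaders(
  client: Record<string, string>,
  realApiKey: string,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(client)) {
    if (!STRIP_HEADERS.has(name.toLowerCase())) out[name] = value;
  }
  out["x-api-key"] = realApiKey; // Anthropic-style key injection
  return out;
}
```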

## Quick Start (Local)

```bash
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install

# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)

# 3. Run
npm run dev
```

Health check: `curl http://localhost:7860/health`

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | – | – | Anthropic API key (enables Anthropic relay when set) |
| `PORT` | – | `7860` | Server port |
| `HOST` | – | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | – | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | – | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | – | `60000` | Rate limit window (ms) |
| `BODY_LIMIT` | – | `5242880` | Max request body size in bytes (default 5 MB) |
| `CORS_ORIGIN` | – | *(disabled)* | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | – | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | – | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | – | – | Gemini API key (enables Gemini relay when set) |
| `GEMINI_BASE_URL` | – | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |

## API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check → `{"status":"ok"}` |
| `POST` | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| `POST` | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |

All other routes return `404`. Non-POST methods on API routes return `405`.

## Docker

### Local (docker compose)

```bash
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```

### Hugging Face Spaces

1. Create a new Space on [huggingface.co/new-space](https://huggingface.co/new-space):
   - **SDK**: Docker
   - **Visibility**: Private (recommended, since the Space holds API keys)

2. Push this repository to the Space:
   ```bash
   git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
   git push hf main
   ```

3. Configure **Secrets** in Space Settings → Repository secrets:
   - `PROXY_AUTH_TOKEN` = your chosen shared secret
   - `ANTHROPIC_API_KEY` = your Anthropic key *(at least one provider required)*
   - `GEMINI_API_KEY` = your Gemini API key *(at least one provider required)*

4. The Space will build and deploy automatically. Your proxy URL will be:
   ```
   https://<YOUR_USER>-<SPACE_NAME>.hf.space
   ```

> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000 as required by the platform.

## Claude Code Client Configuration

### Option 1: Environment Variables

```bash
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```

### Option 2: Persistent (settings.json)

```json
// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com",
    "ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
  }
}
```

### Option 3: Managed Settings (Enterprise)

```json
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com"
  }
}
```

## Gemini CLI Client Configuration

Configure Gemini CLI to use the proxy by setting the base URL and API key:

```bash
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```

> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
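That key swap can be sketched as below. This is a simplified illustration under the behavior described in the note; the function name and error shape are assumptions, not the proxy's actual code.

```typescript
// Validate the client's x-goog-api-key against the shared proxy token,
// then return the real Gemini key to send upstream in the same header.
function swapGeminiKey(
  clientKey: string | undefined,
  proxyToken: string,
  realGeminiKey: string,
): string {
  if (clientKey !== proxyToken) {
    throw new Error("401: invalid proxy auth token");
  }
  return realGeminiKey; // forwarded upstream as x-goog-api-key
}
```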

**Important:** Authenticate Gemini CLI with an API key (not a Google login). If you have a cached Google session, run `gemini --clear-credentials` first; otherwise the CLI may ignore the base-URL override.

### Test the Connection

```bash
# Health check
curl https://your-server.example.com/health

# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hi"}]
  }'

# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hi"}]}]
  }'
```

## Tech Stack

- **Runtime:** Node.js >= 20
- **Framework:** [Fastify](https://fastify.dev/) 5
- **HTTP Client:** [undici](https://undici.nodejs.org/)
- **Language:** TypeScript (strict mode)

## License

MIT