---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# AI Proxy – LLM Gateway
A transparent relay proxy that forwards API requests from **Claude Code** and **Gemini CLI** out of environments where the upstream APIs are blocked (e.g. behind corporate firewalls). The proxy performs auth swap, header hygiene, and supports SSE streaming.
```
Claude Code → AI Proxy → api.anthropic.com
Gemini CLI → AI Proxy → generativelanguage.googleapis.com
```
## Features
- **Multi-provider relay** – Anthropic and Google Gemini via a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – client authenticates with a shared token; server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, authorization, and client-sent API keys
- **Rate limiting** – per-IP, configurable window and max
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
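The auth-swap and header-hygiene steps above can be sketched roughly as follows. This is an illustration of the technique, not the proxy's actual source; the function name, header set, and key-injection header are assumptions:

```typescript
// Sketch of header hygiene: drop hop-by-hop headers and any client-supplied
// credentials, then inject the real upstream API key. Illustrative only.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade", "host",
]);

function sanitizeHeaders(
  incoming: Record<string, string>,
  upstreamApiKey: string,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const key = name.toLowerCase();
    // Strip hop-by-hop headers plus client auth and client-sent API keys.
    if (HOP_BY_HOP.has(key) || key === "authorization" || key === "x-api-key") {
      continue;
    }
    out[key] = value;
  }
  out["x-api-key"] = upstreamApiKey; // inject the real provider key
  return out;
}
```

The client never sees the real key, and the real key never depends on what the client sent.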
## Quick Start (Local)
```bash
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install
# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)
# 3. Run
npm run dev
```
Health check: `curl http://localhost:7860/health`
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | – | – | Anthropic API key (enables Anthropic relay when set) |
| `PORT` | – | `7860` | Server port |
| `HOST` | – | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | – | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | – | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | – | `60000` | Rate limit window (ms) |
| `BODY_LIMIT` | – | `5242880` | Max request body size (bytes, 5 MB) |
| `CORS_ORIGIN` | – | *(disabled)* | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | – | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | – | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | – | – | Gemini API key (enables Gemini relay when set) |
| `GEMINI_BASE_URL` | – | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |
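A minimal `.env` for local use might look like this (all values are placeholders):

```
# Shared secret clients must present to the proxy
PROXY_AUTH_TOKEN=change-me

# Enable one or both providers
ANTHROPIC_API_KEY=your-anthropic-key
GEMINI_API_KEY=your-gemini-key

# Optional tuning (defaults shown in the table above)
LOG_LEVEL=info
RATE_LIMIT_MAX=100
```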
## API Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check → `{"status":"ok"}` |
| `POST` | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| `POST` | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |
All other routes return `404`. Non-POST methods on API routes return `405`.
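Internally, dispatching a request to a provider by path could look like this sketch (illustrative only; the actual routing is done by Fastify and may differ):

```typescript
// Illustrative provider dispatch by path; not the proxy's actual code.
type Provider = "anthropic" | "gemini" | null;

function resolveProvider(path: string): Provider {
  if (path === "/v1/messages" || path === "/v1/messages/count_tokens") {
    return "anthropic";
  }
  // Gemini paths look like /v1beta/models/{model}:generateContent
  if (/^\/v1beta\/models\/[^/]+:(generateContent|streamGenerateContent)$/.test(path)) {
    return "gemini";
  }
  return null; // -> 404
}
```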
## Docker
### Local (docker compose)
```bash
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```
### Hugging Face Spaces
1. Create a new Space on [huggingface.co/new-space](https://huggingface.co/new-space):
- **SDK**: Docker
   - **Visibility**: Private (recommended, since this Space holds API keys)
2. Push this repository to the Space:
```bash
git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
git push hf main
```
3. Configure **Secrets** in Space Settings → Repository secrets:
- `PROXY_AUTH_TOKEN` = your chosen shared secret
- `ANTHROPIC_API_KEY` = your Anthropic key *(at least one provider required)*
- `GEMINI_API_KEY` = your Gemini API key *(at least one provider required)*
4. The Space will build and deploy automatically. Your proxy URL will be:
```
https://<YOUR_USER>-<SPACE_NAME>.hf.space
```
> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000 as required by the platform.
## Claude Code Client Configuration
### Option 1: Environment Variables
```bash
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```
### Option 2: Persistent (settings.json)
```json
// ~/.claude/settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "https://your-server.example.com",
"ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
}
}
```
### Option 3: Managed Settings (Enterprise)
```json
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "https://your-server.example.com"
}
}
```
## Gemini CLI Client Configuration
Configure Gemini CLI to use the proxy by setting the base URL and API key:
```bash
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```
> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
**Important:** Authenticate Gemini CLI via API key (not Google login). If you have a cached Google session, run `gemini --clear-credentials` first, otherwise the CLI may ignore the base URL override.
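The auth swap described in the note above can be sketched as follows (a simplified illustration; function and variable names are assumptions, not the proxy's actual internals):

```typescript
// Sketch of the Gemini auth swap: validate the client's x-goog-api-key as the
// proxy token, then replace it with the real Gemini key before forwarding.
function swapGeminiKey(
  headers: Record<string, string>,
  proxyToken: string,
  realGeminiKey: string,
): Record<string, string> | null {
  if (headers["x-goog-api-key"] !== proxyToken) {
    return null; // reject with 401
  }
  return { ...headers, "x-goog-api-key": realGeminiKey };
}
```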
### Test the Connection
```bash
# Health check
curl https://your-server.example.com/health
# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
-H "Authorization: Bearer your-proxy-auth-token" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Hi"}]
}'
# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
-H "Authorization: Bearer your-proxy-auth-token" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "Hi"}]}]
}'
```
## Tech Stack
- **Runtime:** Node.js >= 20
- **Framework:** [Fastify](https://fastify.dev/) 5
- **HTTP Client:** [undici](https://undici.nodejs.org/)
- **Language:** TypeScript (strict mode)
## License
MIT