---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI Proxy – LLM Gateway

A transparent relay proxy that forwards API requests from **Claude Code** and **Gemini CLI** out of environments where the upstream APIs are blocked (e.g. behind corporate firewalls). The proxy performs auth swapping and header hygiene, and supports SSE streaming.

```
Claude Code  →  AI Proxy  →  api.anthropic.com
Gemini CLI   →  AI Proxy  →  generativelanguage.googleapis.com
```

## Features

- **Multi-provider relay** – Anthropic and Google Gemini via a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – client authenticates with a shared token; server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, authorization, and client-sent API keys
- **Rate limiting** – per-IP, configurable window and max
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
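The auth-swap and header-hygiene steps above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the header list and function name are assumptions, and the real implementation also handles Gemini's `x-goog-api-key`.

```typescript
// Hop-by-hop headers (RFC 9110 §7.6.1) plus client credentials that the
// proxy must not forward upstream. (Illustrative list, an assumption.)
const STRIP_HEADERS = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade",
  "host", "authorization", "x-api-key", "x-goog-api-key",
]);

// Build the upstream header set: drop stripped headers, then inject the
// real provider key in place of the client's shared proxy token.
function buildUpstreamHeaders(
  client: Record<string, string>,
  realApiKey: string,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(client)) {
    if (!STRIP_HEADERS.has(name.toLowerCase())) out[name] = value;
  }
  out["x-api-key"] = realApiKey; // Anthropic-style key injection
  return out;
}
```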

## Quick Start (Local)

```bash
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install

# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)

# 3. Run
npm run dev
```

Health check: `curl http://localhost:7860/health`

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | – | – | Anthropic API key (enables Anthropic relay when set) |
| `PORT` | – | `7860` | Server port |
| `HOST` | – | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | – | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | – | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | – | `60000` | Rate limit window (ms) |
| `BODY_LIMIT` | – | `5242880` | Max request body size in bytes (default 5 MB) |
| `CORS_ORIGIN` | – | *(disabled)* | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | – | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | – | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | – | – | Gemini API key (enables Gemini relay when set) |
| `GEMINI_BASE_URL` | – | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |

## API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check → `{"status":"ok"}` |
| `POST` | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| `POST` | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| `POST` | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |

All other routes return `404`. Non-POST methods on API routes return `405`.

## Docker

### Local (docker compose)

```bash
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```

### Hugging Face Spaces

1. Create a new Space on [huggingface.co/new-space](https://huggingface.co/new-space):
   - **SDK**: Docker
   - **Visibility**: Private (recommended, since the Space holds API keys)

2. Push this repository to the Space:
   ```bash
   git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
   git push hf main
   ```

3. Configure **Secrets** in Space Settings → Repository secrets:
   - `PROXY_AUTH_TOKEN` = your chosen shared secret
   - `ANTHROPIC_API_KEY` = your Anthropic key *(at least one provider required)*
   - `GEMINI_API_KEY` = your Gemini API key *(at least one provider required)*

4. The Space will build and deploy automatically. Your proxy URL will be:
   ```
   https://<YOUR_USER>-<SPACE_NAME>.hf.space
   ```

> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000 as required by the platform.

## Claude Code Client Configuration

### Option 1: Environment Variables

```bash
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```

### Option 2: Persistent (settings.json)

```json
// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com",
    "ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
  }
}
```

### Option 3: Managed Settings (Enterprise)

```json
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com"
  }
}
```

## Gemini CLI Client Configuration

Configure Gemini CLI to use the proxy by setting the base URL and API key:

```bash
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```

> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
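That key swap can be sketched as below. This is a simplified illustration under the behavior described in the note; the function name and error shape are assumptions, not the proxy's actual code.

```typescript
// Validate the client's x-goog-api-key against the shared proxy token,
// then return the real Gemini key to send upstream in the same header.
function swapGeminiKey(
  clientKey: string | undefined,
  proxyToken: string,
  realGeminiKey: string,
): string {
  if (clientKey !== proxyToken) {
    throw new Error("401: invalid proxy auth token");
  }
  return realGeminiKey; // forwarded upstream as x-goog-api-key
}
```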

**Important:** Authenticate Gemini CLI with an API key (not a Google login). If you have a cached Google session, run `gemini --clear-credentials` first; otherwise the CLI may ignore the base-URL override.

### Test the Connection

```bash
# Health check
curl https://your-server.example.com/health

# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hi"}]
  }'

# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hi"}]}]
  }'
```

## Tech Stack

- **Runtime:** Node.js >= 20
- **Framework:** [Fastify](https://fastify.dev/) 5
- **HTTP Client:** [undici](https://undici.nodejs.org/)
- **Language:** TypeScript (strict mode)

## License

MIT