---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI Proxy – LLM Gateway

A transparent relay proxy that forwards API requests from Claude Code and Gemini CLI through environments where upstream APIs are blocked (e.g. corporate firewalls). The proxy performs auth-swap, header hygiene, and supports SSE streaming.

```
Claude Code  →  AI Proxy  →  api.anthropic.com
Gemini CLI   →  AI Proxy  →  generativelanguage.googleapis.com
```

## Features

- **Multi-provider relay** – Anthropic and Google Gemini through a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – the client authenticates with a shared token; the server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, `Authorization`, and client-sent API keys
- **Rate limiting** – per-IP, with configurable window and maximum
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
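The auth-swap and header-hygiene steps above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual implementation; the function and variable names are assumptions:

```typescript
// Sketch: validate the client's shared token, then rebuild the outbound
// header set with the real upstream key and without hop-by-hop headers.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade", "host",
]);

function buildUpstreamHeaders(
  incoming: Record<string, string>,
  proxyToken: string,
  realApiKey: string,
): Record<string, string> {
  // Auth swap: the client presents the shared proxy token...
  if (incoming["authorization"] !== `Bearer ${proxyToken}`) {
    throw new Error("401: invalid proxy auth token");
  }
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const key = name.toLowerCase();
    // Header hygiene: drop hop-by-hop headers and client-sent credentials.
    if (HOP_BY_HOP.has(key) || key === "authorization" || key === "x-api-key") continue;
    out[key] = value;
  }
  // ...and the server injects the real API key before forwarding.
  out["x-api-key"] = realApiKey;
  return out;
}
```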

## Quick Start (Local)

```shell
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install

# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)

# 3. Run
npm run dev
```

Health check: `curl http://localhost:7860/health`

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | One of* | – | Anthropic API key (enables the Anthropic relay when set) |
| `PORT` | No | `7860` | Server port |
| `HOST` | No | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | No | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | No | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | No | `60000` | Rate-limit window (ms) |
| `BODY_LIMIT` | No | `5242880` | Max request body size (bytes, 5 MB) |
| `CORS_ORIGIN` | No | (disabled) | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | No | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | No | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | One of* | – | Gemini API key (enables the Gemini relay when set) |
| `GEMINI_BASE_URL` | No | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |

\* At least one provider key (`ANTHROPIC_API_KEY` or `GEMINI_API_KEY`) must be set.
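For example, a minimal `.env` for an Anthropic-only setup might look like this (all values are placeholders):

```shell
# Required: shared secret clients must present to the proxy
PROXY_AUTH_TOKEN=change-me-to-a-long-random-string
# At least one provider key must be set
ANTHROPIC_API_KEY=sk-ant-your-key-here
# Optional overrides (defaults shown)
PORT=7860
LOG_LEVEL=info
RATE_LIMIT_MAX=100
```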

## API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | No | Health check → `{"status":"ok"}` |
| POST | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| POST | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| POST | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| POST | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |

All other routes return 404. Non-POST methods on API routes return 405.

## Docker

### Local (docker compose)

```shell
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```

### Hugging Face Spaces

1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space):
   - SDK: Docker
   - Visibility: Private (recommended, since this Space handles API keys)

2. Push this repository to the Space:

   ```shell
   git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
   git push hf main
   ```

3. Configure secrets under Space Settings → Repository secrets (at least one provider key is required):
   - `PROXY_AUTH_TOKEN` = your chosen shared secret
   - `ANTHROPIC_API_KEY` = your Anthropic API key
   - `GEMINI_API_KEY` = your Gemini API key

4. The Space builds and deploys automatically. Your proxy URL will be:

   ```
   https://<YOUR_USER>-<SPACE_NAME>.hf.space
   ```

> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000, as the platform requires.

## Claude Code Client Configuration

### Option 1: Environment Variables

```shell
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```

### Option 2: Persistent (settings.json)

```jsonc
// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com",
    "ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
  }
}
```

### Option 3: Managed Settings (Enterprise)

```jsonc
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com"
  }
}
```

## Gemini CLI Client Configuration

Configure Gemini CLI to use the proxy by setting the base URL and API key:

```shell
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```

> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
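Conceptually, the Gemini-side key swap works like this. The sketch below is illustrative only; the proxy's real code may differ:

```typescript
// Sketch: the client sends the proxy token in x-goog-api-key;
// the proxy validates it and substitutes the real Gemini key.
function swapGeminiKey(
  headers: Record<string, string>,
  proxyToken: string,
  realGeminiKey: string,
): Record<string, string> {
  if (headers["x-goog-api-key"] !== proxyToken) {
    throw new Error("401: invalid proxy auth token");
  }
  // All other headers pass through unchanged.
  return { ...headers, "x-goog-api-key": realGeminiKey };
}
```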

> **Important:** Authenticate Gemini CLI via API key (not Google login). If you have a cached Google session, run `gemini --clear-credentials` first; otherwise the CLI may ignore the base URL override.

## Test the Connection

```shell
# Health check
curl https://your-server.example.com/health

# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hi"}]
  }'

# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hi"}]}]
  }'
```
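Streamed responses arrive as SSE, i.e. `data:`-prefixed lines forwarded chunk by chunk. A small client-side helper to pull the JSON payloads out of a raw chunk might look like this (an illustrative sketch, not part of the proxy):

```typescript
// Sketch: extract the JSON payload strings from a raw SSE chunk.
// Lines look like "data: {...}"; a terminal "data: [DONE]" is skipped.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length))
    .filter((payload) => payload !== "[DONE]");
}
```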

## Tech Stack

- **Runtime:** Node.js >= 20
- **Framework:** Fastify 5
- **HTTP client:** undici
- **Language:** TypeScript (strict mode)

## License

MIT