---
title: LLM Proxy
emoji: 🔀
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# AI Proxy – LLM Gateway

A transparent relay proxy that forwards API requests from Claude Code and Gemini CLI through environments where upstream APIs are blocked (e.g. corporate firewalls). The proxy performs auth-swap, header hygiene, and supports SSE streaming.

```
Claude Code  →  AI Proxy  →  api.anthropic.com
Gemini CLI   →  AI Proxy  →  generativelanguage.googleapis.com
```

## Features

- **Multi-provider relay** – Anthropic and Google Gemini through a single proxy
- **1:1 transparent relay** – no request/response body modification
- **SSE streaming** – chunk-by-chunk forwarding, zero buffering
- **Auth swap** – the client authenticates with a shared token; the server injects the real API key
- **Header hygiene** – strips hop-by-hop headers, `Authorization`, and client-sent API keys
- **Rate limiting** – per-IP, with configurable window and maximum
- **Defensive headers** – `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`
- **Graceful shutdown** – finishes in-flight streams before exiting
- **Hugging Face Spaces ready** – Docker configuration pre-set for HF Spaces deployment
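The auth-swap and header-hygiene steps above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual implementation; the function and variable names are assumptions:

```typescript
// Sketch: validate the client's shared token, then rebuild the outbound
// header set with the real upstream key and without hop-by-hop headers.
const HOP_BY_HOP = new Set([
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade", "host",
]);

function buildUpstreamHeaders(
  incoming: Record<string, string>,
  proxyToken: string,
  realApiKey: string,
): Record<string, string> {
  // Auth swap: the client presents the shared proxy token...
  if (incoming["authorization"] !== `Bearer ${proxyToken}`) {
    throw new Error("401: invalid proxy auth token");
  }
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const key = name.toLowerCase();
    // Header hygiene: drop hop-by-hop headers and client-sent credentials.
    if (HOP_BY_HOP.has(key) || key === "authorization" || key === "x-api-key") continue;
    out[key] = value;
  }
  // ...and the server injects the real API key before forwarding.
  out["x-api-key"] = realApiKey;
  return out;
}
```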

## Quick Start (Local)

```shell
# 1. Clone & install
git clone <your-repo-url> && cd ai-proxy
npm install

# 2. Configure – copy and edit
cp .env.example .env
# Set PROXY_AUTH_TOKEN and at least one provider key
# (ANTHROPIC_API_KEY and/or GEMINI_API_KEY)

# 3. Run
npm run dev
```

Health check: `curl http://localhost:7860/health`

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `PROXY_AUTH_TOKEN` | Yes | – | Shared secret for client authentication |
| `ANTHROPIC_API_KEY` | One of* | – | Anthropic API key (enables the Anthropic relay when set) |
| `PORT` | No | `7860` | Server port |
| `HOST` | No | `0.0.0.0` | Server bind address |
| `LOG_LEVEL` | No | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error` |
| `RATE_LIMIT_MAX` | No | `100` | Requests per time window per IP |
| `RATE_LIMIT_WINDOW_MS` | No | `60000` | Rate-limit window (ms) |
| `BODY_LIMIT` | No | `5242880` | Max request body size (bytes, 5 MB) |
| `CORS_ORIGIN` | No | (disabled) | CORS origin (e.g. `*` or `https://example.com`) |
| `ANTHROPIC_BASE_URL` | No | `https://api.anthropic.com` | Upstream Anthropic URL |
| `UPSTREAM_TIMEOUT_MS` | No | `300000` | Upstream request timeout (5 min) |
| `GEMINI_API_KEY` | One of* | – | Gemini API key (enables the Gemini relay when set) |
| `GEMINI_BASE_URL` | No | `https://generativelanguage.googleapis.com` | Upstream Gemini URL |

\* At least one provider key (`ANTHROPIC_API_KEY` or `GEMINI_API_KEY`) must be set.
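For example, a minimal `.env` for an Anthropic-only setup might look like this (all values are placeholders):

```shell
# Required: shared secret clients must present to the proxy
PROXY_AUTH_TOKEN=change-me-to-a-long-random-string
# At least one provider key must be set
ANTHROPIC_API_KEY=sk-ant-your-key-here
# Optional overrides (defaults shown)
PORT=7860
LOG_LEVEL=info
RATE_LIMIT_MAX=100
```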

## API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | No | Health check → `{"status":"ok"}` |
| POST | `/v1/messages` | Yes | Anthropic chat completions (relayed 1:1) |
| POST | `/v1/messages/count_tokens` | Yes | Anthropic token counting (relayed 1:1) |
| POST | `/v1beta/models/{model}:generateContent` | Yes | Gemini content generation (relayed 1:1) |
| POST | `/v1beta/models/{model}:streamGenerateContent` | Yes | Gemini streaming generation (relayed 1:1) |

All other routes return 404. Non-POST methods on API routes return 405.

## Docker

### Local (docker compose)

```shell
cp .env.example .env
# Edit .env with your keys
docker compose up --build
```

### Hugging Face Spaces

1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space):
   - SDK: Docker
   - Visibility: Private (recommended, since this Space handles API keys)

2. Push this repository to the Space:

   ```shell
   git remote add hf https://huggingface.co/spaces/<YOUR_USER>/<SPACE_NAME>
   git push hf main
   ```

3. Configure secrets under Space Settings → Repository secrets (at least one provider key is required):
   - `PROXY_AUTH_TOKEN` = your chosen shared secret
   - `ANTHROPIC_API_KEY` = your Anthropic API key
   - `GEMINI_API_KEY` = your Gemini API key

4. The Space builds and deploys automatically. Your proxy URL will be:

   ```
   https://<YOUR_USER>-<SPACE_NAME>.hf.space
   ```

> **Note:** HF Spaces secrets become environment variables at runtime. The Dockerfile already defaults to port 7860 and runs as uid 1000, as the platform requires.

## Claude Code Client Configuration

### Option 1: Environment Variables

```shell
export ANTHROPIC_BASE_URL=https://your-server.example.com
export ANTHROPIC_AUTH_TOKEN=your-proxy-auth-token
claude
```

### Option 2: Persistent (settings.json)

```jsonc
// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com",
    "ANTHROPIC_AUTH_TOKEN": "your-proxy-auth-token"
  }
}
```

### Option 3: Managed Settings (Enterprise)

```jsonc
// macOS: /Library/Application Support/ClaudeCode/managed-settings.json
// Linux: /etc/claude-code/managed-settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-server.example.com"
  }
}
```

## Gemini CLI Client Configuration

Configure Gemini CLI to use the proxy by setting the base URL and API key:

```shell
export GOOGLE_GEMINI_BASE_URL=https://your-server.example.com
export GEMINI_API_KEY=your-proxy-auth-token
gemini
```

> **Note:** Use the same `PROXY_AUTH_TOKEN` value as `GEMINI_API_KEY` on the client side. The proxy accepts it via the `x-goog-api-key` header, validates it as the proxy auth token, and replaces it with the real Gemini API key before forwarding upstream.
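Conceptually, the Gemini-side key swap works like this. The sketch below is illustrative only; the proxy's real code may differ:

```typescript
// Sketch: the client sends the proxy token in x-goog-api-key;
// the proxy validates it and substitutes the real Gemini key.
function swapGeminiKey(
  headers: Record<string, string>,
  proxyToken: string,
  realGeminiKey: string,
): Record<string, string> {
  if (headers["x-goog-api-key"] !== proxyToken) {
    throw new Error("401: invalid proxy auth token");
  }
  // All other headers pass through unchanged.
  return { ...headers, "x-goog-api-key": realGeminiKey };
}
```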

> **Important:** Authenticate Gemini CLI via API key (not Google login). If you have a cached Google session, run `gemini --clear-credentials` first; otherwise the CLI may ignore the base URL override.

## Test the Connection

```shell
# Health check
curl https://your-server.example.com/health

# Test Anthropic relay
curl -X POST https://your-server.example.com/v1/messages \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hi"}]
  }'

# Test Gemini relay
curl -X POST https://your-server.example.com/v1beta/models/gemini-2.0-flash:generateContent \
  -H "Authorization: Bearer your-proxy-auth-token" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hi"}]}]
  }'
```
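Streamed responses arrive as SSE, i.e. `data:`-prefixed lines forwarded chunk by chunk. A small client-side helper to pull the JSON payloads out of a raw chunk might look like this (an illustrative sketch, not part of the proxy):

```typescript
// Sketch: extract the JSON payload strings from a raw SSE chunk.
// Lines look like "data: {...}"; a terminal "data: [DONE]" is skipped.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length))
    .filter((payload) => payload !== "[DONE]");
}
```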

## Tech Stack

- **Runtime:** Node.js >= 20
- **Framework:** Fastify 5
- **HTTP client:** undici
- **Language:** TypeScript (strict mode)

## License

MIT