Upload folder using huggingface_hub

- README.md +14 -32
- docker-compose.yaml +2 -1
- litellm-config-auto.yaml +24 -32
README.md CHANGED

````diff
@@ -10,18 +10,15 @@ pinned: false
 
 # LiteLLM Proxy (Render & Hugging Face Space)
 
-LiteLLM proxy for OpenRouter, Hugging Face, and other providers, deployable to Render, Hugging Face Spaces (Docker), and runnable locally with Docker.
+LiteLLM proxy for OpenRouter, Google (Gemini), Hugging Face, and other providers, deployable to Render, Hugging Face Spaces (Docker), and runnable locally with Docker.
 
 ## Fix: "Authentication Error, No api key passed in" (401)
 
-This happens when the proxy is configured with a **master key** but either
-
-1. **Render:** Required environment variables are missing or wrong, or
-2. **Client:** The request does not send the API key in the header.
+This happens when the proxy is configured with a **master key** but either required environment variables are missing or wrong on the server, or the client does not send the API key in the request header.
 
 ### 1. Set environment variables on Render
 
-The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
+The config reads from environment variables. No secret file is needed.
 
 1. In [Render Dashboard](https://dashboard.render.com) → your Web Service → **Environment**.
 2. Under **Environment Variables**, click **+ Add Environment Variable**.
@@ -31,6 +28,7 @@ The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
 |-----|-------|--------|
 | `LITELLM_MASTER_KEY` | e.g. `sk-your-secret-key` | Mark as **Secret**. This is the key clients send. |
 | `OPENROUTER_API_KEY` | Your OpenRouter API key | Mark as **Secret**. |
+| `GOOGLE_API_KEY` | Your Google AI API key | Mark as **Secret**. Required for Gemini models (`my-free-coders-new` with Gemini). |
 | `HF_TOKEN` | Your Hugging Face token (`hf_...`) | Optional. Mark as **Secret**. Required only for Hugging Face models (`my-hf-models`). Create at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). |
 | `PORT` | (optional) | Render sets this automatically; no need to add. |
 
@@ -38,44 +36,28 @@ The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
 
 ### 2. Send the API key from your client
 
-Every request to the proxy must include
-
-```http
-Authorization: Bearer <your-LITELLM_MASTER_KEY>
-```
+Every request to the proxy must include the header **Authorization** with value **Bearer** followed by your `LITELLM_MASTER_KEY` value.
 
-- **Cursor:** Settings → Models
-- **
-```bash
-curl -X POST https://your-app.onrender.com/v1/chat/completions \
-  -H "Authorization: Bearer sk-your-master-key" \
-  -H "Content-Type: application/json" \
-  -d '{"model": "my-free-models", "messages": [{"role": "user", "content": "Hi"}]}'
-```
-- **OpenAI SDK / other clients:** Set the API key to your `LITELLM_MASTER_KEY` when using the proxy URL.
+- **Cursor:** In Settings → Models, set your Render or Space URL as the base URL and set the API Key to the same value as `LITELLM_MASTER_KEY`.
+- **OpenAI-compatible clients:** Configure the base URL to your proxy URL and set the API key to your `LITELLM_MASTER_KEY`.
 
-If the env vars are set on
+If the env vars are set on the server and you send the master key in the Authorization header, the 401 should go away.
 
 ## Local run (Docker Compose)
 
-```
-cp .env.example .env   # if you have one, or create .env with OPENROUTER_API_KEY and LITELLM_MASTER_KEY
-docker compose up --build
-```
-
-Port is controlled by `PORT` in `.env` (default 4000).
+Create a `.env` file in the project root with `OPENROUTER_API_KEY`, `LITELLM_MASTER_KEY`, and optionally `GOOGLE_API_KEY` and `HF_TOKEN`. Then run the app with Docker Compose (build and start the service). Port is controlled by `PORT` in `.env`; default is 4000.
 
 ## Deploy to Render
 
 1. Connect this repo to Render and create a **Web Service**.
-2. Render will use the **Dockerfile**; no build
-3. Add **Environment Variables** (see above): `LITELLM_MASTER_KEY` and `OPENROUTER_API_KEY` (mark as Secret).
-4. Deploy. Use the service URL
+2. Render will use the **Dockerfile**; no build or start command is required.
+3. Add **Environment Variables** (see above): at least `LITELLM_MASTER_KEY` and `OPENROUTER_API_KEY` (mark as Secret). Add `GOOGLE_API_KEY` if you use Gemini models.
+4. Deploy. Use the service URL with the same master key in the Authorization header from your app or Cursor.
 
 ## Deploy to Hugging Face Spaces (Docker)
 
 The README includes Spaces frontmatter (`sdk: docker`, `app_port: 7860`). To run this repo as a Space:
 
 1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space), choose **Docker**, and use this repo (or push this repo to the Space).
-2. In the Space **Settings** → **Variables and secrets**, add **Secrets**: `LITELLM_MASTER_KEY`, `OPENROUTER_API_KEY`
-3. The Space will build from the **Dockerfile** and run the proxy. Use the Space URL
+2. In the Space **Settings** → **Variables and secrets**, add **Secrets**: `LITELLM_MASTER_KEY`, `OPENROUTER_API_KEY`, and optionally `GOOGLE_API_KEY` and `HF_TOKEN`. If the Space does not set `PORT` for you, add a **Variable** `PORT` = `7860` so the proxy listens on the expected port.
+3. The Space will build from the **Dockerfile** and run the proxy. Use the Space URL with **Authorization: Bearer** and your `LITELLM_MASTER_KEY`.
````
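The new README describes the Authorization header in prose rather than with the old curl example. As a sketch of the same requirement using only Python's standard library (the URL and key below are placeholders, not real values), the request the proxy expects can be built like this:

```python
import json
from urllib.request import Request

BASE_URL = "https://your-app.onrender.com"  # placeholder proxy URL
MASTER_KEY = "sk-your-master-key"           # placeholder LITELLM_MASTER_KEY

def chat_request(model, messages):
    """Build (but do not send) the POST /v1/chat/completions request,
    carrying the master key as a Bearer token."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("my-free-models", [{"role": "user", "content": "Hi"}])
print(req.get_header("Authorization"))  # Bearer sk-your-master-key
```

Sending the built request (e.g. with `urllib.request.urlopen(req)`) without that header is exactly what produces the 401 discussed above.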
docker-compose.yaml CHANGED

```diff
@@ -5,10 +5,11 @@ services:
     ports:
       - "${PORT:-4000}:${PORT:-4000}"
     volumes:
-      - ./litellm-config-auto.yaml:/app/config.yaml
+      - ./litellm-config-auto.yaml:/home/user/app/config.yaml
     environment:
       - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
       - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY:-sk-1234}
+      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
       - HF_TOKEN=${HF_TOKEN}
       - PORT=${PORT:-4000}
     restart: always
```
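The `${VAR}` references in the compose file above are substituted from a `.env` file next to it. A minimal sketch of that file, with placeholder values (variable names are the ones the compose file actually reads):

```env
OPENROUTER_API_KEY=your-openrouter-key
LITELLM_MASTER_KEY=sk-your-secret-key
GOOGLE_API_KEY=your-google-ai-key   # optional, for Gemini models
HF_TOKEN=hf_your_token              # optional, for Hugging Face models
PORT=4000                           # optional, defaults to 4000
```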
litellm-config-auto.yaml CHANGED

```diff
@@ -5,26 +5,7 @@ litellm_settings:
   drop_params: False
   modify_params: True
   set_verbose: False
-  num_retries: 5
-  # These belong here (global request defaults)
   request_timeout: 30
-  allowed_fails: 1
-  cooldown_time: 60
-  default_completion_params:
-    max_tokens: 4096
-    trim_ratio: 0.75
-    extra_body:
-      transforms: ["middle-out"]
-
-router_settings:
-  # Move these here for them to actually work!
-  routing_strategy: latency-based-routing
-  num_retries: 3  # Increase to 3 for better resilience
-  allowed_fails: 2
-  cooldown_time: 30
-  fallbacks: [{"my-free-coders-new": ["my-free-coders-new"]}]
-  context_window_fallbacks: [{"my-free-coders-new": ["my-free-coders-new"]}]
-
 
 model_list:
   - model_name: my-free-models
@@ -107,18 +88,6 @@ model_list:
     litellm_params:
       model: openrouter/arcee-ai/trinity-large-preview:free
      api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/google/gemini-2.0-flash-exp:free  # Most stable
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/meta-llama/llama-3.1-8b-instruct:free  # Fallback 1
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/qwen/qwen-2.5-72b-instruct:free  # Fallback 2
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
   - model_name: my-paid-coders
     litellm_params:
       model: openrouter/openai/gpt-oss-20b
@@ -143,7 +112,6 @@ model_list:
     litellm_params:
       model: openrouter/meta-llama/llama-3-8b-instruct
       api_key: "os.environ/OPENROUTER_API_KEY"
-  # Hugging Face (set HF_TOKEN in env); format: huggingface/<provider>/<org>/<model>
   - model_name: my-hf-models
     litellm_params:
       model: huggingface/meta-llama/Llama-3.3-70B-Instruct
@@ -155,3 +123,27 @@ model_list:
 
 router_settings:
   routing_strategy: latency-based-routing
+  num_retries: 3
+  allowed_fails: 2
+  cooldown_time: 30
+  retry_policy:
+    AuthenticationErrorRetries: 3
+    TimeoutErrorRetries: 3
+    RateLimitErrorRetries: 3
+    ContentPolicyViolationErrorRetries: 4
+    InternalServerErrorRetries: 4
+  allowed_fails_policy:
+    BadRequestErrorAllowedFails: 1000
+    AuthenticationErrorAllowedFails: 10
+    TimeoutErrorAllowedFails: 12
+    RateLimitErrorAllowedFails: 10000
+    ContentPolicyViolationErrorAllowedFails: 15
+    InternalServerErrorAllowedFails: 20
+  fallbacks:
+    - my-free-coders-new:
+        - my-free-models
+  context_window_fallbacks:
+    - my-free-coders-new:
+        - my-free-models
+  default_litellm_params:
+    max_tokens: 4096
```