Upload folder using huggingface_hub

- README.md +14 -32
- docker-compose.yaml +2 -1
- litellm-config-auto.yaml +24 -32
README.md CHANGED

````diff
@@ -10,18 +10,15 @@ pinned: false
 
 # LiteLLM Proxy (Render & Hugging Face Space)
 
-LiteLLM proxy for OpenRouter, Hugging Face, and other providers, deployable to Render, Hugging Face Spaces (Docker), and runnable locally with Docker.
+LiteLLM proxy for OpenRouter, Google (Gemini), Hugging Face, and other providers, deployable to Render, Hugging Face Spaces (Docker), and runnable locally with Docker.
 
 ## Fix: "Authentication Error, No api key passed in" (401)
 
-This happens when the proxy is configured with a **master key** but either
-
-1. **Render:** Required environment variables are missing or wrong, or
-2. **Client:** The request does not send the API key in the header.
+This happens when the proxy is configured with a **master key** but either required environment variables are missing or wrong on the server, or the client does not send the API key in the request header.
 
 ### 1. Set environment variables on Render
 
-The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
+The config reads from environment variables. No secret file is needed.
 
 1. In [Render Dashboard](https://dashboard.render.com) → your Web Service → **Environment**.
 2. Under **Environment Variables**, click **+ Add Environment Variable**.
@@ -31,6 +28,7 @@ The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
 |-----|-------|--------|
 | `LITELLM_MASTER_KEY` | e.g. `sk-your-secret-key` | Mark as **Secret**. This is the key clients send. |
 | `OPENROUTER_API_KEY` | Your OpenRouter API key | Mark as **Secret**. |
+| `GOOGLE_API_KEY` | Your Google AI API key | Mark as **Secret**. Required for Gemini models (`my-free-coders-new` with Gemini). |
 | `HF_TOKEN` | Your Hugging Face token (`hf_...`) | Optional. Mark as **Secret**. Required only for Hugging Face models (`my-hf-models`). Create at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). |
 | `PORT` | (optional) | Render sets this automatically; no need to add. |
 
@@ -38,44 +36,28 @@ The config reads **from environment variables** (`os.environ/LITELLM_MASTER_KEY`
 
 ### 2. Send the API key from your client
 
-Every request to the proxy must include
-
-```http
-Authorization: Bearer <your-LITELLM_MASTER_KEY>
-```
+Every request to the proxy must include the header **Authorization** with value **Bearer** followed by your `LITELLM_MASTER_KEY` value.
 
-- **Cursor:** Settings → Models
-- **
-```bash
-curl -X POST https://your-app.onrender.com/v1/chat/completions \
-  -H "Authorization: Bearer sk-your-master-key" \
-  -H "Content-Type: application/json" \
-  -d '{"model": "my-free-models", "messages": [{"role": "user", "content": "Hi"}]}'
-```
-- **OpenAI SDK / other clients:** Set the API key to your `LITELLM_MASTER_KEY` when using the proxy URL.
+- **Cursor:** In Settings → Models, set your Render or Space URL as the base URL and set the API Key to the same value as `LITELLM_MASTER_KEY`.
+- **OpenAI-compatible clients:** Configure the base URL to your proxy URL and set the API key to your `LITELLM_MASTER_KEY`.
 
-If the env vars are set on
+If the env vars are set on the server and you send the master key in the Authorization header, the 401 should go away.
 
 ## Local run (Docker Compose)
 
-```
-cp .env.example .env   # if you have one, or create .env with OPENROUTER_API_KEY and LITELLM_MASTER_KEY
-docker compose up --build
-```
-
-Port is controlled by `PORT` in `.env` (default 4000).
+Create a `.env` file in the project root with `OPENROUTER_API_KEY`, `LITELLM_MASTER_KEY`, and optionally `GOOGLE_API_KEY` and `HF_TOKEN`. Then run the app with Docker Compose (build and start the service). Port is controlled by `PORT` in `.env`; default is 4000.
 
 ## Deploy to Render
 
 1. Connect this repo to Render and create a **Web Service**.
-2. Render will use the **Dockerfile**; no build
-3. Add **Environment Variables** (see above): `LITELLM_MASTER_KEY` and `OPENROUTER_API_KEY` (mark as Secret).
-4. Deploy. Use the service URL
+2. Render will use the **Dockerfile**; no build or start command is required.
+3. Add **Environment Variables** (see above): at least `LITELLM_MASTER_KEY` and `OPENROUTER_API_KEY` (mark as Secret). Add `GOOGLE_API_KEY` if you use Gemini models.
+4. Deploy. Use the service URL with the same master key in the Authorization header from your app or Cursor.
 
 ## Deploy to Hugging Face Spaces (Docker)
 
 The README includes Spaces frontmatter (`sdk: docker`, `app_port: 7860`). To run this repo as a Space:
 
 1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space), choose **Docker**, and use this repo (or push this repo to the Space).
-2. In the Space **Settings** → **Variables and secrets**, add **Secrets**: `LITELLM_MASTER_KEY`, `OPENROUTER_API_KEY`
-3. The Space will build from the **Dockerfile** and run the proxy. Use the Space URL
+2. In the Space **Settings** → **Variables and secrets**, add **Secrets**: `LITELLM_MASTER_KEY`, `OPENROUTER_API_KEY`, and optionally `GOOGLE_API_KEY` and `HF_TOKEN`. If the Space does not set `PORT` for you, add a **Variable** `PORT` = `7860` so the proxy listens on the expected port.
+3. The Space will build from the **Dockerfile** and run the proxy. Use the Space URL with **Authorization: Bearer** and your `LITELLM_MASTER_KEY`.
````
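The new README describes the Authorization header in prose rather than with the old curl example. As a sketch of the same requirement using only Python's standard library (the URL and key below are placeholders, not real values), the request the proxy expects can be built like this:

```python
import json
from urllib.request import Request

BASE_URL = "https://your-app.onrender.com"  # placeholder proxy URL
MASTER_KEY = "sk-your-master-key"           # placeholder LITELLM_MASTER_KEY

def chat_request(model, messages):
    """Build (but do not send) the POST /v1/chat/completions request,
    carrying the master key as a Bearer token."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("my-free-models", [{"role": "user", "content": "Hi"}])
print(req.get_header("Authorization"))  # Bearer sk-your-master-key
```

Sending the built request (e.g. with `urllib.request.urlopen(req)`) without that header is exactly what produces the 401 discussed above.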
docker-compose.yaml CHANGED

```diff
@@ -5,10 +5,11 @@ services:
     ports:
       - "${PORT:-4000}:${PORT:-4000}"
     volumes:
-      - ./litellm-config-auto.yaml:/app/config.yaml
+      - ./litellm-config-auto.yaml:/home/user/app/config.yaml
     environment:
       - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
       - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY:-sk-1234}
+      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
       - HF_TOKEN=${HF_TOKEN}
       - PORT=${PORT:-4000}
     restart: always
```
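The `${VAR}` references in the compose file above are substituted from a `.env` file next to it. A minimal sketch of that file, with placeholder values (variable names are the ones the compose file actually reads):

```env
OPENROUTER_API_KEY=your-openrouter-key
LITELLM_MASTER_KEY=sk-your-secret-key
GOOGLE_API_KEY=your-google-ai-key   # optional, for Gemini models
HF_TOKEN=hf_your_token              # optional, for Hugging Face models
PORT=4000                           # optional, defaults to 4000
```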
litellm-config-auto.yaml CHANGED

```diff
@@ -5,26 +5,7 @@ litellm_settings:
   drop_params: False
   modify_params: True
   set_verbose: False
-  num_retries: 5
-  # These belong here (global request defaults)
   request_timeout: 30
-  allowed_fails: 1
-  cooldown_time: 60
-  default_completion_params:
-    max_tokens: 4096
-    trim_ratio: 0.75
-    extra_body:
-      transforms: ["middle-out"]
-
-router_settings:
-  # Move these here for them to actually work!
-  routing_strategy: latency-based-routing
-  num_retries: 3  # Increase to 3 for better resilience
-  allowed_fails: 2
-  cooldown_time: 30
-  fallbacks: [{"my-free-coders-new": ["my-free-coders-new"]}]
-  context_window_fallbacks: [{"my-free-coders-new": ["my-free-coders-new"]}]
-
 
 model_list:
   - model_name: my-free-models
@@ -107,18 +88,6 @@ model_list:
     litellm_params:
       model: openrouter/arcee-ai/trinity-large-preview:free
      api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/google/gemini-2.0-flash-exp:free  # Most stable
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/meta-llama/llama-3.1-8b-instruct:free  # Fallback 1
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
-  # - model_name: my-free-coders-new
-  #   litellm_params:
-  #     model: openrouter/qwen/qwen-2.5-72b-instruct:free  # Fallback 2
-  #     api_key: "os.environ/OPENROUTER_API_KEY"
   - model_name: my-paid-coders
     litellm_params:
       model: openrouter/openai/gpt-oss-20b
@@ -143,7 +112,6 @@ model_list:
     litellm_params:
       model: openrouter/meta-llama/llama-3-8b-instruct
       api_key: "os.environ/OPENROUTER_API_KEY"
-  # Hugging Face (set HF_TOKEN in env); format: huggingface/<provider>/<org>/<model>
   - model_name: my-hf-models
     litellm_params:
       model: huggingface/meta-llama/Llama-3.3-70B-Instruct
@@ -155,3 +123,27 @@ model_list:
 
 router_settings:
   routing_strategy: latency-based-routing
+  num_retries: 3
+  allowed_fails: 2
+  cooldown_time: 30
+  retry_policy:
+    AuthenticationErrorRetries: 3
+    TimeoutErrorRetries: 3
+    RateLimitErrorRetries: 3
+    ContentPolicyViolationErrorRetries: 4
+    InternalServerErrorRetries: 4
+  allowed_fails_policy:
+    BadRequestErrorAllowedFails: 1000
+    AuthenticationErrorAllowedFails: 10
+    TimeoutErrorAllowedFails: 12
+    RateLimitErrorAllowedFails: 10000
+    ContentPolicyViolationErrorAllowedFails: 15
+    InternalServerErrorAllowedFails: 20
+  fallbacks:
+    - my-free-coders-new:
+        - my-free-models
+  context_window_fallbacks:
+    - my-free-coders-new:
+        - my-free-models
+  default_litellm_params:
+    max_tokens: 4096
```