oki692 committed on
Commit 3154e52 · verified · 1 Parent(s): cd4f970

Upload 4 files

Files changed (4)
  1. Dockerfile +35 -0
  2. INSTRUKCJE.md +155 -0
  3. entrypoint.sh +41 -0
  4. proxy.py +114 -0
Dockerfile ADDED
@@ -0,0 +1,35 @@
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    curl \
    ca-certificates \
    zstd \
    && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://ollama.ai/install.sh | sh

RUN pip3 install --no-cache-dir \
    fastapi \
    "uvicorn[standard]" \
    httpx

RUN useradd -m -u 1000 user
USER user

ENV HOME=/home/user \
    PATH="/home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
    OLLAMA_HOST=127.0.0.1:11434 \
    OLLAMA_NUM_PARALLEL=2 \
    OLLAMA_MAX_LOADED_MODELS=1

WORKDIR /home/user/app

COPY --chown=user proxy.py .
COPY --chown=user entrypoint.sh .
RUN chmod +x entrypoint.sh

EXPOSE 7860

CMD ["./entrypoint.sh"]
INSTRUKCJE.md ADDED
@@ -0,0 +1,155 @@
# Ollama Universal - HF Spaces Template

A universal template for deploying any Ollama model on Hugging Face Spaces.
Upload 3 files, set 2 variables, and no code changes are needed.

---

## Repository files

```
Dockerfile      - builds the image (do not edit)
entrypoint.sh   - starts Ollama and the proxy (do not edit)
proxy.py        - API proxy with marked places to edit
INSTRUKCJE.md   - this file
```

---

## Quick start

### 1. Create a Space on Hugging Face
- huggingface.co → Spaces → Create new Space
- SDK: **Docker**
- Hardware: **CPU Basic** (free, 16 GB RAM)
- Visibility: Public or Private

### 2. Upload the files
Via the UI (drag and drop) or git:
```bash
git clone https://huggingface.co/spaces/<username>/<space-name>
cd <space-name>
# copy Dockerfile, entrypoint.sh, proxy.py here
git add . && git commit -m "init" && git push
```

### 3. Set the environment variables
Settings → Variables and Secrets → New variable:

| Name | Example | Description |
|------|---------|-------------|
| `MODEL` | `deepseek-r1:14b` | Model to load |
| `API_KEY` | `moj-tajny-klucz` | Bearer authorization key |

### 4. Wait
- Build: ~2 min
- Cold start: depends on model size (e.g. 9 GB takes ~3-5 min to download)

---

## Changing the model

Change only the `MODEL` Variable in Settings; the Space restarts automatically.

### Models from the Ollama registry
```
deepseek-r1:14b     9.0 GB   reasoning
deepseek-r1:7b      4.7 GB   reasoning
qwen3:8b            5.2 GB   reasoning
qwen3:4b            2.6 GB   reasoning
qwen2.5:7b          4.7 GB
llama3.2:3b         2.0 GB
gemma3:9b           5.8 GB
mistral:7b          4.1 GB
phi4-mini:latest    4.2 GB
```

### Models from Hugging Face (hf.co/...)
```
hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0          8.33 GB
hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M    4.7 GB
hf.co/bartowski/gemma-3-9b-it-GGUF:Q4_K_M          5.8 GB
```

---

## What to edit in proxy.py

Search for the `!TU MUSISZ EDYTOWAC!` ("you must edit this") comments:

```python
API_KEY = os.environ.get("API_KEY", "connectkey")
#                                    ^^^^^^^^^^ change the default key

MODEL = os.environ.get("MODEL", "deepseek-r1:14b")
#                                ^^^^^^^^^^^^^^^ change the default model

temperature = body.get("temperature", 0.6)
#                                     ^^^ change the default temperature

top_p = body.get("top_p", 0.95)
#                         ^^^^ change the default top_p
```

Values from ENV (HF Variables) always take priority over the in-code defaults.
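That precedence is simply Python's `os.environ.get` fallback behavior; a minimal sketch (the model names here are only illustrative values):

```python
import os

# Case 1: the HF Variable is not set, so the in-code default is used.
os.environ.pop("MODEL", None)
default_used = os.environ.get("MODEL", "deepseek-r1:14b")

# Case 2: the Variable is set in Settings; the default is ignored.
os.environ["MODEL"] = "qwen3:8b"
env_used = os.environ.get("MODEL", "deepseek-r1:14b")
```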
---

## API usage

Base URL: `https://<username>-<space-name>.hf.space`

### curl
```bash
curl https://<space>.hf.space/v1/chat/completions \
  -H "Authorization: Bearer moj-tajny-klucz" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hi!"}]
  }'
```

### Python (openai SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<space>.hf.space/v1",
    api_key="moj-tajny-klucz",
)

stream = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```
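If you would rather not depend on the openai SDK, the stream is plain SSE: each event is one `data: {json}` line, terminated by `data: [DONE]`. A minimal parser sketch (the sample payload below is made up, but its shape matches the chunks proxy.py emits):

```python
import json

def parse_sse(raw: str) -> list[str]:
    """Collect the content deltas from an SSE response body."""
    out = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            out.append(delta["content"])
    return out

sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}, "index": 0}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}, "index": 0}]}\n'
    "data: [DONE]\n"
)
```

In a real client you would feed `parse_sse` line by line from the HTTP response instead of a pre-collected string.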
---

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat, always streaming |
| `/v1/models` | GET | List of loaded models |
| `/health` | GET | Status of Ollama and the model |

---

## CPU Basic limits (free)

| Parameter | Value |
|-----------|-------|
| RAM | 16 GB |
| vCPU | 2 |
| Disk | 50 GB (reset on restart) |
| Sleep after inactivity | 48 h |
| Max model size | ~13 GB GGUF |
entrypoint.sh ADDED
@@ -0,0 +1,41 @@
#!/bin/bash
set -e

# Validate the required variables
if [ -z "${MODEL}" ] || [ "${MODEL}" = "!TU MUSISZ EDYTOWAC!" ]; then
    echo "ERROR: the MODEL variable is not set!"
    echo "Set it in: HF Space Settings -> Variables -> MODEL"
    echo "Example: deepseek-r1:14b"
    exit 1
fi

if [ -z "${API_KEY}" ] || [ "${API_KEY}" = "!TU MUSISZ EDYTOWAC!" ]; then
    echo "ERROR: the API_KEY variable is not set!"
    echo "Set it in: HF Space Settings -> Variables -> API_KEY"
    exit 1
fi

export OLLAMA_HOST=127.0.0.1:11434
export OLLAMA_NUM_PARALLEL=2
export OLLAMA_MAX_LOADED_MODELS=1

echo "==> Model: ${MODEL}"
echo "==> Starting Ollama..."
ollama serve &

echo "==> Waiting for Ollama..."
for i in $(seq 1 30); do
    if curl -sf http://127.0.0.1:11434/api/version > /dev/null 2>&1; then
        echo "==> Ollama ready!"
        break
    fi
    echo "    Waiting... ($i/30)"
    sleep 2
done

# !TU MUSISZ EDYTOWAC! - the model name, e.g. "deepseek-r1:14b" or "hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0"
echo "==> Pulling ${MODEL}..."
ollama pull "${MODEL}"

echo "==> Starting proxy on :7860..."
exec uvicorn proxy:app --host 0.0.0.0 --port 7860 --workers 2 --timeout-keep-alive 300
proxy.py ADDED
@@ -0,0 +1,114 @@
import json
import os
import time
import uuid

import httpx
from fastapi import Depends, FastAPI, HTTPException, Request
from fastapi.responses import StreamingResponse
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer()

API_KEY = os.environ.get("API_KEY", "!TU MUSISZ EDYTOWAC!")  # e.g. "moj-tajny-klucz"
MODEL = os.environ.get("MODEL", "!TU MUSISZ EDYTOWAC!")  # e.g. "deepseek-r1:14b" or "hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0"
OLLAMA_BASE = "http://127.0.0.1:11434"

if "!TU MUSISZ EDYTOWAC!" in (API_KEY, MODEL):
    raise RuntimeError("Set the API_KEY and MODEL variables in HF Space Settings -> Variables")


def verify_key(credentials: HTTPAuthorizationCredentials = Depends(security)):
    if credentials.credentials != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return credentials.credentials


@app.get("/v1/models")
async def list_models(key: str = Depends(verify_key)):
    return {
        "object": "list",
        "data": [{
            "id": MODEL,
            "object": "model",
            "created": int(time.time()),
            "owned_by": "ollama",
        }]
    }


@app.post("/v1/chat/completions")
async def chat_completions(request: Request, key: str = Depends(verify_key)):
    body = await request.json()

    messages = body.get("messages", [])
    temperature = body.get("temperature", 0.6)  # !TU MUSISZ EDYTOWAC! default temperature (0.0-2.0)
    top_p = body.get("top_p", 0.95)  # !TU MUSISZ EDYTOWAC! default top_p (0.0-1.0)

    options = {"temperature": temperature, "top_p": top_p}
    if "max_tokens" in body:
        options["num_predict"] = body["max_tokens"]

    ollama_payload = {
        "model": MODEL,
        "messages": messages,
        "stream": True,
        "options": options,
    }

    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    async def generate():
        async with httpx.AsyncClient(timeout=300.0) as client:
            async with client.stream("POST", f"{OLLAMA_BASE}/api/chat", json=ollama_payload) as resp:
                async for line in resp.aiter_lines():
                    if not line:
                        continue
                    try:
                        chunk = json.loads(line)
                    except Exception:
                        continue

                    msg = chunk.get("message", {})
                    done = chunk.get("done", False)

                    delta = {}
                    if not done:
                        if msg.get("thinking") is not None:
                            delta["reasoning_content"] = msg["thinking"]
                        if msg.get("content") is not None:
                            delta["content"] = msg["content"]

                    data = {
                        "id": completion_id,
                        "object": "chat.completion.chunk",
                        "created": created,
                        "model": MODEL,
                        "choices": [{
                            "index": 0,
                            "delta": delta,
                            "finish_reason": "stop" if done else None,
                        }]
                    }
                    yield f"data: {json.dumps(data)}\n\n"

                    if done:
                        break

        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")


@app.get("/health")
async def health():
    async with httpx.AsyncClient(timeout=5.0) as client:
        try:
            r = await client.get(f"{OLLAMA_BASE}/api/version")
            ollama_ok = r.status_code == 200
        except Exception:
            ollama_ok = False
    return {"status": "ok" if ollama_ok else "starting", "model": MODEL}
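The per-line translation inside `generate()` can be checked in isolation. A sketch of the same mapping as a pure function (field names follow the proxy code above; the sample chunks are made-up stand-ins for Ollama's `/api/chat` NDJSON output):

```python
def to_openai_chunk(chunk: dict, completion_id: str, model: str, created: int) -> dict:
    """Map one Ollama /api/chat NDJSON chunk to an OpenAI-style stream chunk."""
    msg = chunk.get("message", {})
    done = chunk.get("done", False)
    delta = {}
    if not done:
        # "thinking" becomes reasoning_content; ordinary text stays as content.
        if msg.get("thinking") is not None:
            delta["reasoning_content"] = msg["thinking"]
        if msg.get("content") is not None:
            delta["content"] = msg["content"]
    return {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta,
                     "finish_reason": "stop" if done else None}],
    }

mid = to_openai_chunk({"message": {"content": "Hi"}}, "chatcmpl-x", "m", 0)
end = to_openai_chunk({"done": True}, "chatcmpl-x", "m", 0)
```

Factoring the mapping out like this would also make the proxy unit-testable without a running Ollama instance.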