oki692 committed on
Commit 3154e52 · verified · 1 Parent(s): cd4f970

Upload 4 files

Files changed (4)
  1. Dockerfile +35 -0
  2. INSTRUKCJE.md +155 -0
  3. entrypoint.sh +41 -0
  4. proxy.py +114 -0
Dockerfile ADDED
@@ -0,0 +1,35 @@
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    curl \
    ca-certificates \
    zstd \
    && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://ollama.ai/install.sh | sh

RUN pip3 install --no-cache-dir \
    fastapi \
    "uvicorn[standard]" \
    httpx

RUN useradd -m -u 1000 user
USER user

ENV HOME=/home/user \
    PATH="/home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
    OLLAMA_HOST=127.0.0.1:11434 \
    OLLAMA_NUM_PARALLEL=2 \
    OLLAMA_MAX_LOADED_MODELS=1

WORKDIR /home/user/app

COPY --chown=user proxy.py .
COPY --chown=user entrypoint.sh .
RUN chmod +x entrypoint.sh

EXPOSE 7860

CMD ["./entrypoint.sh"]
INSTRUKCJE.md ADDED
@@ -0,0 +1,155 @@
# Ollama Universal - HF Spaces Template

A universal template for deploying any Ollama model on Hugging Face Spaces.
Upload 3 files, set 2 variables, and no code changes are needed.

---

## Repository files

```
Dockerfile      - builds the image (do not edit)
entrypoint.sh   - starts Ollama and the proxy (do not edit)
proxy.py        - API proxy with marked places to edit
INSTRUKCJE.md   - this file
```

---

## Quick start

### 1. Create a Space on Hugging Face
- huggingface.co → Spaces → Create new Space
- SDK: **Docker**
- Hardware: **CPU Basic** (free, 16 GB RAM)
- Visibility: Public or Private

### 2. Upload the files
Via the UI (drag and drop) or git:
```bash
git clone https://huggingface.co/spaces/<username>/<space-name>
cd <space-name>
# copy Dockerfile, entrypoint.sh, proxy.py here
git add . && git commit -m "init" && git push
```

### 3. Set the environment variables
Settings → Variables and Secrets → New variable:

| Name | Example | Description |
|------|---------|-------------|
| `MODEL` | `deepseek-r1:14b` | Model to load |
| `API_KEY` | `moj-tajny-klucz` | Bearer authorization key |

### 4. Wait
- Build: ~2 min
- Cold start: depends on model size (e.g. 9 GB takes ~3-5 min to download)

---

## Changing the model

Change only the `MODEL` Variable in Settings; the Space restarts automatically.

### Models from the Ollama registry
```
deepseek-r1:14b     9.0 GB   reasoning
deepseek-r1:7b      4.7 GB   reasoning
qwen3:8b            5.2 GB   reasoning
qwen3:4b            2.6 GB   reasoning
qwen2.5:7b          4.7 GB
llama3.2:3b         2.0 GB
gemma3:9b           5.8 GB
mistral:7b          4.1 GB
phi4-mini:latest    4.2 GB
```

### Models from Hugging Face (hf.co/...)
```
hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0          8.33 GB
hf.co/bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M    4.7 GB
hf.co/bartowski/gemma-3-9b-it-GGUF:Q4_K_M          5.8 GB
```

---

## What to edit in proxy.py

Search for the `!TU MUSISZ EDYTOWAC!` ("you must edit this") comments:

```python
API_KEY = os.environ.get("API_KEY", "connectkey")
#                                    ^^^^^^^^^^ change the default key

MODEL = os.environ.get("MODEL", "deepseek-r1:14b")
#                                ^^^^^^^^^^^^^^^ change the default model

temperature = body.get("temperature", 0.6)
#                                     ^^^ change the default temperature

top_p = body.get("top_p", 0.95)
#                         ^^^^ change the default top_p
```

Values from ENV (HF Variables) always take priority over the in-code defaults.
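That precedence is simply Python's `os.environ.get` fallback behavior; a minimal sketch (the model names here are only illustrative values):

```python
import os

# Case 1: the HF Variable is not set, so the in-code default is used.
os.environ.pop("MODEL", None)
default_used = os.environ.get("MODEL", "deepseek-r1:14b")

# Case 2: the Variable is set in Settings; the default is ignored.
os.environ["MODEL"] = "qwen3:8b"
env_used = os.environ.get("MODEL", "deepseek-r1:14b")
```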
---

## API usage

Base URL: `https://<username>-<space-name>.hf.space`

### curl
```bash
curl https://<space>.hf.space/v1/chat/completions \
  -H "Authorization: Bearer moj-tajny-klucz" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hi!"}]
  }'
```

### Python (openai SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<space>.hf.space/v1",
    api_key="moj-tajny-klucz",
)

stream = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```
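If you would rather not depend on the openai SDK, the stream is plain SSE: each event is one `data: {json}` line, terminated by `data: [DONE]`. A minimal parser sketch (the sample payload below is made up, but its shape matches the chunks proxy.py emits):

```python
import json

def parse_sse(raw: str) -> list[str]:
    """Collect the content deltas from an SSE response body."""
    out = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            out.append(delta["content"])
    return out

sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}, "index": 0}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}, "index": 0}]}\n'
    "data: [DONE]\n"
)
```

In a real client you would feed `parse_sse` line by line from the HTTP response instead of a pre-collected string.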
---

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat, always streaming |
| `/v1/models` | GET | List of loaded models |
| `/health` | GET | Status of Ollama and the model |

---

## CPU Basic limits (free)

| Parameter | Value |
|-----------|-------|
| RAM | 16 GB |
| vCPU | 2 |
| Disk | 50 GB (reset on restart) |
| Sleep after inactivity | 48 h |
| Max model size | ~13 GB GGUF |
entrypoint.sh ADDED
@@ -0,0 +1,41 @@
#!/bin/bash
set -e

# Validate the required variables
if [ -z "${MODEL}" ] || [ "${MODEL}" = "!TU MUSISZ EDYTOWAC!" ]; then
    echo "ERROR: the MODEL variable is not set!"
    echo "Set it in: HF Space Settings -> Variables -> MODEL"
    echo "Example: deepseek-r1:14b"
    exit 1
fi

if [ -z "${API_KEY}" ] || [ "${API_KEY}" = "!TU MUSISZ EDYTOWAC!" ]; then
    echo "ERROR: the API_KEY variable is not set!"
    echo "Set it in: HF Space Settings -> Variables -> API_KEY"
    exit 1
fi

export OLLAMA_HOST=127.0.0.1:11434
export OLLAMA_NUM_PARALLEL=2
export OLLAMA_MAX_LOADED_MODELS=1

echo "==> Model: ${MODEL}"
echo "==> Starting Ollama..."
ollama serve &

echo "==> Waiting for Ollama..."
for i in $(seq 1 30); do
    if curl -sf http://127.0.0.1:11434/api/version > /dev/null 2>&1; then
        echo "==> Ollama ready!"
        break
    fi
    echo "    Waiting... ($i/30)"
    sleep 2
done

# !TU MUSISZ EDYTOWAC! - the model name, e.g. "deepseek-r1:14b" or "hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0"
echo "==> Pulling ${MODEL}..."
ollama pull "${MODEL}"

echo "==> Starting proxy on :7860..."
exec uvicorn proxy:app --host 0.0.0.0 --port 7860 --workers 2 --timeout-keep-alive 300
proxy.py ADDED
@@ -0,0 +1,114 @@
import json
import os
import time
import uuid

import httpx
from fastapi import Depends, FastAPI, HTTPException, Request
from fastapi.responses import StreamingResponse
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer()

API_KEY = os.environ.get("API_KEY", "!TU MUSISZ EDYTOWAC!")  # e.g. "moj-tajny-klucz"
MODEL = os.environ.get("MODEL", "!TU MUSISZ EDYTOWAC!")  # e.g. "deepseek-r1:14b" or "hf.co/unsloth/GLM-4.7-Flash-GGUF:UD-TQ1_0"
OLLAMA_BASE = "http://127.0.0.1:11434"

if "!TU MUSISZ EDYTOWAC!" in (API_KEY, MODEL):
    raise RuntimeError("Set the API_KEY and MODEL variables in HF Space Settings -> Variables")


def verify_key(credentials: HTTPAuthorizationCredentials = Depends(security)):
    if credentials.credentials != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return credentials.credentials


@app.get("/v1/models")
async def list_models(key: str = Depends(verify_key)):
    return {
        "object": "list",
        "data": [{
            "id": MODEL,
            "object": "model",
            "created": int(time.time()),
            "owned_by": "ollama",
        }]
    }


@app.post("/v1/chat/completions")
async def chat_completions(request: Request, key: str = Depends(verify_key)):
    body = await request.json()

    messages = body.get("messages", [])
    temperature = body.get("temperature", 0.6)  # !TU MUSISZ EDYTOWAC! default temperature (0.0-2.0)
    top_p = body.get("top_p", 0.95)  # !TU MUSISZ EDYTOWAC! default top_p (0.0-1.0)

    options = {"temperature": temperature, "top_p": top_p}
    if "max_tokens" in body:
        options["num_predict"] = body["max_tokens"]

    ollama_payload = {
        "model": MODEL,
        "messages": messages,
        "stream": True,
        "options": options,
    }

    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    async def generate():
        async with httpx.AsyncClient(timeout=300.0) as client:
            async with client.stream("POST", f"{OLLAMA_BASE}/api/chat", json=ollama_payload) as resp:
                async for line in resp.aiter_lines():
                    if not line:
                        continue
                    try:
                        chunk = json.loads(line)
                    except Exception:
                        continue

                    msg = chunk.get("message", {})
                    done = chunk.get("done", False)

                    delta = {}
                    if not done:
                        if msg.get("thinking") is not None:
                            delta["reasoning_content"] = msg["thinking"]
                        if msg.get("content") is not None:
                            delta["content"] = msg["content"]

                    data = {
                        "id": completion_id,
                        "object": "chat.completion.chunk",
                        "created": created,
                        "model": MODEL,
                        "choices": [{
                            "index": 0,
                            "delta": delta,
                            "finish_reason": "stop" if done else None,
                        }]
                    }
                    yield f"data: {json.dumps(data)}\n\n"

                    if done:
                        break

        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")


@app.get("/health")
async def health():
    async with httpx.AsyncClient(timeout=5.0) as client:
        try:
            r = await client.get(f"{OLLAMA_BASE}/api/version")
            ollama_ok = r.status_code == 200
        except Exception:
            ollama_ok = False
    return {"status": "ok" if ollama_ok else "starting", "model": MODEL}
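The per-line translation inside `generate()` can be checked in isolation. A sketch of the same mapping as a pure function (field names follow the proxy code above; the sample chunks are made-up stand-ins for Ollama's `/api/chat` NDJSON output):

```python
def to_openai_chunk(chunk: dict, completion_id: str, model: str, created: int) -> dict:
    """Map one Ollama /api/chat NDJSON chunk to an OpenAI-style stream chunk."""
    msg = chunk.get("message", {})
    done = chunk.get("done", False)
    delta = {}
    if not done:
        # "thinking" becomes reasoning_content; ordinary text stays as content.
        if msg.get("thinking") is not None:
            delta["reasoning_content"] = msg["thinking"]
        if msg.get("content") is not None:
            delta["content"] = msg["content"]
    return {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta,
                     "finish_reason": "stop" if done else None}],
    }

mid = to_openai_chunk({"message": {"content": "Hi"}}, "chatcmpl-x", "m", 0)
end = to_openai_chunk({"done": True}, "chatcmpl-x", "m", 0)
```

Factoring the mapping out like this would also make the proxy unit-testable without a running Ollama instance.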