Upload 3 files
Browse files- Dockerfile (2) +37 -0
- README (2).md +48 -0
- entrypoint.sh +45 -0
Dockerfile (2)
ADDED
|
@@ -0,0 +1,37 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM ubuntu:22.04

# Runtime dependencies: curl for the installer and health checks,
# ca-certificates for TLS model downloads.
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install Ollama (official installer; ollama.com is the canonical domain).
RUN curl -fsSL https://ollama.com/install.sh | sh

# Create non-root user (HF Spaces requires UID 1000)
RUN useradd -m -u 1000 user

ENV HOME=/home/user \
    PATH="/home/user/.local/bin:$PATH" \
    OLLAMA_HOST=0.0.0.0:7860 \
    OLLAMA_MODELS=/home/user/.ollama/models

WORKDIR $HOME/app

COPY --chown=user:user entrypoint.sh .
RUN chmod +x entrypoint.sh

# Pre-pull model at build time so first request is instant.
# HF Spaces build layer caches this.
# Runs as root, then hands ownership of the model store to the runtime
# user — otherwise the files pulled here would be root-owned and the
# non-root runtime user could not read them.
# Poll /api/version instead of a fixed sleep: a slow builder would
# otherwise race the server start and fail the pull.
RUN set -e; \
    ollama serve & \
    server_pid=$!; \
    i=0; \
    until curl -fsS http://127.0.0.1:7860/api/version >/dev/null 2>&1; do \
        i=$((i + 1)); \
        [ "$i" -ge 30 ] && { echo "ollama did not start during build" >&2; exit 1; }; \
        sleep 1; \
    done; \
    ollama pull granite4:350m; \
    kill "$server_pid" 2>/dev/null || true; \
    wait "$server_pid" 2>/dev/null || true; \
    chown -R user:user /home/user/.ollama

USER user

EXPOSE 7860

CMD ["./entrypoint.sh"]
|
README (2).md
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Ollama Granite4 350m
|
| 3 |
+
emoji: 🪨
|
| 4 |
+
colorFrom: gray
|
| 5 |
+
colorTo: blue
|
| 6 |
+
sdk: docker
|
| 7 |
+
pinned: false
|
| 8 |
+
app_port: 7860
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# Ollama — IBM Granite 4.0 350m
|
| 12 |
+
|
| 13 |
+
Serwer Ollama z modelem **IBM Granite 4.0 (350m)** udostępniający REST API kompatybilne z Ollama.
|
| 14 |
+
|
| 15 |
+
## Endpoints
|
| 16 |
+
|
| 17 |
+
| Method | Path | Opis |
|
| 18 |
+
|--------|------|------|
|
| 19 |
+
| `GET` | `/api/version` | Wersja Ollama |
|
| 20 |
+
| `GET` | `/api/tags` | Lista dostępnych modeli |
|
| 21 |
+
| `POST` | `/api/generate` | Generowanie tekstu (streaming) |
|
| 22 |
+
| `POST` | `/api/chat` | Chat completions |
|
| 23 |
+
| `POST` | `/api/embeddings` | Embeddingi |
|
| 24 |
+
|
| 25 |
+
## Przykład użycia
|
| 26 |
+
|
| 27 |
+
```bash
|
| 28 |
+
# Generate
|
| 29 |
+
curl https://<your-space-url>/api/generate \
|
| 30 |
+
-d '{"model":"granite4:350m","prompt":"Hello!","stream":false}'
|
| 31 |
+
|
| 32 |
+
# Chat
|
| 33 |
+
curl https://<your-space-url>/api/chat \
|
| 34 |
+
-d '{
|
| 35 |
+
"model": "granite4:350m",
|
| 36 |
+
"messages": [{"role":"user","content":"Explain quantum computing briefly."}],
|
| 37 |
+
"stream": false
|
| 38 |
+
}'
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
## Model
|
| 42 |
+
|
| 43 |
+
- **Model:** IBM Granite 4.0 — 350M params
|
| 44 |
+
- **Architektura:** Transformer (nie hybrydowy Mamba-2)
|
| 45 |
+
- **Tag Ollama:** `granite4:350m`
|
| 46 |
+
- **Kwantyzacja:** Q4_K_M (domyślna)
|
| 47 |
+
- **Rozmiar:** ~250 MB
|
| 48 |
+
- **Zastosowanie:** instrukcje, Q&A, RAG, klasyfikacja, kod
|
entrypoint.sh
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
# Entrypoint for the Ollama Space: start the server, wait until the API
# answers, make sure the model is present, then block on the server so
# its exit status becomes the container's exit status.
set -euo pipefail

echo "==> Starting Ollama server on port 7860..."
export OLLAMA_HOST=0.0.0.0:7860
export OLLAMA_MODELS=/home/user/.ollama/models

# Start ollama in the background; we `wait` on it at the end.
ollama serve &
OLLAMA_PID=$!

# Forward termination signals so the container stops cleanly.
trap 'kill "$OLLAMA_PID" 2>/dev/null || true' TERM INT

# Wait for ollama to be ready
echo "==> Waiting for Ollama to be ready..."
readonly MAX_RETRIES=30
COUNT=0
until curl -s http://localhost:7860/api/version > /dev/null 2>&1; do
  # Bail out immediately if the server process already died — retrying
  # for the full 60 s against a dead process is pointless.
  if ! kill -0 "$OLLAMA_PID" 2>/dev/null; then
    echo "ERROR: Ollama process exited during startup." >&2
    exit 1
  fi
  COUNT=$((COUNT + 1))
  if [ "$COUNT" -ge "$MAX_RETRIES" ]; then
    echo "ERROR: Ollama did not start in time." >&2
    exit 1
  fi
  echo "    ... attempt $COUNT/$MAX_RETRIES"
  sleep 2
done

echo "==> Ollama is ready!"

# Pull model if not cached (fallback in case build layer failed)
if ! ollama list | grep -q "granite4"; then
  echo "==> Pulling granite4:350m..."
  ollama pull granite4:350m
fi

echo "==> Model available:"
ollama list

echo "==> Ollama API running at http://0.0.0.0:7860"
echo "==> Endpoints:"
echo "    POST /api/generate"
echo "    POST /api/chat"
echo "    GET  /api/tags"
echo "    POST /api/embeddings"

# Keep process alive; propagate the server's exit status.
wait "$OLLAMA_PID"
|