---
title: Ollama Compatible API
emoji: 🦙
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Ollama Compatible API

A full Ollama-compatible API proxy for the deepseek-r1:1.5b model. Works seamlessly with Open WebUI and other Ollama clients.

πŸ”— API Endpoint

https://your-space-name.hf.space

## 🎯 Open WebUI Configuration

### Step 1: Add Connection

1. Open Open WebUI
2. Go to **Settings → Connections**
3. Click **Add Connection**

### Step 2: Configure Ollama API

- **Type:** Ollama API
- **URL:** `https://your-space-name.hf.space`
- **API Key:** leave empty (no authentication required)

### Step 3: Test Connection

Click **Test Connection**; it should show "Connected" along with the available models.

πŸ“‘ Available Endpoints

GET /api/tags

List all available models (Ollama compatible)

Example:

curl https://your-space-name.hf.space/api/tags

Response:

```json
{
  "models": [
    {
      "name": "deepseek-r1:1.5b",
      "modified_at": "2024-01-01T00:00:00Z",
      "size": 1500000000
    }
  ]
}
```
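The model names can be pulled out of that response with a few lines of Python; this sketch assumes the JSON shape shown above:

```python
import json

def list_model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    payload = json.loads(tags_json)
    return [model["name"] for model in payload.get("models", [])]

# Using the response shown above:
sample = '{"models": [{"name": "deepseek-r1:1.5b", "size": 1500000000}]}'
print(list_model_names(sample))  # ['deepseek-r1:1.5b']
```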

### POST /api/generate

Generates a completion (Ollama-compatible).

Example:

```bash
curl -X POST https://your-space-name.hf.space/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "prompt": "Why is the sky blue?",
    "stream": true
  }'
```

### POST /api/chat

Chat completion (Ollama-compatible).

Example:

```bash
curl -X POST https://your-space-name.hf.space/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
```
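With `"stream": true`, the chat endpoint returns newline-delimited JSON objects, each carrying a `message.content` fragment in the standard Ollama chat stream format. A small helper to reassemble the reply (the sample chunks below are illustrative):

```python
import json

def accumulate_chat_stream(lines) -> str:
    """Join content fragments from an Ollama /api/chat NDJSON stream."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(parts)

# Example with two hypothetical stream chunks:
stream = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": true}',
]
print(accumulate_chat_stream(stream))  # Hello!
```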

## 🐍 Python Client Example

```python
import httpx
import json

API_URL = "https://your-space-name.hf.space"

# Stream a completion token by token
with httpx.stream(
    "POST",
    f"{API_URL}/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "What is AI?",
        "stream": True
    },
    timeout=300
) as response:
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)
```

πŸ”§ Ollama CLI Compatible

You can use the official Ollama CLI by setting the base URL:

export OLLAMA_HOST=https://your-space-name.hf.space
ollama list
ollama run deepseek-r1:1.5b "Hello!"

## ⚡ Features

- **Full Ollama API compatibility** - works with any Ollama client
- **Real-time streaming** - low-latency, token-by-token generation
- **No caching** - fresh responses every time
- **CORS enabled** - works from browser applications
- **Open WebUI ready** - plug-and-play integration
- **No authentication** - public access (add auth if needed)

πŸš€ Performance Optimizations

  • Async I/O for non-blocking operations
  • Connection pooling with httpx
  • Flash attention enabled
  • Optimized batch processing
  • No access logs for reduced overhead

πŸ“Š Model Information

  • Model: deepseek-r1:1.5b
  • Parameters: ~1.5 billion
  • Optimized for: Fast inference, low latency
  • Context window: 2048 tokens
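Ollama requests accept an `options` object for runtime parameters, so the context window can also be pinned per request. A request body doing that might look like this (the `options.num_ctx` field is standard Ollama; the value simply mirrors the 2048-token window noted above):

```json
{
  "model": "deepseek-r1:1.5b",
  "prompt": "Why is the sky blue?",
  "stream": true,
  "options": {
    "num_ctx": 2048
  }
}
```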

πŸ› οΈ Technical Stack

  • Proxy: FastAPI (Python)
  • Backend: Ollama
  • Model: deepseek-r1:1.5b
  • Server: Uvicorn ASGI

πŸ”’ Security Note

This Space has no authentication by default. If you need to restrict access:

  1. Fork this Space
  2. Add API key middleware in app.py
  3. Configure authentication in Open WebUI
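A minimal sketch of the key check such middleware would perform, assuming the key arrives as an `Authorization: Bearer <key>` header. The `EXPECTED_KEY` name is illustrative (load it from an environment variable in practice), and the FastAPI wiring itself is left out:

```python
import hmac

EXPECTED_KEY = "change-me"  # hypothetical key; read from an env var in a real app

def is_authorized(authorization_header) -> bool:
    """Validate an 'Authorization: Bearer <key>' header value.

    In app.py this check would run inside a FastAPI middleware or
    dependency, returning a 401 response whenever it yields False.
    """
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return False
    supplied = authorization_header[len("Bearer "):]
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(supplied, EXPECTED_KEY)

print(is_authorized("Bearer change-me"))  # True
print(is_authorized("Bearer wrong"))      # False
```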

πŸ“ License

MIT License


Endpoint: https://your-space-name.hf.space
Model: deepseek-r1:1.5b
Compatible with: Open WebUI, Ollama CLI, and all Ollama clients