---
title: Ollama Compatible API
emoji: 🦙
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# Ollama Compatible API

A full Ollama-compatible API proxy for the deepseek-r1:1.5b model. Works seamlessly with Open WebUI and other Ollama clients.
## 🚀 API Endpoint

```
https://your-space-name.hf.space
```
## 🎯 Open WebUI Configuration

### Step 1: Add Connection

1. Open Open WebUI
2. Go to **Settings → Connections**
3. Click **Add Connection**

### Step 2: Configure Ollama API

- **Type:** Ollama API
- **URL:** `https://your-space-name.hf.space`
- **API Key:** leave empty (no authentication required)

### Step 3: Test Connection

Click **Test Connection**. It should report "Connected" and list the available models.
## 📡 Available Endpoints

### GET /api/tags

List all available models (Ollama compatible).

Example:

```bash
curl https://your-space-name.hf.space/api/tags
```

Response:

```json
{
  "models": [
    {
      "name": "deepseek-r1:1.5b",
      "modified_at": "2024-01-01T00:00:00Z",
      "size": 1500000000
    }
  ]
}
```
### POST /api/generate

Generate a completion (Ollama compatible).

Example:

```bash
curl -X POST https://your-space-name.hf.space/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "prompt": "Why is the sky blue?",
    "stream": true
  }'
```
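With `"stream": true`, the response body is newline-delimited JSON: each line is one standalone object whose `response` field holds the next chunk of text, and the final object has `"done": true`. A minimal sketch of collecting such a stream (the sample lines below are illustrative, not a recorded response):

```python
import json

# Illustrative streaming chunks, in the shape /api/generate emits.
sample_lines = [
    '{"model": "deepseek-r1:1.5b", "response": "The sky", "done": false}',
    '{"model": "deepseek-r1:1.5b", "response": " is blue.", "done": false}',
    '{"model": "deepseek-r1:1.5b", "response": "", "done": true}',
]

def collect_stream(lines):
    """Concatenate "response" fields until a chunk reports done."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

print(collect_stream(sample_lines))  # The sky is blue.
```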
### POST /api/chat

Chat completion (Ollama compatible).

Example:

```bash
curl -X POST https://your-space-name.hf.space/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
```
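Streaming chat responses differ slightly from `/api/generate`: each chunk carries a partial assistant message under a `message` object rather than a top-level `response` field. A small sketch with made-up sample chunks:

```python
import json

# Illustrative chat chunks in the shape /api/chat streams them.
chunks = [
    '{"model": "deepseek-r1:1.5b", "message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"model": "deepseek-r1:1.5b", "message": {"role": "assistant", "content": " there!"}, "done": true}',
]

# Concatenate the partial message contents into the full reply.
reply = "".join(json.loads(c)["message"]["content"] for c in chunks)
print(reply)  # Hello there!
```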
## 🐍 Python Client Example

```python
import httpx
import json

API_URL = "https://your-space-name.hf.space"

# Generate a completion and stream it token by token
with httpx.stream(
    "POST",
    f"{API_URL}/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "What is AI?",
        "stream": True,
    },
    timeout=300,
) as response:
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)
```
## 🔧 Ollama CLI Compatible

You can use the official Ollama CLI by setting the base URL:

```bash
export OLLAMA_HOST=https://your-space-name.hf.space
ollama list
ollama run deepseek-r1:1.5b "Hello!"
```
## ⚡ Features

- **Full Ollama API compatibility** - works with any Ollama client
- **Real-time streaming** - low-latency, token-by-token generation
- **No caching** - fresh responses every time
- **CORS enabled** - works from browser applications
- **Open WebUI ready** - plug-and-play integration
- **No authentication** - public access (add auth if needed)
## 📈 Performance Optimizations

- Async I/O for non-blocking operations
- Connection pooling with httpx
- Flash attention enabled
- Optimized batch processing
- No access logs, for reduced overhead
## 📊 Model Information

- **Model:** deepseek-r1:1.5b
- **Parameters:** ~1.5 billion
- **Optimized for:** fast inference, low latency
- **Context window:** 2048 tokens
## 🛠️ Technical Stack

- **Proxy:** FastAPI (Python)
- **Backend:** Ollama
- **Model:** deepseek-r1:1.5b
- **Server:** Uvicorn ASGI
## 🔒 Security Note

This Space has no authentication by default. If you need to restrict access:

1. Fork this Space
2. Add API key middleware in `app.py`
3. Configure the matching authentication in Open WebUI
## 📄 License

MIT License

---

- **Endpoint:** https://your-space-name.hf.space
- **Model:** deepseek-r1:1.5b
- **Compatible with:** Open WebUI, Ollama CLI, and all Ollama clients