---
title: Ollama Compatible API
emoji: πŸ¦™
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# Ollama Compatible API
A full Ollama-compatible API proxy for the **deepseek-r1:1.5b** model. Works seamlessly with Open WebUI and other Ollama clients.
## πŸ”— API Endpoint
```
https://your-space-name.hf.space
```
## 🎯 Open WebUI Configuration
### Step 1: Add Connection
1. Open **Open WebUI**
2. Go to **Settings** β†’ **Connections**
3. Click **Add Connection**
### Step 2: Configure Ollama API
- **Type**: Ollama API
- **URL**: `https://your-space-name.hf.space`
- **API Key**: Leave empty (no authentication required)
### Step 3: Test Connection
Click **Test Connection**; it should show "Connected" together with the list of available models.
## πŸ“‘ Available Endpoints
### GET `/api/tags`
List all available models (Ollama compatible)
**Example:**
```bash
curl https://your-space-name.hf.space/api/tags
```
**Response:**
```json
{
  "models": [
    {
      "name": "deepseek-r1:1.5b",
      "modified_at": "2024-01-01T00:00:00Z",
      "size": 1500000000
    }
  ]
}
```
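The response above can also be consumed programmatically. A minimal sketch using only the Python standard library (the Space URL is a placeholder, and `model_names`/`fetch_tags` are illustrative helper names, not part of this repo):

```python
import json
import urllib.request  # stdlib only; the client example further down uses httpx

API_URL = "https://your-space-name.hf.space"  # placeholder Space URL

def model_names(payload: dict) -> list[str]:
    """Pull just the model names out of an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def fetch_tags() -> list[str]:
    """GET /api/tags and return the advertised model names."""
    with urllib.request.urlopen(f"{API_URL}/api/tags") as resp:
        return model_names(json.load(resp))
```

Calling `fetch_tags()` against a live Space should return `["deepseek-r1:1.5b"]`.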
### POST `/api/generate`
Generate completion (Ollama compatible)
**Example:**
```bash
curl -X POST https://your-space-name.hf.space/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "prompt": "Why is the sky blue?",
    "stream": true
  }'
```
### POST `/api/chat`
Chat completion (Ollama compatible)
**Example:**
```bash
curl -X POST https://your-space-name.hf.space/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
```
## 🐍 Python Client Example
```python
import httpx
import json

API_URL = "https://your-space-name.hf.space"

# Generate a streaming completion
with httpx.stream(
    "POST",
    f"{API_URL}/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "What is AI?",
        "stream": True,
    },
    timeout=300,
) as response:
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            print(data.get("response", ""), end="", flush=True)
```
## πŸ”§ Ollama CLI Compatible
You can use the official Ollama CLI by setting the base URL:
```bash
export OLLAMA_HOST=https://your-space-name.hf.space
ollama list
ollama run deepseek-r1:1.5b "Hello!"
```
## ⚑ Features
- **Full Ollama API compatibility** - works with any Ollama client
- **Real-time streaming** - low-latency, token-by-token generation
- **No caching** - fresh responses every time
- **CORS enabled** - works from browser applications
- **Open WebUI ready** - plug and play integration
- **No authentication** - public access (add auth if needed)
## πŸš€ Performance Optimizations
- Async I/O for non-blocking operations
- Connection pooling with httpx
- Flash attention enabled
- Optimized batch processing
- No access logs for reduced overhead
## πŸ“Š Model Information
- **Model**: deepseek-r1:1.5b
- **Parameters**: ~1.5 billion
- **Optimized for**: Fast inference, low latency
- **Context window**: 2048 tokens
## πŸ› οΈ Technical Stack
- **Proxy**: FastAPI (Python)
- **Backend**: Ollama
- **Model**: deepseek-r1:1.5b
- **Server**: Uvicorn ASGI
## πŸ”’ Security Note
This Space has **no authentication** by default. If you need to restrict access:
1. Fork this Space
2. Add API key middleware in `app.py`
3. Configure authentication in Open WebUI
## πŸ“ License
MIT License
---
**Endpoint**: https://your-space-name.hf.space
**Model**: deepseek-r1:1.5b
**Compatible with**: Open WebUI, Ollama CLI, and all Ollama clients