---
title: Ollama FastAPI Streaming Server
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Ollama FastAPI Real-Time Streaming Server

A fast, optimized FastAPI server that wraps Ollama for real-time streaming inference with the **deepseek-r1:1.5b** model.

## Authentication

All streaming requests require a connect key: `manus-ollama-2024`

## API Endpoints

### GET `/`

Health check endpoint returning service status and endpoint URL.

**Response:**

```json
{
  "status": "online",
  "model": "deepseek-r1:1.5b",
  "endpoint": "https://your-space-url.hf.space"
}
```

### POST `/stream`

Real-time streaming chat completions.

**Request:**

```json
{
  "prompt": "Explain quantum computing",
  "key": "manus-ollama-2024"
}
```

**Response:** a Server-Sent Events (SSE) stream:

```
data: {"text": "Quantum", "done": false}
data: {"text": " computing", "done": false}
data: {"text": " is...", "done": true}
```
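Because SSE payloads arrive as raw network chunks, a `data:` line can be split across two reads. A minimal stdlib-only sketch of a client-side line buffer (the `SSELineParser` name is illustrative, not part of the server):

```python
import json

class SSELineParser:
    """Buffers raw chunks and yields the JSON payload of each complete 'data: ' line."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk: str):
        self._buffer += chunk
        # Only complete lines are parsed; a trailing partial line is kept for the next read
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])

parser = SSELineParser()
events = []
# Simulate a stream where the first event is split across two reads
for chunk in ['data: {"text": "Quan',
              'tum", "done": false}\n',
              'data: {"text": " computing", "done": true}\n']:
    events.extend(parser.feed(chunk))
```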
### GET `/models`

List available models.

**Response:**

```json
{
  "models": ["deepseek-r1:1.5b"],
  "default": "deepseek-r1:1.5b"
}
```
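A client can pick a model from this response. A small sketch that parses the documented body (the literal below mirrors the example response above, not a live call):

```python
import json

# The /models response body documented above, parsed client-side
body = '{"models": ["deepseek-r1:1.5b"], "default": "deepseek-r1:1.5b"}'
info = json.loads(body)

# Prefer the advertised default; fall back to the first listed model
model = info.get("default") or info["models"][0]
```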
### GET `/health`

Detailed health check including the Ollama connection status.
## Usage Examples

### Python with httpx

```python
import httpx
import json

url = "https://your-space-url.hf.space/stream"
payload = {
    "prompt": "What is artificial intelligence?",
    "key": "manus-ollama-2024"
}

with httpx.stream("POST", url, json=payload, timeout=300) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(data.get("text", ""), end="", flush=True)
            if data.get("done"):
                break
```
### JavaScript/TypeScript

```javascript
const response = await fetch('https://your-space-url.hf.space/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is artificial intelligence?',
    key: 'manus-ollama-2024'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // SSE lines can be split across reads; buffer the partial tail
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      console.log(data.text);
      // The outer loop exits when the server closes the stream after done: true
    }
  }
}
```
### cURL

```bash
curl -X POST "https://your-space-url.hf.space/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "key": "manus-ollama-2024"}' \
  --no-buffer
```
## Performance Optimizations

- **Async I/O**: full async/await architecture for non-blocking request handling
- **Connection pooling**: reusable HTTP connections to Ollama via httpx
- **Streaming**: real-time token streaming with minimal latency
- **Model caching**: the model is preloaded on startup
- **Tuned sampling**: temperature, top_k, and top_p tuned for speed
## Security

- Connect key required for all streaming endpoints
- CORS enabled for browser access
- Input validation on all requests
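The server's actual validation code is not shown in this README; one reasonable sketch of the connect-key check uses a constant-time comparison so the key cannot be probed via response timing (`is_authorized` is a hypothetical helper, not the server's function):

```python
import hmac

EXPECTED_KEY = "manus-ollama-2024"

def is_authorized(payload: dict) -> bool:
    """Check the request's connect key with a constant-time comparison."""
    provided = str(payload.get("key", ""))
    # hmac.compare_digest avoids leaking key prefixes through timing differences
    return hmac.compare_digest(provided, EXPECTED_KEY)

ok = is_authorized({"prompt": "hi", "key": "manus-ollama-2024"})
bad = is_authorized({"prompt": "hi", "key": "wrong-key"})
```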
## Model Information

- **Model**: deepseek-r1:1.5b
- **Size**: ~1.5B parameters
- **Optimized for**: fast inference and low latency
- **Max tokens**: 2048 per request
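The exact tuned sampling values are not published here; a hypothetical request payload for Ollama's `/api/generate` endpoint, using the documented 2048-token cap and assumed sampling settings:

```python
# Illustrative payload shape for Ollama's /api/generate endpoint.
# num_predict reflects the documented per-request cap; the sampling
# values are assumptions, not the server's actual tuned settings.
options = {
    "num_predict": 2048,   # max tokens per request, per this README
    "temperature": 0.7,    # assumed
    "top_k": 40,           # assumed
    "top_p": 0.9,          # assumed
}
request = {
    "model": "deepseek-r1:1.5b",
    "prompt": "Explain quantum computing",
    "stream": True,
    "options": options,
}
```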
## Development

Built with:

- FastAPI 0.109.0
- Ollama (latest)
- Python 3.11
- Uvicorn ASGI server

## License

MIT License