---
title: Ollama FastAPI Streaming Server
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Ollama FastAPI Real-Time Streaming Server

A fast, optimized FastAPI server wrapping Ollama for real-time streaming inference with the **deepseek-r1:1.5b** model.

## 🔑 Authentication

All streaming requests require a connect key: `manus-ollama-2024`

## 📡 API Endpoints

### GET `/`

Health check endpoint returning service status and endpoint URL.

**Response:**
```json
{
  "status": "online",
  "model": "deepseek-r1:1.5b",
  "endpoint": "https://your-space-url.hf.space"
}
```

### POST `/stream`

Real-time streaming chat completions.

**Request:**
```json
{
  "prompt": "Explain quantum computing",
  "key": "manus-ollama-2024"
}
```

**Response:** Server-Sent Events (SSE) stream

```
data: {"text": "Quantum", "done": false}
data: {"text": " computing", "done": false}
data: {"text": " is...", "done": true}
```

### GET `/models`

List available models.

**Response:**
```json
{
  "models": ["deepseek-r1:1.5b"],
  "default": "deepseek-r1:1.5b"
}
```

### GET `/health`

Detailed health check with Ollama connection status.
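Each event in the `POST /stream` response is a `data:` line carrying a JSON payload. A minimal sketch of decoding one such line (`parse_sse_line` is an illustrative helper, not part of the server's API):

```python
import json

def parse_sse_line(line: str):
    """Return the decoded JSON payload of an SSE 'data:' line, or None for other lines."""
    prefix = "data: "
    if not line.startswith(prefix):
        return None
    return json.loads(line[len(prefix):])

# Decoding one event from the stream shown above:
event = parse_sse_line('data: {"text": "Quantum", "done": false}')
# event == {"text": "Quantum", "done": False}
```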
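Internally, the server forwards prompts to Ollama's generate API with streaming enabled and tuned sampling parameters (see Performance Optimizations below). A sketch of what such an upstream request payload could look like — the option values here are assumptions for illustration, not the server's actual configuration:

```python
# Hypothetical Ollama generate-request payload; the option values are
# illustrative assumptions, not the server's actual tuning.
payload = {
    "model": "deepseek-r1:1.5b",
    "prompt": "Explain quantum computing",
    "stream": True,              # emit tokens as they are generated
    "options": {
        "temperature": 0.7,      # sampling temperature (assumed value)
        "top_k": 40,             # top-k sampling cutoff (assumed value)
        "top_p": 0.9,            # nucleus sampling threshold (assumed value)
        "num_predict": 2048,     # matches the documented 2048-token cap
    },
}
```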
## 🚀 Usage Example

### Python with httpx

```python
import httpx
import json

url = "https://your-space-url.hf.space/stream"
payload = {
    "prompt": "What is artificial intelligence?",
    "key": "manus-ollama-2024"
}

with httpx.stream("POST", url, json=payload, timeout=300) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(data.get("text", ""), end="", flush=True)
            if data.get("done"):
                break
```

### JavaScript/TypeScript

```javascript
const response = await fetch('https://your-space-url.hf.space/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is artificial intelligence?',
    key: 'manus-ollama-2024'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  const lines = decoder.decode(value).split('\n');
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      console.log(data.text);
      if (data.done) {
        finished = true;  // break out of the outer read loop, not just this one
        break;
      }
    }
  }
}
```

### cURL

```bash
curl -X POST "https://your-space-url.hf.space/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "key": "manus-ollama-2024"}' \
  --no-buffer
```

## ⚡ Performance Optimizations

- **Async I/O**: Full async/await architecture for non-blocking operations
- **Connection pooling**: Reusable HTTP connections with httpx
- **Streaming**: Real-time token streaming with minimal latency
- **Model caching**: Model preloaded on startup
- **Optimized parameters**: Tuned temperature, top_k, and top_p for speed

## 🔒 Security

- Connect key authentication required for all streaming endpoints
- CORS enabled for browser access
- Input validation on all requests

## 📊 Model Information

- **Model**: deepseek-r1:1.5b
- **Size**: ~1.5B parameters
- **Optimized for**: Fast inference and low latency
- **Max tokens**: 2048 per request

## 🛠️ Development
Built with:

- FastAPI 0.109.0
- Ollama (latest)
- Python 3.11
- Uvicorn ASGI server

## 📝 License

MIT License