oki692 committed · Commit 0e8027f · verified · 1 Parent(s): 350da8b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +152 -5
README.md CHANGED
@@ -1,10 +1,157 @@
  ---
- title: Ollama Fastapi Streaming
- emoji: 🌖
- colorFrom: pink
- colorTo: green
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Ollama FastAPI Streaming Server
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Ollama FastAPI Real-Time Streaming Server

A fast, optimized FastAPI server that uses Ollama for real-time streaming inference with the **deepseek-r1:1.5b** model.

## 🔑 Authentication

All streaming requests require a connect key: `manus-ollama-2024`

## 📡 API Endpoints

### GET `/`
Health check endpoint returning service status and endpoint URL.

**Response:**
```json
{
  "status": "online",
  "model": "deepseek-r1:1.5b",
  "endpoint": "https://your-space-url.hf.space"
}
```

### POST `/stream`
Real-time streaming chat completions.

**Request:**
```json
{
  "prompt": "Explain quantum computing",
  "key": "manus-ollama-2024"
}
```

**Response:** Server-Sent Events (SSE) stream
```
data: {"text": "Quantum", "done": false}
data: {"text": " computing", "done": false}
data: {"text": " is...", "done": true}
```

### GET `/models`
List available models.

**Response:**
```json
{
  "models": ["deepseek-r1:1.5b"],
  "default": "deepseek-r1:1.5b"
}
```

### GET `/health`
Detailed health check with Ollama connection status.
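
For the read-only endpoints, a dependency-free check from Python can be as simple as the sketch below (the base URL is a placeholder, and `get_json` is an illustrative helper, not part of the server):

```python
import json
import urllib.request

BASE_URL = "https://your-space-url.hf.space"  # placeholder; substitute your Space URL

def get_json(path: str, timeout: float = 10.0) -> dict:
    """GET a read-only endpoint such as /models or /health and decode the JSON body."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example calls (require the Space to be running):
# get_json("/models")  # e.g. {"models": [...], "default": "deepseek-r1:1.5b"}
# get_json("/health")  # detailed status, including Ollama connectivity
```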

## 🚀 Usage Example

### Python with httpx
```python
import httpx
import json

url = "https://your-space-url.hf.space/stream"
payload = {
    "prompt": "What is artificial intelligence?",
    "key": "manus-ollama-2024"
}

with httpx.stream("POST", url, json=payload, timeout=300) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(data.get("text", ""), end="", flush=True)
            if data.get("done"):
                break
```

### JavaScript/TypeScript
```javascript
const response = await fetch('https://your-space-url.hf.space/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is artificial intelligence?',
    key: 'manus-ollama-2024'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Stream-decode and keep any partial line for the next chunk
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      console.log(data.text);
      if (data.done) break;
    }
  }
}
```

### cURL
```bash
curl -X POST "https://your-space-url.hf.space/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "key": "manus-ollama-2024"}' \
  --no-buffer
```

## ⚡ Performance Optimizations

- **Async I/O**: Full async/await architecture for non-blocking operations
- **Connection pooling**: Reusable HTTP connections with httpx
- **Streaming**: Real-time token streaming with minimal latency
- **Model caching**: Model preloaded on startup
- **Optimized parameters**: Tuned temperature, top_k, and top_p for speed
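
Under the hood, Ollama's `/api/generate` endpoint streams newline-delimited JSON chunks with `response` and `done` fields; re-emitting those as the SSE events documented above is the core of the streaming path. A minimal sketch of that transformation, with the surrounding FastAPI wiring omitted (`ndjson_to_sse` is an illustrative name, not necessarily what this Space's code uses):

```python
import json
from typing import Iterable, Iterator

def ndjson_to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Map Ollama-style NDJSON chunks onto the SSE 'data:' events shown above."""
    for raw in chunks:
        chunk = json.loads(raw)
        event = {"text": chunk.get("response", ""), "done": chunk.get("done", False)}
        yield f"data: {json.dumps(event)}\n\n"

# Two chunks in Ollama's documented streaming shape
demo = ['{"response": "Hello", "done": false}', '{"response": "!", "done": true}']
for sse in ndjson_to_sse(demo):
    print(sse, end="")
```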

## 🔒 Security

- Connect key authentication required for all streaming endpoints
- CORS enabled for browser access
- Input validation on all requests
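
A common way to implement such a key check is a constant-time comparison, which avoids leaking key prefixes through response timing. A sketch (this Space's actual implementation may differ):

```python
import hmac

CONNECT_KEY = "manus-ollama-2024"

def is_authorized(payload: dict) -> bool:
    """Validate the request's 'key' field without a timing side channel."""
    supplied = str(payload.get("key", ""))
    return hmac.compare_digest(supplied.encode(), CONNECT_KEY.encode())

print(is_authorized({"prompt": "hi", "key": "manus-ollama-2024"}))  # True
print(is_authorized({"prompt": "hi"}))  # False
```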

## 📊 Model Information

- **Model**: deepseek-r1:1.5b
- **Size**: ~1.5B parameters
- **Optimized for**: Fast inference and low latency
- **Max tokens**: 2048 per request

## 🛠️ Development

Built with:
- FastAPI 0.109.0
- Ollama (latest)
- Python 3.11
- Uvicorn ASGI server

## 📝 License

MIT License