---
title: Ollama FastAPI Streaming Server
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Ollama FastAPI Real-Time Streaming Server
A fast, optimized FastAPI server with Ollama for real-time streaming inference using the **deepseek-r1:1.5b** model.
## πŸ”‘ Authentication
All streaming requests require the connect key `manus-ollama-2024`, passed as the `key` field of the JSON request body.
## πŸ“‘ API Endpoints
### GET `/`
Health check endpoint returning service status and endpoint URL.
**Response:**
```json
{
  "status": "online",
  "model": "deepseek-r1:1.5b",
  "endpoint": "https://your-space-url.hf.space"
}
```
### POST `/stream`
Real-time streaming chat completions.
**Request:**
```json
{
  "prompt": "Explain quantum computing",
  "key": "manus-ollama-2024"
}
```
**Response:** Server-Sent Events (SSE) stream
```
data: {"text": "Quantum", "done": false}
data: {"text": " computing", "done": false}
data: {"text": " is...", "done": true}
```
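Each `data:` line carries one JSON payload. A minimal helper for decoding a single SSE line of this format (the `parse_sse_line` name is illustrative, not part of the server's API):

```python
import json

def parse_sse_line(line: str):
    """Decode one SSE line of the form 'data: {...}'.

    Returns the parsed JSON payload, or None for lines that
    are not data lines (e.g. blank keep-alive lines).
    """
    prefix = "data: "
    if line.startswith(prefix):
        return json.loads(line[len(prefix):])
    return None
```

A payload with `"done": true` signals the final chunk of the stream.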
### GET `/models`
List available models.
**Response:**
```json
{
  "models": ["deepseek-r1:1.5b"],
  "default": "deepseek-r1:1.5b"
}
```
### GET `/health`
Detailed health check with Ollama connection status.
## πŸš€ Usage Example
### Python with httpx
```python
import httpx
import json
url = "https://your-space-url.hf.space/stream"
payload = {
    "prompt": "What is artificial intelligence?",
    "key": "manus-ollama-2024"
}

with httpx.stream("POST", url, json=payload, timeout=300) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(data.get("text", ""), end="", flush=True)
            if data.get("done"):
                break
```
### JavaScript/TypeScript
```javascript
const response = await fetch('https://your-space-url.hf.space/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is artificial intelligence?',
    key: 'manus-ollama-2024'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters split across chunks intact
  const text = decoder.decode(value, { stream: true });
  for (const line of text.split('\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      console.log(data.text);
      if (data.done) finished = true; // a bare break here would only exit the inner loop
    }
  }
}
```
### cURL
```bash
curl -X POST "https://your-space-url.hf.space/stream" \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, how are you?", "key": "manus-ollama-2024"}' \
--no-buffer
```
## ⚑ Performance Optimizations
- **Async I/O**: Full async/await architecture for non-blocking operations
- **Connection pooling**: Reusable HTTP connections with httpx
- **Streaming**: Real-time token streaming with minimal latency
- **Model caching**: Model preloaded on startup
- **Optimized parameters**: Tuned temperature, top_k, and top_p for speed
## πŸ”’ Security
- Connect key authentication required for all streaming endpoints
- CORS enabled for browser access
- Input validation on all requests
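The server's key check isn't shown here; a minimal sketch of how it might be implemented (the `is_authorized` helper is hypothetical), using a constant-time comparison from the standard library:

```python
import hmac

CONNECT_KEY = "manus-ollama-2024"  # the connect key from the Authentication section

def is_authorized(key: str) -> bool:
    # compare_digest runs in constant time, avoiding timing side channels
    # that a plain == comparison could leak.
    return hmac.compare_digest(key, CONNECT_KEY)
```

A handler would call this on the `key` field of each request and return HTTP 401 when it fails.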
## πŸ“Š Model Information
- **Model**: deepseek-r1:1.5b
- **Size**: ~1.5B parameters
- **Optimized for**: Fast inference and low latency
- **Max tokens**: 2048 per request
## πŸ› οΈ Development
Built with:
- FastAPI 0.109.0
- Ollama (latest)
- Python 3.11
- Uvicorn ASGI server
## πŸ“ License
MIT License