oki692 committed · Commit 0e8027f · verified · 1 Parent(s): 350da8b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +152 -5
README.md CHANGED
@@ -1,10 +1,157 @@
  ---
- title: Ollama Fastapi Streaming
- emoji: 🌖
- colorFrom: pink
- colorTo: green
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
---
title: Ollama FastAPI Streaming Server
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Ollama FastAPI Real-Time Streaming Server

A fast, optimized FastAPI server that uses Ollama for real-time streaming inference with the **deepseek-r1:1.5b** model.

## 🔑 Authentication

All streaming requests require a connect key: `manus-ollama-2024`

## 📡 API Endpoints

### GET `/`
Health check endpoint returning service status and endpoint URL.

**Response:**
```json
{
  "status": "online",
  "model": "deepseek-r1:1.5b",
  "endpoint": "https://your-space-url.hf.space"
}
```

### POST `/stream`
Real-time streaming chat completions.

**Request:**
```json
{
  "prompt": "Explain quantum computing",
  "key": "manus-ollama-2024"
}
```

**Response:** Server-Sent Events (SSE) stream
```
data: {"text": "Quantum", "done": false}
data: {"text": " computing", "done": false}
data: {"text": " is...", "done": true}
```

### GET `/models`
List available models.

**Response:**
```json
{
  "models": ["deepseek-r1:1.5b"],
  "default": "deepseek-r1:1.5b"
}
```

### GET `/health`
Detailed health check with Ollama connection status.
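
For the read-only endpoints, a dependency-free check from Python can be as simple as the sketch below (the base URL is a placeholder, and `get_json` is an illustrative helper, not part of the server):

```python
import json
import urllib.request

BASE_URL = "https://your-space-url.hf.space"  # placeholder; substitute your Space URL

def get_json(path: str, timeout: float = 10.0) -> dict:
    """GET a read-only endpoint such as /models or /health and decode the JSON body."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example calls (require the Space to be running):
# get_json("/models")  # e.g. {"models": [...], "default": "deepseek-r1:1.5b"}
# get_json("/health")  # detailed status, including Ollama connectivity
```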

## 🚀 Usage Example

### Python with httpx
```python
import httpx
import json

url = "https://your-space-url.hf.space/stream"
payload = {
    "prompt": "What is artificial intelligence?",
    "key": "manus-ollama-2024"
}

with httpx.stream("POST", url, json=payload, timeout=300) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(data.get("text", ""), end="", flush=True)
            if data.get("done"):
                break
```

### JavaScript/TypeScript
```javascript
const response = await fetch('https://your-space-url.hf.space/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'What is artificial intelligence?',
    key: 'manus-ollama-2024'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Stream-decode and keep any partial line for the next chunk
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      console.log(data.text);
      if (data.done) break;
    }
  }
}
```

### cURL
```bash
curl -X POST "https://your-space-url.hf.space/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "key": "manus-ollama-2024"}' \
  --no-buffer
```

## ⚡ Performance Optimizations

- **Async I/O**: Full async/await architecture for non-blocking operations
- **Connection pooling**: Reusable HTTP connections with httpx
- **Streaming**: Real-time token streaming with minimal latency
- **Model caching**: Model preloaded on startup
- **Optimized parameters**: Tuned temperature, top_k, and top_p for speed
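
Under the hood, Ollama's `/api/generate` endpoint streams newline-delimited JSON chunks with `response` and `done` fields; re-emitting those as the SSE events documented above is the core of the streaming path. A minimal sketch of that transformation, with the surrounding FastAPI wiring omitted (`ndjson_to_sse` is an illustrative name, not necessarily what this Space's code uses):

```python
import json
from typing import Iterable, Iterator

def ndjson_to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Map Ollama-style NDJSON chunks onto the SSE 'data:' events shown above."""
    for raw in chunks:
        chunk = json.loads(raw)
        event = {"text": chunk.get("response", ""), "done": chunk.get("done", False)}
        yield f"data: {json.dumps(event)}\n\n"

# Two chunks in Ollama's documented streaming shape
demo = ['{"response": "Hello", "done": false}', '{"response": "!", "done": true}']
for sse in ndjson_to_sse(demo):
    print(sse, end="")
```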

## 🔒 Security

- Connect key authentication required for all streaming endpoints
- CORS enabled for browser access
- Input validation on all requests
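
A common way to implement such a key check is a constant-time comparison, which avoids leaking key prefixes through response timing. A sketch (this Space's actual implementation may differ):

```python
import hmac

CONNECT_KEY = "manus-ollama-2024"

def is_authorized(payload: dict) -> bool:
    """Validate the request's 'key' field without a timing side channel."""
    supplied = str(payload.get("key", ""))
    return hmac.compare_digest(supplied.encode(), CONNECT_KEY.encode())

print(is_authorized({"prompt": "hi", "key": "manus-ollama-2024"}))  # True
print(is_authorized({"prompt": "hi"}))  # False
```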

## 📊 Model Information

- **Model**: deepseek-r1:1.5b
- **Size**: ~1.5B parameters
- **Optimized for**: Fast inference and low latency
- **Max tokens**: 2048 per request

## 🛠️ Development

Built with:
- FastAPI 0.109.0
- Ollama (latest)
- Python 3.11
- Uvicorn ASGI server

## 📝 License

MIT License