File size: 9,273 Bytes
40e1a91
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
# W&B MCP Server - Architecture & Scalability Guide

## Table of Contents
1. [Architecture Decision](#architecture-decision)
2. [Stateless HTTP Design](#stateless-http-design)
3. [Performance & Scalability](#performance--scalability)
4. [Load Test Results](#load-test-results)
5. [Deployment Recommendations](#deployment-recommendations)

---

## Architecture Decision

### Decision: Pure Stateless HTTP Mode

**The W&B MCP Server uses pure stateless HTTP mode (`stateless_http=True`).**

This fundamental architecture decision enables:
- βœ… **Universal client compatibility** (OpenAI, Cursor, LeChat, Claude)
- βœ… **Horizontal scaling** capabilities
- βœ… **Simpler operations** and maintenance
- βœ… **Cloud-native** deployment patterns

### Why Stateless?

The Model Context Protocol traditionally used stateful sessions, but this created issues:

| Client | Behavior | Problem with Stateful |
|--------|----------|----------------------|
| **OpenAI** | Deletes session after listing tools, then reuses ID | Session not found errors |
| **Cursor** | Sends Bearer token with every request | Expects stateless behavior |
| **Claude** | Can work with either model | No issues |

### The Solution

```python
# Pure stateless operation - no session persistence
mcp = FastMCP("wandb-mcp-server", stateless_http=True)
```

With this approach:
- **Session IDs are correlation IDs only** - they match requests to responses
- **No state persists between requests** - each request is independent
- **Authentication required per request** - Bearer token must be included
- **Any worker can handle any request** - enables horizontal scaling

---

## Stateless HTTP Design

### Architecture Overview

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    MCP Clients (OpenAI/Cursor/etc)  β”‚
β”‚     Bearer Token with Each Request   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚ HTTPS
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Load Balancer (Optional)     β”‚
β”‚      Round-Robin Distribution        β”‚
β””β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
   β”‚          β”‚          β”‚
β”Œβ”€β”€β–Όβ”€β”€β”€β”  β”Œβ”€β”€β–Όβ”€β”€β”€β”  β”Œβ”€β”€β–Όβ”€β”€β”€β”
β”‚ W1   β”‚  β”‚ W2   β”‚  β”‚ W3   β”‚  (Multiple Workers Possible)
β”‚      β”‚  β”‚      β”‚  β”‚      β”‚
β”‚ ASGI β”‚  β”‚ ASGI β”‚  β”‚ ASGI β”‚  Uvicorn/Gunicorn
β””β”€β”€β”¬β”€β”€β”€β”˜  β””β”€β”€β”¬β”€β”€β”€β”˜  β””β”€β”€β”¬β”€β”€β”€β”˜
   β”‚          β”‚          β”‚
β”Œβ”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         FastAPI Application         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚  Stateless Auth Middleware  β”‚     β”‚
β”‚  β”‚  (Bearer Token Validation)  β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚    MCP Stateless Handler    β”‚     β”‚
β”‚  β”‚  (No Session Storage)       β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         W&B API Integration         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

### Request Flow

1. **Client sends request** with Bearer token and session ID
2. **Middleware validates** Bearer token
3. **MCP processes** request (session ID used for correlation only)
4. **Response sent** with matching session ID
5. **No state persisted** - request complete

### Key Implementation Details

```python
async def thread_safe_auth_middleware(request: Request, call_next):
    """Stateless authentication middleware."""
    
    # Session IDs are correlation IDs only
    session_id = request.headers.get("Mcp-Session-Id")
    if session_id:
        logger.debug(f"Correlation ID: {session_id[:8]}...")
    
    # Every request must have Bearer token
    authorization = request.headers.get("Authorization", "")
    if authorization.startswith("Bearer "):
        api_key = authorization[7:].strip()
        # Use API key for this request only
        # No session storage or retrieval
```

---

## Performance & Scalability

### Single Worker Performance

Based on testing with stateless mode:

| Metric | Local Server | Remote (HF Spaces) |
|--------|--------------|-------------------|
| **Max Concurrent** | 1000 clients | 500+ clients |
| **Throughput** | ~50-60 req/s | ~35 req/s |
| **Latency (p50)** | <500ms | <2s |
| **Memory Usage** | 200-500MB | 300-600MB |

### Horizontal Scaling Potential

With stateless mode, the server supports true horizontal scaling:

| Workers | Max Concurrent | Total Throughput | Notes |
|---------|----------------|------------------|-------|
| 1 | 1000 | ~50 req/s | Current deployment |
| 2 | 2000 | ~100 req/s | Linear scaling |
| 4 | 4000 | ~200 req/s | Near-linear |
| 8 | 8000 | ~400 req/s | Some overhead |

**Key Advantage**: No session affinity required - any worker can handle any request!

---

## Load Test Results

### Latest Test Results (2025-09-25)

#### Local Server (MacOS, Single Worker)

| Concurrent Clients | Success Rate | Throughput | Mean Response |
|--------------------|-------------|------------|---------------|
| 10 | 100% | 47 req/s | 89ms |
| 100 | 100% | 47 req/s | 1.2s |
| 500 | 100% | 56 req/s | 4.4s |
| **1000** | **100%** | **48 req/s** | **9.3s** |
| 1500 | 80% | 51 req/s | 15.4s |
| 2000 | 70% | 53 req/s | 20.8s |

**Breaking Point**: ~1500 concurrent connections

#### Remote Server (mcp.withwandb.com)

| Concurrent Clients | Success Rate | Throughput | Mean Response |
|--------------------|-------------|------------|---------------|
| 10 | 100% | 10 req/s | 0.8s |
| 50 | 100% | 29 req/s | 1.2s |
| 100 | 100% | 33 req/s | 1.9s |
| 200 | 100% | 34 req/s | 3.3s |
| **500** | **100%** | **35 req/s** | **7.5s** |

**Key Finding**: Remote server handles 500+ concurrent connections reliably!

### Performance Sweet Spots

1. **Low Latency** (<1s response): Use ≀50 concurrent connections
2. **Balanced** (good throughput & latency): Use 100-200 concurrent connections  
3. **Maximum Throughput**: Use 200-300 concurrent connections
4. **Maximum Capacity**: Up to 500 concurrent (remote) or 1000 (local)

---

## Deployment Recommendations

### Current Deployment (HuggingFace Spaces)

```yaml
Configuration:
  - Single worker (can be increased)
  - Stateless HTTP mode
  - 2 vCPU, 16GB RAM
  - Port 7860

Performance:
  - 500+ concurrent connections
  - ~35 req/s throughput
  - 100% reliability up to 500 concurrent
```

### Scaling Options

#### Option 1: Vertical Scaling
- Increase CPU/RAM on HuggingFace Spaces
- Can improve single-worker throughput

#### Option 2: Horizontal Scaling (Recommended)
```python
# app.py - Enable multiple workers
uvicorn.run(app, host="0.0.0.0", port=PORT, workers=4)
```

#### Option 3: Multi-Region Deployment
- Deploy to multiple regions
- Use global load balancer
- Reduce latency for users worldwide

### Production Checklist

βœ… **Stateless mode enabled** (`stateless_http=True`)  
βœ… **Bearer authentication** on every request  
βœ… **Health check endpoint** (`/health`)  
βœ… **Monitoring** for response times and errors  
βœ… **Rate limiting** (recommended: 100 req/s per client)  
βœ… **Connection limits** (recommended: 500 concurrent)  

### Configuration Example

```python
# Production configuration
mcp = FastMCP("wandb-mcp-server", stateless_http=True)

# Uvicorn with multiple workers (if needed)
if __name__ == "__main__":
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=7860,
        workers=1,  # Increase for horizontal scaling
        limit_concurrency=1000,  # Connection limit
        timeout_keep_alive=30,  # Keepalive timeout
    )
```

### Security Considerations

1. **API Key Validation**: Every request validates Bearer token
2. **No Session Storage**: No risk of session hijacking
3. **Rate Limiting**: Protect against abuse
4. **HTTPS Only**: Always use TLS in production
5. **Token Rotation**: Encourage regular API key rotation

---

## Summary

The W&B MCP Server's stateless architecture provides:

- **Universal Compatibility**: Works with all MCP clients
- **Excellent Performance**: 500+ concurrent connections, ~35 req/s
- **Horizontal Scalability**: Add workers to increase capacity
- **Simple Operations**: No session management complexity
- **Production Ready**: Deployed and tested at scale

The stateless design is not a compromise - it's the optimal architecture for MCP servers in production environments.