Andrew McCracken and Claude committed
Commit 6b0a701 · 1 Parent(s): efd4459

Configure uvicorn for concurrent request handling


Updated uvicorn settings for optimal concurrency:
- workers=1: Share model pool across all requests (can't share across processes)
- limit_concurrency=100: Handle up to 100 simultaneous connections
- timeout_keep_alive=120: Support long streaming responses
- backlog=2048: Queue pending connections
- loop='asyncio': Best async performance

With these settings + ModelPool, the app can:
- Accept 100 concurrent connections
- Process 10 simultaneous inferences (model pool size)
- Queue remaining requests gracefully
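The ModelPool itself is not part of this commit, but the behavior described above — at most 10 simultaneous inferences, with excess requests waiting rather than failing — can be sketched with an `asyncio.Queue` of pre-loaded model handles. This is a hypothetical illustration of the pattern, not the app's actual `ModelPool` implementation; the class name, pool size, and `infer` signature are assumptions:

```python
import asyncio


class ModelPool:
    """Hypothetical pool: N model instances shared within one worker process.

    Requests beyond the pool size await a free model on the queue instead of
    erroring, which matches the "queue remaining requests gracefully" note.
    """

    def __init__(self, size: int = 10):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=size)
        for i in range(size):
            # Stand-in token for a loaded model; real code would load weights here.
            self._queue.put_nowait(f"model-{i}")

    async def infer(self, prompt: str) -> str:
        model = await self._queue.get()    # blocks if all instances are busy
        try:
            await asyncio.sleep(0.01)      # stand-in for actual inference work
            return f"{model}: {prompt}"
        finally:
            self._queue.put_nowait(model)  # return the instance to the pool


async def main() -> None:
    pool = ModelPool(size=10)
    # 25 concurrent requests, but at most 10 inferences run at any moment.
    results = await asyncio.gather(*(pool.infer(f"req {n}") for n in range(25)))
    print(len(results))


asyncio.run(main())
```

Because uvicorn runs a single worker, one event loop owns the pool, which is why the commit notes the pool can be shared across all requests but not across processes.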

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1)
  1. main.py +11 -2
main.py CHANGED

```diff
@@ -564,10 +564,19 @@ async def serve_test_interface():
 if __name__ == "__main__":
     import uvicorn
 
-    uvicorn.run(
+    # Configure uvicorn for concurrent request handling
+    config = uvicorn.Config(
         app,
         host="0.0.0.0",
         port=8000,
         log_level="info",
-        access_log=True
+        access_log=True,
+        workers=1,  # Single worker to share model pool across all requests
+        limit_concurrency=100,  # Allow up to 100 concurrent connections
+        timeout_keep_alive=120,  # Keep connections alive for streaming
+        backlog=2048,  # Queue up to 2048 pending connections
+        loop="asyncio"  # Use asyncio event loop for best async performance
     )
+
+    server = uvicorn.Server(config)
+    server.run()
```