Cache causal mask for faster inference 89ad9ef Running verified LisaMegaWatts commited on 30 days ago
Fix completion_tokens: count tokens not decoded characters 80c1b86 verified LisaMegaWatts commited on Feb 26
Switch to HTTP.serve (non-streaming handler) for proxy compatibility d353df1 verified LisaMegaWatts commited on Feb 25
Fix: add Content-Length for non-streaming responses, X-Accel-Buffering for SSE 6c7eeb4 verified LisaMegaWatts commited on Feb 25