Recommended server endpoint
#49
by RonanMcGovern - opened
I note that prefix attention is not yet supported by vLLM.
Is there a recommended inference library for serving this model?