KV CACHE
#47
by Indrajeet040 - opened
I want to know: if I serve this model with vLLM on a GPU server, will the KV cache be sized for 48 layers or 12?
The model card says it has only 12 attention layers.
Let's assume I am running this model on a 2x H200 server with a 128k context.
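For context, here is the rough back-of-the-envelope I'm using. The KV-head count, head dim, and dtype below are placeholder values, not this model's actual config (please substitute the real numbers from `config.json`); the point is just how much the 12-vs-48 layer question changes the per-sequence KV-cache footprint at 128k context:

```python
# Rough per-sequence KV-cache estimate. num_kv_heads, head_dim, and
# dtype_bytes are PLACEHOLDER values, not this model's real config.
def kv_cache_bytes(num_attn_layers: int, seq_len: int,
                   num_kv_heads: int = 8, head_dim: int = 128,
                   dtype_bytes: int = 2, batch_size: int = 1) -> int:
    # 2 = one K tensor + one V tensor per attention layer
    return (2 * num_attn_layers * num_kv_heads * head_dim
            * seq_len * dtype_bytes * batch_size)

seq_len = 128 * 1024  # 128k context
for layers in (12, 48):
    gib = kv_cache_bytes(layers, seq_len) / 2**30
    print(f"{layers} attention layers -> {gib:.1f} GiB per 128k-token sequence")
```

With these placeholder values that's 6 GiB per sequence if only 12 layers hold KV cache, versus 24 GiB if all 48 do, which is why I'd like to know which count vLLM actually allocates for.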