api / backend
384 kB
gary-boon
Limit QKV matrices to top 5 heads per layer to reduce response size
decb5ab