This is great stuff and necessary.
#1
by 1TBGPU4EVR - opened
We need lighter models that can perform faster and function as modular parts in real-time workflows. I can't wait to try this with Flash Attention.