Llama 3.2 · 1B
KVInfer Studio

Fine-tuned Llama 3.2 1B running on a hand-written C++ inference engine with AVX2 SIMD, OpenMP, RoPE, GQA, SwiGLU, and a persistent KV cache.

1B params · RoPE · GQA 8 heads · SwiGLU · AVX2 SIMD · KV Cache
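
To make the KV-cache and GQA terms above concrete, here is a minimal C++ sketch of how a per-layer cache and the query-to-KV head mapping could look. It is not taken from the KVInfer engine: the `KVCache` struct, its `[pos][kv_head][head_dim]` layout, and the `kv_head_for` helper are hypothetical names chosen for illustration, and the real engine may organize memory differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-layer KV cache; names and layout are illustrative only.
struct KVCache {
    std::size_t max_seq, n_kv_heads, head_dim;
    std::size_t len = 0;        // number of token positions already cached
    std::vector<float> k, v;    // flat layout: [pos][kv_head][head_dim]

    KVCache(std::size_t max_seq_, std::size_t n_kv_heads_, std::size_t head_dim_)
        : max_seq(max_seq_), n_kv_heads(n_kv_heads_), head_dim(head_dim_),
          k(max_seq_ * n_kv_heads_ * head_dim_, 0.0f),
          v(max_seq_ * n_kv_heads_ * head_dim_, 0.0f) {}

    // Store the keys/values of one new token (n_kv_heads * head_dim floats each),
    // so earlier positions never have to be recomputed.
    void append(const float* k_new, const float* v_new) {
        assert(len < max_seq);
        const std::size_t stride = n_kv_heads * head_dim;
        for (std::size_t i = 0; i < stride; ++i) {
            k[len * stride + i] = k_new[i];
            v[len * stride + i] = v_new[i];
        }
        ++len;
    }

    // Pointer to the cached key vector for a given position and KV head.
    const float* key(std::size_t pos, std::size_t kv_head) const {
        assert(pos < len && kv_head < n_kv_heads);
        return &k[(pos * n_kv_heads + kv_head) * head_dim];
    }
};

// Grouped-query attention: consecutive query heads share one KV head.
inline std::size_t kv_head_for(std::size_t q_head,
                               std::size_t n_q_heads,
                               std::size_t n_kv_heads) {
    assert(n_q_heads % n_kv_heads == 0);
    return q_head / (n_q_heads / n_kv_heads);
}
```

Assuming, for illustration, 32 query heads sharing 8 KV heads, `kv_head_for(5, 32, 8)` returns 1: query heads 4 through 7 all read the same cached keys and values, which is why a GQA cache needs a quarter of the memory of a cache with one KV head per query head.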