Llama 3.2 · 1B
KVInfer Studio
A fine-tuned Llama 3.2 1B running on a hand-written C++ inference engine with AVX2 SIMD kernels, OpenMP multithreading, RoPE, GQA, SwiGLU, and a persistent KV-cache.
1B params
RoPE
GQA, 8 KV heads
SwiGLU
AVX2 SIMD
KV Cache
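To make the KV Cache and GQA items above concrete, here is a minimal C++ sketch of how a persistent key/value cache and grouped-query head sharing can fit together. It is an illustration under stated assumptions, not KVInfer Studio's actual code: `KVCache`, `kv_head_for`, and `attend` are hypothetical names, the memory layout is one of several reasonable choices, and the AVX2 and OpenMP optimizations are left out for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a per-layer cache that keeps the keys and values of
// every token seen so far, so each decoding step only computes K/V for the
// newest token. Layout assumed here: [position][kv_head][head_dim].
struct KVCache {
    std::size_t n_kv_heads;
    std::size_t head_dim;
    std::size_t len = 0;      // number of cached positions
    std::vector<float> k, v;  // flattened key and value storage

    KVCache(std::size_t kv_heads, std::size_t dim)
        : n_kv_heads(kv_heads), head_dim(dim) {}

    // Append the key/value vectors of one new token (all KV heads at once).
    void append(const std::vector<float>& k_new, const std::vector<float>& v_new) {
        assert(k_new.size() == n_kv_heads * head_dim);
        assert(v_new.size() == n_kv_heads * head_dim);
        k.insert(k.end(), k_new.begin(), k_new.end());
        v.insert(v.end(), v_new.begin(), v_new.end());
        ++len;
    }

    const float* key(std::size_t pos, std::size_t kv_head) const {
        return k.data() + (pos * n_kv_heads + kv_head) * head_dim;
    }
    const float* value(std::size_t pos, std::size_t kv_head) const {
        return v.data() + (pos * n_kv_heads + kv_head) * head_dim;
    }
};

// GQA: consecutive query heads share a single KV head.
inline std::size_t kv_head_for(std::size_t q_head, std::size_t n_q_heads,
                               std::size_t n_kv_heads) {
    assert(n_q_heads % n_kv_heads == 0);
    return q_head / (n_q_heads / n_kv_heads);
}

// Scaled dot-product attention for one query head over all cached positions.
// q must point to head_dim floats.
std::vector<float> attend(const KVCache& cache, const float* q, std::size_t kv_head) {
    assert(cache.len > 0);
    const std::size_t d = cache.head_dim;
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));

    // Raw scores, tracking the maximum for a numerically stable softmax.
    std::vector<float> scores(cache.len);
    float max_score = -INFINITY;
    for (std::size_t pos = 0; pos < cache.len; ++pos) {
        const float* kp = cache.key(pos, kv_head);
        float dot = 0.0f;
        for (std::size_t i = 0; i < d; ++i) dot += q[i] * kp[i];
        scores[pos] = dot * scale;
        max_score = std::max(max_score, scores[pos]);
    }

    float sum = 0.0f;
    for (float& s : scores) {
        s = std::exp(s - max_score);
        sum += s;
    }

    // Softmax-weighted sum of the cached values.
    std::vector<float> out(d, 0.0f);
    for (std::size_t pos = 0; pos < cache.len; ++pos) {
        const float w = scores[pos] / sum;
        const float* vp = cache.value(pos, kv_head);
        for (std::size_t i = 0; i < d; ++i) out[i] += w * vp[i];
    }
    return out;
}
```

In a decoding loop built on this sketch, each new token would call `append` once per layer and then `attend` once per query head, passing `kv_head_for(q_head, n_q_heads, n_kv_heads)` as the head index. Assuming 32 query heads over the 8 KV heads listed above, every group of four query heads would read the same cached keys and values, which is where GQA's memory saving comes from.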