Bonsai-Auxiliary
Collection
3 items • Updated
FP16 safetensors (HuggingFace format) of the 1-bit Bonsai-8B model. This repo exists for users who want to run Bonsai with stock HuggingFace tooling or frameworks that don't yet support 1-bit weights natively. The 1-bit kernels currently live in our forks of MLX and llama.cpp; once they land upstream, this unpacked version will no longer be needed.
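Loading the unpacked FP16 weights with stock `transformers` might look like the sketch below. The repo id `bonsai/Bonsai-8B-fp16` is a placeholder for illustration, not the actual published name.

```python
# Sketch: loading the unpacked FP16 checkpoint with stock HuggingFace
# tooling. The repo id below is hypothetical, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "bonsai/Bonsai-8B-fp16"  # placeholder repo id

def load_bonsai(repo_id: str = REPO_ID):
    """Load tokenizer and model using plain transformers APIs."""
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # weights are stored as FP16 safetensors
        device_map="auto",
    )
    return tokenizer, model
```

Because this is a standard FP16 checkpoint, no custom kernels or forked runtimes are required, which is the entire point of this auxiliary repo.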
We strongly recommend using the native 1-bit models instead. The 1-bit format is where all the benefits of Bonsai come from: up to 14x memory reduction, 6x faster inference, and 5x lower energy per token. This unpacked FP16 version is full-size and does not provide any of those advantages.
For the optimized 1-bit release models (recommended):