Bonsai-4B: Unpacked FP16 Safetensors
FP16 safetensors (HuggingFace format) of the 1-bit Bonsai-4B model. This repo exists for users who want to run Bonsai with stock HuggingFace tooling, or with frameworks that don't yet support 1-bit weights natively. The 1-bit kernels currently live in our forks of MLX and llama.cpp; once they land upstream, this unpacked version will no longer be needed.
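With stock HuggingFace tooling, loading this checkpoint follows the usual `transformers` pattern. A minimal sketch; the repo id below is a placeholder, so substitute the actual Hub path:

```python
def load_bonsai_fp16(repo_id: str = "deepgrove/Bonsai-4B"):
    """Load the unpacked FP16 checkpoint with stock transformers.

    The default repo_id is a placeholder, not the confirmed Hub path.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # torch_dtype="float16" keeps the weights in FP16 instead of
    # upcasting to FP32 on load.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="float16")
    return tokenizer, model
```

Note that this path loads the full-size FP16 weights; none of the 1-bit memory or speed benefits apply.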
We strongly recommend using the native 1-bit models instead. The 1-bit format is where all the benefits of Bonsai come from: up to 14x memory reduction, 4x faster inference, and lower energy per token. This unpacked FP16 version is full-size and does not provide any of those advantages.
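As a sanity check on the 14x figure, the arithmetic works out if the 1-bit format stores one bit per weight plus one FP16 scale per group of 128 weights (a layout inferred from the Q1_0_g128 name, not a confirmed spec):

```python
# Back-of-the-envelope memory math for a nominal 4B-parameter model.
# Assumes 1 bit per weight plus one FP16 scale per 128-weight group
# (inferred from the "g128" suffix; the exact packing is an assumption).
PARAMS = 4e9
GROUP_SIZE = 128

fp16_bytes = PARAMS * 2                           # 2 bytes per FP16 weight
onebit_bytes = PARAMS * (1 / 8 + 2 / GROUP_SIZE)  # packed bits + group scales

print(f"FP16:  {fp16_bytes / 1e9:.2f} GB")   # 8.00 GB
print(f"1-bit: {onebit_bytes / 1e9:.2f} GB")  # 0.56 GB
print(f"ratio: {fp16_bytes / onebit_bytes:.1f}x")  # 14.2x
```

In practice embeddings and a few other tensors usually stay in higher precision, which is why the headline number is "up to" 14x rather than a fixed ratio.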
For the optimized 1-bit release models (recommended):
- Bonsai-4B GGUF Q1_0_g128: 1-bit GGUF for llama.cpp (CUDA, Metal, CPU)
- Bonsai-4B MLX 1-bit: 1-bit MLX for Apple Silicon