---
library_name: transformers
tags: []
---


# Bonsai: A Small Ternary-Weight Language Model

## Model Details

### Model Description

Bonsai is a small ternary-weight language model with 500 million parameters, trained by deepgrove. Bonsai adopts the Llama architecture and the Mistral tokenizer, following [Danube 3](https://arxiv.org/pdf/2407.09276v1), with modified linear layers that support ternary weights. The model was trained primarily on DCLM-Pro and Fineweb-Edu. Bonsai marks a new paradigm of efficiency, having been trained on fewer than 5 billion tokens.

- **Developed by:** deepgrove
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Repository:** https://github.com/deepgrove-ai/Bonsai
- **Paper:** https://github.com/deepgrove-ai/Bonsai/tree/main/paper/Bonsai.pdf

## Usage

Bonsai can be used through the Hugging Face Transformers library. Note, however, that all operations are currently performed in 16-bit precision; we are working on integrating our model design with custom mixed-precision kernels. A quick example follows:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)

text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that Bonsai is not instruction tuned; we strongly recommend fine-tuning the model before using it in a downstream task.

## Evaluation

Bonsai achieves competitive performance among its peers, being one of the first ternary models to do so. Evaluation results are below; for more detailed results and comparisons to other ternary models, please see the paper linked above. We use lm-eval for all benchmarks other than MMLU, and lighteval's cloze formulation for MMLU.
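For reference, a harness run along the following lines should approximate our lm-eval setup. This is an illustrative sketch: the task list, zero-shot default, and batch size shown here are our assumptions, not the exact configuration from the paper.

```python
import lm_eval  # EleutherAI lm-evaluation-harness

# Illustrative reproduction sketch; task names follow lm-eval conventions,
# and the paper's exact few-shot settings may differ.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepgrove/Bonsai,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```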
| Model | ARC-c | ARC-e | HS. | OBQA | PiQA | Wino. | MMLU | Avg |
|-------|-------|-------|-----|------|------|-------|------|-----|
| MobiLlama 0.5B | 26.62 | 46.68 | 51.66 | 30.00 | 71.65 | 54.50 | 28.61 | 44.25 |
| Qwen 2 0.5B | 28.84 | 50.29 | 49.12 | 33.00 | 69.26 | 56.99 | 31.78 | 45.61 |
| MobileLLM 600M | 29.01 | 56.65 | 55.35 | 34.00 | 71.65 | 59.75 | 31.40 | 48.13 |
| Qwen 2.5 0.5B | 32.25 | 58.29 | 52.18 | 35.40 | 69.91 | 56.12 | 33.40 | 48.22 |
| **Bonsai** | 33.36 | 57.95 | 48.04 | 34.00 | 70.24 | 54.85 | 30.28 | 46.96 |

All scores are accuracy (%). HS. = HellaSwag, OBQA = OpenBookQA, Wino. = WinoGrande.
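As background on the ternary-weight design described above, the following is a minimal PyTorch sketch of a generic ternary linear layer, using absmean quantization with a straight-through estimator (the scheme popularized by BitNet b1.58). It illustrates the general technique only; the class name `TernaryLinear` and all details are our assumptions, not Bonsai's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Linear layer whose weights are quantized to {-1, 0, +1} times a scale.

    Generic absmean ternarization with a straight-through estimator;
    Bonsai's actual modified linear layers may differ in detail.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)           # per-tensor absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale   # ternary weights, rescaled
        # Straight-through estimator: ternary values in the forward pass,
        # full-precision gradients in the backward pass.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste)

# Quick smoke test
layer = TernaryLinear(16, 8)
y = layer(torch.randn(2, 16))
print(y.shape)  # torch.Size([2, 8])
```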