---
library_name: transformers
tags: []
---

<p align="center">
<img src="figs/bonsai.png" width="200" alt="Bonsai Logo">
<h3 align="center" style="font-size: 30px">Bonsai: A Small Ternary-Weight Language Model</h3>
</p>

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
Bonsai is a small 500-million-parameter ternary-weight language model trained by deepgrove. Bonsai adopts the Llama architecture and the Mistral tokenizer following [Danube 3](https://arxiv.org/pdf/2407.09276v1), with modified linear layers that support ternary weights. The model was trained primarily on DCLM-Pro and Fineweb-Edu. Bonsai marks a new level of training efficiency, having been trained on fewer than 5 billion tokens.

- **Developed by:** deepgrove
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Repository:** https://github.com/deepgrove-ai/Bonsai
- **Paper:** https://github.com/deepgrove-ai/Bonsai/tree/main/paper/Bonsai.pdf
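The modified linear layers replace full-precision weight matrices with ternary values and a scale. As an illustration only, here is a minimal NumPy sketch of one common scheme (BitNet-style "absmean" quantization); Bonsai's actual quantizer and layer design may differ, and the function names here are hypothetical:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a weight matrix to codes in {-1, 0, +1} plus a per-tensor scale.

    This is the absmean scheme popularized by BitNet b1.58, shown purely
    for illustration -- not necessarily Bonsai's exact quantizer.
    """
    scale = np.mean(np.abs(w)) + 1e-8           # per-tensor scale factor
    q = np.clip(np.round(w / scale), -1, 1)     # ternary codes
    return q.astype(np.int8), float(scale)

def ternary_linear(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    """Forward pass of a ternary linear layer: x @ (scale * q)^T."""
    return scale * (x @ q.T.astype(np.float32))

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)   # dense weights (out=4, in=8)
q, scale = ternary_quantize(w)
y = ternary_linear(rng.normal(size=(2, 8)).astype(np.float32), q, scale)
```

Because the quantized weights take only three values, the matrix product reduces to additions and subtractions once custom kernels are available, which is what motivates the mixed-precision kernel work mentioned below.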

## Usage

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Bonsai can be used directly through the Hugging Face Transformers library. Note, however, that all operations are currently performed in 16-bit precision; we are working on integrating our model design with custom mixed-precision kernels. A quick example follows:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)

text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
We note that Bonsai is not instruction-tuned; we strongly recommend fine-tuning the model before using it in a downstream task.

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

Bonsai achieves competitive performance among its peers, and is one of the first ternary models to do so. Evaluation results are below; for more detailed results and comparisons with other ternary models, please see the accompanying paper linked above. We use lm-eval for all benchmarks except MMLU, and lighteval's cloze formulation for MMLU.

<div align="center">

| Model | ARC-c | ARC-e | HellaSwag | OBQA | PiQA | Winogrande | MMLU | Avg |
|-------|-------|-------|-----------|------|------|------------|------|-----|
| MobiLlama 0.5B | 26.62 | 46.68 | 51.66 | 30.00 | 71.65 | 54.50 | 28.61 | 44.25 |
| Qwen 2 0.5B | 28.84 | 50.29 | 49.12 | 33.00 | 69.26 | 56.99 | 31.78 | 45.61 |
| MobileLLM 600M | 29.01 | 56.65 | 55.35 | 34.00 | 71.65 | 59.75 | 31.40 | 48.13 |
| Qwen 2.5 0.5B | 32.25 | 58.29 | 52.18 | 35.40 | 69.91 | 56.12 | 33.40 | 48.22 |
| **Bonsai** | 33.36 | 57.95 | 48.04 | 34.00 | 70.24 | 54.85 | 30.28 | 46.96 |

</div>
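For readers unfamiliar with cloze-style evaluation: instead of asking the model to emit an answer letter, each candidate answer is scored by the log-likelihood of its continuation, and the highest-scoring choice is selected. A minimal sketch of that selection rule (the length normalization shown here is an assumption about the exact variant, and `cloze_pick` is a hypothetical helper, not lighteval's API):

```python
def cloze_pick(choice_logprobs):
    """Return the index of the choice with the highest
    length-normalized log-likelihood (cloze-style scoring).

    choice_logprobs: one list of per-token log-probabilities per choice.
    """
    scores = [sum(lp) / len(lp) for lp in choice_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: three answer choices with per-token log-probs.
logprobs = [[-2.0, -3.0], [-0.5, -1.0, -0.8], [-4.0]]
print(cloze_pick(logprobs))  # -> 1
```

This formulation avoids penalizing base models, like Bonsai, that have not been instruction-tuned to follow a multiple-choice answer format.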