Regarding the Model size

by Prakh24s - opened Dec 5, 2023

Dec 5, 2023

Thank you for the amazing paper and model weights.

The model seems to be twice the size of a Transformer-based model with the same parameter count (~5.9 GB for a 3B Transformer model vs. 11.1 GB for the Mamba model).
Is this expected?

mamba-checkpoints


benxh

Dec 5, 2023

It's a float32 model (4 bytes per parameter), hence the size difference. Transformer checkpoints are usually released in float16 or bfloat16 (2 bytes per parameter).
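
For a rough sanity check, checkpoint size is approximately parameter count times bytes per parameter. The sketch below assumes about 2.8 billion parameters for the Mamba model (an assumption, not a number from this thread) and ignores file-format overhead, so treat the results as estimates only.

```python
# Rough estimate of checkpoint size from parameter count and dtype.
# The helper name and the parameter count below are illustrative assumptions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


assumed_params = 2.8e9  # assumed parameter count, not confirmed in this thread

for dtype in ("float32", "float16", "bfloat16"):
    size = checkpoint_size_gb(assumed_params, dtype)
    print(f"{dtype}: ~{size:.1f} GB")
```

Under that assumption the float32 estimate lands close to the 11.1 GB reported above, and the half-precision estimate is in the same range as the ~5.9 GB Transformer checkpoint.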

Prakh24s

Dec 6, 2023

Thank you for the answer!
Very excited for bigger/quantized models!

Prakh24s changed discussion status to closed Dec 6, 2023

eccstartup

Jan 27, 2024

One more question: will the float16 model still outperform Transformers, as claimed in the paper?
