Instructions for using Salesforce/instructcodet5p-16b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Salesforce/instructcodet5p-16b with Transformers:
```python
# Load model directly
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "Salesforce/instructcodet5p-16b",
    trust_remote_code=True,
    dtype="auto",
)
```
- Notebooks
- Google Colab
- Kaggle
Quantization support.
Are there any plans to release 8-bit support for this?
Add `_no_split_modules = ["CodeT5pBlock"]` to the `CodeT5pEncoderDecoderModel` class in `modeling_codet5p.py`, and `device_map="auto"` should then work. You can then use bitsandbytes for 8-bit inference, which lets you run this model on a 24 GB GPU:

```python
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
```
If you are a Windows user, you can find a bnb build here: https://github.com/acpopescu/bitsandbytes/releases
Hey Verah, for https://huggingface.co/mosaicml/mpt-7b-instruct, where should I add `_no_split_modules`, and what should its value be?
Thanks in advance.
Are there any plans to release 4-bit support for this? Thanks.