When will you be able to provide a 4-bit or 8-bit quantized version?

by fukai - opened Aug 14, 2024

Discussion

fukai

Aug 14, 2024

It consumes too many resources and often runs out of memory (OOM).

CHONGYOEYAT

Jan 1, 2025

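```python
# BytesIO, urlopen and librosa are used to load audio in the inference step (not shown here)
from io import BytesIO
from urllib.request import urlopen

import librosa
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2AudioForConditionalGeneration

# Configure 4-bit quantization (requires the bitsandbytes package):
# NF4 weights, fp16 compute, and nested ("double") quantization of the scaling constants
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
# For 8-bit instead, use: quantization_config = BitsAndBytesConfig(load_in_8bit=True)

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct",
    device_map="auto",  # place layers automatically on the available GPU(s)/CPU
    quantization_config=quantization_config,
)
```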
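As a rough illustration of why 4-bit loading helps with the OOM errors, here is a back-of-the-envelope estimate of weight memory at different precisions. It is only a sketch: it assumes about 7 billion parameters (the nominal size in the model name; the real count, including the audio encoder, may be somewhat higher) and counts weights only, ignoring activations, the KV cache, quantization scale constants, and any modules left unquantized. The helper `weight_memory_gib` is hypothetical, not part of transformers or bitsandbytes.

```python
# Hypothetical helper: weight-only memory estimate per precision.
# Ignores activations, KV cache, quantization constants and unquantized modules.
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16 bits per weight (unquantized half precision)
    "int8": 1.0,  # 8 bits per weight (load_in_8bit=True)
    "nf4": 0.5,   # 4 bits per weight (load_in_4bit=True, bnb_4bit_quant_type="nf4")
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Assumption: ~7e9 parameters, taken from the "7B" in the model name.
    n_params = 7e9
    for dtype in ("fp16", "int8", "nf4"):
        print(f"{dtype}: ~{weight_memory_gib(n_params, dtype):.1f} GiB")
```

Under these assumptions the weights shrink from roughly 13 GiB in fp16 to about a quarter of that in NF4, which is why the 4-bit config above can fit on GPUs where the full-precision model runs out of memory.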
