# Velvet-2B (4-bit Quantized)
This is a 4-bit quantized version of Almawave/Velvet-2B using bitsandbytes.
## Model Details
- Base Model: Almawave/Velvet-2B
- Quantization: 4-bit (nf4) with bitsandbytes
- Compute Dtype: float16
- Double Quantization: Enabled
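For reference, a checkpoint with these settings can be produced by loading the base model under the same configuration and saving the result. This is only a sketch, assuming a recent transformers/bitsandbytes combination that supports saving 4-bit models; it is not necessarily the exact procedure used to build this repository:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Same settings as listed above: 4-bit NF4, float16 compute, double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "Almawave/Velvet-2B",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serialize the quantized weights (assumes 4-bit serialization support)
base.save_pretrained("Velvet-2B-4bit")
```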
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "Obiactum/Velvet-2B-4bit"

# 4-bit NF4 quantization with float16 compute and nested (double) quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized model, dispatching layers across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
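A minimal generation example follows; the prompt and generation settings are illustrative and not part of the original card:

```python
prompt = "What is the capital of Italy?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```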
## Benefits
- Reduced memory usage (approximately 75% smaller than an fp16 checkpoint; see the footprint check after this list)
- Faster inference on compatible hardware
- Maintains reasonable model performance
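To sanity-check the memory claim, the loaded model's footprint can be inspected with the standard `transformers` helper `get_memory_footprint()`; the fp16 baseline figure in the comment is a rough estimate, not a measurement from this card:

```python
# Approximate memory used by the quantized weights, in GiB
footprint_gib = model.get_memory_footprint() / (1024 ** 3)
print(f"Quantized model footprint: {footprint_gib:.2f} GiB")

# For comparison, an fp16 copy of a ~2B-parameter model needs roughly
# 2e9 params * 2 bytes ≈ 4 GB of weights (rough estimate).
```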
license: Apache 2.0
base_model: Almawave/Velvet-2B
tags:
- 4bit
- bitsandbytes
- quantized
- Velvet-2B