---
license: apache-2.0
language:
- it
- en
base_model:
- Almawave/Velvet-2B
tags:
- 4bit
- bitsandbytes
- quantized
- Velvet-2B
---

# Velvet-2B (4-bit Quantized)

This is a 4-bit version of [Almawave/Velvet-2B](https://huggingface.co/Almawave/Velvet-2B), quantized with bitsandbytes.

## Model Details

- **Base Model**: Almawave/Velvet-2B
- **Quantization**: 4-bit (nf4) with bitsandbytes
- **Compute Dtype**: float16
- **Double Quantization**: Enabled

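For a rough sense of scale, the savings implied by these settings can be estimated with back-of-the-envelope arithmetic (a sketch; the 2B parameter count is a round number taken from the model name, and quantization metadata adds a small overhead on top):

```python
# Back-of-the-envelope weight memory; 2B parameters is a round number
# taken from the model name, not an exact count.
params = 2_000_000_000
fp16_gb = params * 2 / 1e9    # 2 bytes per weight at float16 -> ~4.0 GB
nf4_gb = params * 0.5 / 1e9   # 4 bits per weight under nf4   -> ~1.0 GB
print(f"float16: ~{fp16_gb:.1f} GB, nf4: ~{nf4_gb:.1f} GB "
      f"(~{(1 - nf4_gb / fp16_gb):.0%} smaller)")
```
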
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "Obiactum/Velvet-2B-4bit"

# 4-bit nf4 quantization with float16 compute and double quantization,
# matching the settings listed under Model Details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# device_map="auto" spreads the layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
```

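Once loaded, the model generates text as usual. A minimal sketch (the prompt is illustrative; if the base model expects a chat format, building the input with `tokenizer.apply_chat_template` may be more appropriate):

```python
# Minimal generation sketch; the prompt is illustrative.
prompt = "Qual è la capitale d'Italia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
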
## Benefits

- Reduced memory usage (approximately 75% lower weight memory than float16; see the check below)
- Faster inference on compatible hardware
- Maintains reasonable model performance

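The memory claim is easy to check directly on the loaded model (a minimal sketch, assuming `model` was created as in the Usage section above):

```python
# Report the weight footprint of the loaded 4-bit model.
# Assumes `model` was loaded as shown under Usage.
footprint_gb = model.get_memory_footprint() / 1e9
print(f"Weight footprint: ~{footprint_gb:.2f} GB")
```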