---
license: apache-2.0
language:
- it
- en
base_model:
- Almawave/Velvet-2B
tags:
- 4bit
- bitsandbytes
- quantized
- Velvet-2B
---

# Velvet-2B (4-bit Quantized)

This is a 4-bit version of [Almawave/Velvet-2B](https://huggingface.co/Almawave/Velvet-2B), quantized with bitsandbytes.

## Model Details

- **Base Model**: Almawave/Velvet-2B
- **Quantization**: 4-bit (nf4) with bitsandbytes
- **Compute Dtype**: float16
- **Double Quantization**: Enabled

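For a rough sense of scale, the savings implied by these settings can be estimated with back-of-the-envelope arithmetic (a sketch; the 2B parameter count is a round number taken from the model name, and quantization metadata adds a small overhead on top):

```python
# Back-of-the-envelope weight memory; 2B parameters is a round number
# taken from the model name, not an exact count.
params = 2_000_000_000
fp16_gb = params * 2 / 1e9    # 2 bytes per weight at float16 -> ~4.0 GB
nf4_gb = params * 0.5 / 1e9   # 4 bits per weight under nf4   -> ~1.0 GB
print(f"float16: ~{fp16_gb:.1f} GB, nf4: ~{nf4_gb:.1f} GB "
      f"(~{(1 - nf4_gb / fp16_gb):.0%} smaller)")
```
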
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "Obiactum/Velvet-2B-4bit"

# 4-bit nf4 quantization with float16 compute and double quantization,
# matching the settings listed under Model Details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# device_map="auto" spreads the layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
```

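Once loaded, the model generates text as usual. A minimal sketch (the prompt is illustrative; if the base model expects a chat format, building the input with `tokenizer.apply_chat_template` may be more appropriate):

```python
# Minimal generation sketch; the prompt is illustrative.
prompt = "Qual è la capitale d'Italia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
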
## Benefits

- Reduced memory usage (approximately 75% lower weight memory than float16; see the check below)
- Faster inference on compatible hardware
- Maintains reasonable model performance

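The memory claim is easy to check directly on the loaded model (a minimal sketch, assuming `model` was created as in the Usage section above):

```python
# Report the weight footprint of the loaded 4-bit model.
# Assumes `model` was loaded as shown under Usage.
footprint_gb = model.get_memory_footprint() / 1e9
print(f"Weight footprint: ~{footprint_gb:.2f} GB")
```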