# Quantized Model

This model has been quantized using bitsandbytes (4-bit NF4 quantization).
## Loading the Model

To load this quantized model, you need to specify the quantization configuration:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configure 4-bit NF4 quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model with the quantization config applied
model = AutoModelForCausalLM.from_pretrained(
    "WethosAI/llama3.3_70B_stu_persona_verbose_4bit",
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained("WethosAI/llama3.3_70B_stu_persona_verbose_4bit")
```
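When a model is saved with this configuration, the settings above are serialized into the model's config.json. As a rough sketch of what that entry looks like (field names based on how recent transformers versions serialize BitsAndBytesConfig; exact fields may vary by version):

```json
"quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_use_double_quant": true
}
```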
## Requirements

- transformers
- bitsandbytes
- torch (with CUDA support for quantization)
- peft (only if this model was originally a LoRA adapter)
## Notes

- This model uses 4-bit quantization to reduce memory usage.
- The quantization config is saved in the model's config.json.
- Make sure to pass the quantization_config parameter when loading so the quantized weights are used.
- Quantized models require a CUDA-capable GPU to run efficiently.
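The memory savings can be sketched with back-of-the-envelope arithmetic. The figures below are illustrative only: they cover weight storage and ignore activation memory, the KV cache, and any layers bitsandbytes keeps in higher precision.

```python
# Rough weight-memory estimate for a 70B-parameter model.
params = 70e9
fp16_gb = params * 2 / 1e9    # bfloat16/fp16: 2 bytes per parameter
nf4_gb = params * 0.5 / 1e9   # 4-bit NF4: 0.5 bytes per parameter
print(f"fp16 weights: ~{fp16_gb:.0f} GB, 4-bit NF4 weights: ~{nf4_gb:.0f} GB")
```

In practice, double quantization (`bnb_4bit_use_double_quant=True`) shaves a little more by also quantizing the quantization constants, at no quality cost.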