# Quickstart

Welcome to bitsandbytes! This library enables accessible large language models via k-bit quantization for PyTorch, dramatically reducing memory consumption for inference and training.

## Installation

```bash
pip install bitsandbytes
```

**Requirements:** Python 3.10+, PyTorch 2.3+

For detailed installation instructions, see the [Installation Guide](./installation).
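Once installed, you can sanity-check your setup with the library's built-in diagnostic, which reports the detected hardware and any configuration problems:

```bash
python -m bitsandbytes
```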
## What is bitsandbytes?

bitsandbytes provides three main features:

- **LLM.int8()**: 8-bit quantization for inference (~50% memory reduction vs. fp16)
- **QLoRA**: 4-bit quantization for training (~75% memory reduction vs. fp16)
- **8-bit Optimizers**: Memory-efficient optimizers for training
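As a back-of-the-envelope check of those percentages for a 7B-parameter model (weights only, ignoring activations and quantization constants):

```py
params = 7e9  # 7B parameters

print(f"fp16: {params * 2.0 / 1e9:.1f} GB")  # 14.0 GB
print(f"int8: {params * 1.0 / 1e9:.1f} GB")  #  7.0 GB -> 50% reduction
print(f"nf4:  {params * 0.5 / 1e9:.1f} GB")  #  3.5 GB -> 75% reduction
```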
## Quick Examples

### 8-bit Inference

Load and run a model using 8-bit quantization:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

> **Learn more:** See the [Integrations guide](./integrations) for more details on using bitsandbytes with Transformers.
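LLM.int8() works by keeping outlier activation dimensions in fp16 while everything else runs in int8. The outlier cutoff is tunable through the config; the sketch below simply makes the default explicit:

```py
from transformers import BitsAndBytesConfig

# llm_int8_threshold sets the cutoff above which activation values
# are treated as outliers and kept in fp16; 6.0 is the default
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)
```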
### 4-bit Quantization

For even greater memory savings:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```
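To squeeze out a bit more, NF4 can be combined with nested (double) quantization, which also quantizes the quantization constants and saves roughly another 0.4 bits per parameter:

```py
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # nested quantization for extra savings
)
```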
### QLoRA Fine-tuning

Combine 4-bit quantization with LoRA for efficient training:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load 4-bit model
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Now train with your preferred trainer
```
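Continuing the example above, it's worth confirming how little of the model is actually trainable. PEFT provides a helper for this:

```py
# Only the LoRA adapter weights are trainable; the 4-bit base model stays frozen.
# Prints a line like: trainable params: ... || all params: ... || trainable%: ...
model.print_trainable_parameters()
```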
> **Learn more:** See the [FSDP-QLoRA guide](./fsdp_qlora) for advanced training techniques and the [Integrations guide](./integrations) for using bitsandbytes with PEFT.

### 8-bit Optimizers

Use 8-bit optimizers to reduce the memory footprint of optimizer states by up to 75%:
```py
import bitsandbytes as bnb

model = YourModel()

# Replace standard optimizer with 8-bit version
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Use in training loop as normal
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
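If certain parameters are sensitive to 8-bit optimizer states (embeddings are the usual suspects), they can be kept in 32-bit with `GlobalOptimManager`. A minimal sketch mirroring the override pattern used by the Transformers integration; `YourModel` is the same placeholder as above:

```py
import torch.nn as nn
import bitsandbytes as bnb

model = YourModel()

# Keep 32-bit optimizer state for embedding weights only;
# all other parameters still use 8-bit states
manager = bnb.optim.GlobalOptimManager.get_instance()
for module in model.modules():
    if isinstance(module, nn.Embedding):
        manager.register_module_override(module, "weight", {"optim_bits": 32})

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```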
> **Learn more:** See the [8-bit Optimizers guide](./optimizers) for detailed usage and configuration options.

### Custom Quantized Layers

Use quantized linear layers directly in your models:

```py
import torch
import bitsandbytes as bnb

# 8-bit linear layer
linear_8bit = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False)

# 4-bit linear layer
linear_4bit = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)
```
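Both layers are drop-in replacements for `torch.nn.Linear`. As a minimal sketch (the module and its names are illustrative), the weights are quantized when the module is moved to the GPU:

```py
import torch
import bitsandbytes as bnb

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)
        self.fc2 = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

model = TinyMLP().to("cuda")  # weights are quantized to 4-bit here
x = torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16)
out = model(x)
```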
## Next Steps

- [8-bit Optimizers Guide](./optimizers) - Detailed optimizer usage
- [FSDP-QLoRA](./fsdp_qlora) - Train 70B+ models on consumer GPUs
- [Integrations](./integrations) - Use with Transformers, PEFT, and Accelerate
- [FAQs](./faqs) - Common questions and troubleshooting

## Getting Help

- Check the [FAQs](./faqs) and [Common Errors](./errors)
- Visit the [official documentation](https://huggingface.co/docs/bitsandbytes)
- Open an issue on [GitHub](https://github.com/bitsandbytes-foundation/bitsandbytes/issues)