# Quickstart

Welcome to bitsandbytes! This library enables accessible large language models via k-bit quantization for PyTorch, dramatically reducing memory consumption for inference and training.

## Installation

```bash
pip install bitsandbytes
```

**Requirements:** Python 3.10+, PyTorch 2.3+

For detailed installation instructions, see the [Installation Guide](./installation).
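Once installed, you can sanity-check your setup with the library's built-in diagnostic, which reports the detected hardware and any configuration problems:

```bash
python -m bitsandbytes
```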
## What is bitsandbytes?

bitsandbytes provides three main features:

- **LLM.int8()**: 8-bit quantization for inference (~50% memory reduction vs. fp16)
- **QLoRA**: 4-bit quantization for training (~75% memory reduction vs. fp16)
- **8-bit Optimizers**: Memory-efficient optimizers for training
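As a back-of-the-envelope check of those percentages for a 7B-parameter model (weights only, ignoring activations and quantization constants):

```py
params = 7e9  # 7B parameters

print(f"fp16: {params * 2.0 / 1e9:.1f} GB")  # 14.0 GB
print(f"int8: {params * 1.0 / 1e9:.1f} GB")  #  7.0 GB -> 50% reduction
print(f"nf4:  {params * 0.5 / 1e9:.1f} GB")  #  3.5 GB -> 75% reduction
```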
## Quick Examples

### 8-bit Inference

Load and run a model using 8-bit quantization:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

> **Learn more:** See the [Integrations guide](./integrations) for more details on using bitsandbytes with Transformers.
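LLM.int8() works by keeping outlier activation dimensions in fp16 while everything else runs in int8. The outlier cutoff is tunable through the config; the sketch below simply makes the default explicit:

```py
from transformers import BitsAndBytesConfig

# llm_int8_threshold sets the cutoff above which activation values
# are treated as outliers and kept in fp16; 6.0 is the default
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)
```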
### 4-bit Quantization

For even greater memory savings:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```
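To squeeze out a bit more, NF4 can be combined with nested (double) quantization, which also quantizes the quantization constants and saves roughly another 0.4 bits per parameter:

```py
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # nested quantization for extra savings
)
```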
### QLoRA Fine-tuning

Combine 4-bit quantization with LoRA for efficient training:

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load 4-bit model
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Now train with your preferred trainer
```
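Continuing the example above, it's worth confirming how little of the model is actually trainable. PEFT provides a helper for this:

```py
# Only the LoRA adapter weights are trainable; the 4-bit base model stays frozen.
# Prints a line like: trainable params: ... || all params: ... || trainable%: ...
model.print_trainable_parameters()
```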
> **Learn more:** See the [FSDP-QLoRA guide](./fsdp_qlora) for advanced training techniques and the [Integrations guide](./integrations) for using bitsandbytes with PEFT.

### 8-bit Optimizers

Use 8-bit optimizers to reduce the memory footprint of optimizer states by up to 75%:
```py
import bitsandbytes as bnb

model = YourModel()

# Replace standard optimizer with 8-bit version
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Use in training loop as normal
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
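If certain parameters are sensitive to 8-bit optimizer states (embeddings are the usual suspects), they can be kept in 32-bit with `GlobalOptimManager`. A minimal sketch mirroring the override pattern used by the Transformers integration; `YourModel` is the same placeholder as above:

```py
import torch.nn as nn
import bitsandbytes as bnb

model = YourModel()

# Keep 32-bit optimizer state for embedding weights only;
# all other parameters still use 8-bit states
manager = bnb.optim.GlobalOptimManager.get_instance()
for module in model.modules():
    if isinstance(module, nn.Embedding):
        manager.register_module_override(module, "weight", {"optim_bits": 32})

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```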
> **Learn more:** See the [8-bit Optimizers guide](./optimizers) for detailed usage and configuration options.

### Custom Quantized Layers

Use quantized linear layers directly in your models:

```py
import torch
import bitsandbytes as bnb

# 8-bit linear layer
linear_8bit = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False)

# 4-bit linear layer
linear_4bit = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)
```
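Both layers are drop-in replacements for `torch.nn.Linear`. As a minimal sketch (the module and its names are illustrative), the weights are quantized when the module is moved to the GPU:

```py
import torch
import bitsandbytes as bnb

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)
        self.fc2 = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

model = TinyMLP().to("cuda")  # weights are quantized to 4-bit here
x = torch.randn(1, 1024, device="cuda", dtype=torch.bfloat16)
out = model(x)
```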
## Next Steps

- [8-bit Optimizers Guide](./optimizers) - Detailed optimizer usage
- [FSDP-QLoRA](./fsdp_qlora) - Train 70B+ models on consumer GPUs
- [Integrations](./integrations) - Use with Transformers, PEFT, and Accelerate
- [FAQs](./faqs) - Common questions and troubleshooting

## Getting Help

- Check the [FAQs](./faqs) and [Common Errors](./errors)
- Visit the [official documentation](https://huggingface.co/docs/bitsandbytes)
- Open an issue on [GitHub](https://github.com/bitsandbytes-foundation/bitsandbytes/issues)