---
library_name: transformers
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-8B-Instruct-4bit

BitsAndBytes 4-bit Quantized Model

# Quantization Configuration

- **load_in_4bit:** True
- **llm_int8_threshold:** 6.0
- **bnb_4bit_quant_type:** nf4
- **bnb_4bit_use_double_quant:** True
- **bnb_4bit_compute_dtype:** bfloat16

A sketch of how this configuration maps onto `BitsAndBytesConfig` in code appears at the end of this card.

# How to use

### Load Required Libraries

```Python
!pip install transformers
!pip install accelerate
!pip install peft
!pip install -U bitsandbytes
```

### Load model directly

```Python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer comes from the original base model; the weights come from the
# 4-bit quantized checkpoint (its config stores the quantization settings).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a Coder."},
    {"role": "user", "content": "How to create a list in Python?"}
]

# Build the Llama 3 chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Llama 3 ends a turn with either the EOS token or <|eot_id|>.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False  # greedy decoding; temperature is ignored when sampling is off
)

# Strip the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

### Output

```
In Python, you can create a list in several ways:

1. Using the `list()` function:

my_list = list()

This creates an empty list.

2. Using square brackets `[]`:

my_list = []

This also creates an empty list.

3. Using the `list()` function with an iterable (such as a string or a tuple):

my_list = list("hello")
print(my_list)  # Output: ['h', 'e', 'l', 'l', 'o']

4. Using the `list()` function with a range of numbers:

my_list = list(range(1, 6))
print(my_list)  # Output: [1, 2, 3, 4, 5]

5. Using the `list()` function with a dictionary:

my_dict = {"a": 1, "b": 2, "c": 3}
my_list = list(my_dict.keys())
print(my_list)  # Output: ['a', 'b', 'c']

Note that in Python, lists are mutable, meaning you can add, remove, or modify elements after creating the list.
```

## Size Comparison

The table compares the VRAM required to load the FP16 base model and the 4-bit bitsandbytes-quantized model (the quantized checkpoint can then be fine-tuned with PEFT). The base-model value is taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator). A snippet for measuring the footprint of a loaded model appears at the end of this card.

| Model             | Total Size |
|-------------------|------------|
| Base Model (FP16) | 28 GB      |
| 4-bit Quantized   | 5.21 GB    |

## Acknowledgment

- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [Hugging Face team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog post on 4-bit transformers with bitsandbytes.
- Thanks to [Meta](https://huggingface.co/meta-llama) for the open-source model.

## Model Card Authors

Swastik Maiti
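
## Quantization Sketch

As a reference for the values listed under "Quantization Configuration", below is a minimal sketch of how such a 4-bit checkpoint could be produced with `transformers` and `bitsandbytes`. This is not necessarily the exact script used to create this repository; the output path is a placeholder, and a CUDA GPU is assumed.

```Python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the values listed under "Quantization Configuration".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the FP16 base model on the fly while loading.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serialize the 4-bit weights; the path below is a placeholder.
model.save_pretrained("Meta-Llama-3-8B-Instruct_bitsandbytes_4bit")
```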
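
## Checking the Memory Footprint

To reproduce numbers like those in the size-comparison table on your own hardware, `PreTrainedModel.get_memory_footprint()` reports the bytes occupied by a loaded model's parameters and buffers. A minimal check, assuming `model` is the quantized model loaded in the usage example above:

```Python
# `model` is the 4-bit model from the "How to use" section.
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Loaded model footprint: {footprint_gb:.2f} GB")
```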