---
library_name: transformers
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-8B-Instruct-4bit

BitsAndBytes 4-bit Quantized Model

# Quantization Configuration

- **load_in_4bit:** True
- **llm_int8_threshold:** 6.0
- **bnb_4bit_quant_type:** nf4
- **bnb_4bit_use_double_quant:** True
- **bnb_4bit_compute_dtype:** bfloat16

A sketch of how this configuration maps onto `BitsAndBytesConfig` in code appears at the end of this card.

# How to use

### Load Required Libraries

```Python
!pip install transformers
!pip install accelerate
!pip install peft
!pip install -U bitsandbytes
```

### Load model directly

```Python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer comes from the original base model; the weights come from the
# 4-bit quantized checkpoint (its config stores the quantization settings).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a Coder."},
    {"role": "user", "content": "How to create a list in Python?"}
]

# Build the Llama 3 chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Llama 3 ends a turn with either the EOS token or <|eot_id|>.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False  # greedy decoding; temperature is ignored when sampling is off
)

# Strip the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

### Output

```
In Python, you can create a list in several ways:

1. Using the `list()` function:

my_list = list()

This creates an empty list.

2. Using square brackets `[]`:

my_list = []

This also creates an empty list.

3. Using the `list()` function with an iterable (such as a string or a tuple):

my_list = list("hello")
print(my_list)  # Output: ['h', 'e', 'l', 'l', 'o']

4. Using the `list()` function with a range of numbers:

my_list = list(range(1, 6))
print(my_list)  # Output: [1, 2, 3, 4, 5]

5. Using the `list()` function with a dictionary:

my_dict = {"a": 1, "b": 2, "c": 3}
my_list = list(my_dict.keys())
print(my_list)  # Output: ['a', 'b', 'c']

Note that in Python, lists are mutable, meaning you can add, remove, or modify elements after creating the list.
```

## Size Comparison

The table compares the VRAM required to load the FP16 base model and the 4-bit bitsandbytes-quantized model (the quantized checkpoint can then be fine-tuned with PEFT). The base-model value is taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator). A snippet for measuring the footprint of a loaded model appears at the end of this card.

| Model             | Total Size |
|-------------------|------------|
| Base Model (FP16) | 28 GB      |
| 4-bit Quantized   | 5.21 GB    |

## Acknowledgment

- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [Hugging Face team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog post on 4-bit transformers with bitsandbytes.
- Thanks to [Meta](https://huggingface.co/meta-llama) for the open-source model.

## Model Card Authors

Swastik Maiti
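
## Quantization Sketch

As a reference for the values listed under "Quantization Configuration", below is a minimal sketch of how such a 4-bit checkpoint could be produced with `transformers` and `bitsandbytes`. This is not necessarily the exact script used to create this repository; the output path is a placeholder, and a CUDA GPU is assumed.

```Python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the values listed under "Quantization Configuration".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the FP16 base model on the fly while loading.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Serialize the 4-bit weights; the path below is a placeholder.
model.save_pretrained("Meta-Llama-3-8B-Instruct_bitsandbytes_4bit")
```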
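
## Checking the Memory Footprint

To reproduce numbers like those in the size-comparison table on your own hardware, `PreTrainedModel.get_memory_footprint()` reports the bytes occupied by a loaded model's parameters and buffers. A minimal check, assuming `model` is the quantized model loaded in the usage example above:

```Python
# `model` is the 4-bit model from the "How to use" section.
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Loaded model footprint: {footprint_gb:.2f} GB")
```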