---
library_name: transformers
language:
- en
pipeline_tag: text-generation
---
# Meta-Llama-3-8B-Instruct-4bit
BitsAndBytes 4bit Quantized Model
## Quantization Configuration
- **load_in_4bit:** True
- **llm_int8_threshold:** 6.0
- **bnb_4bit_quant_type:** nf4
- **bnb_4bit_use_double_quant:** True
- **bnb_4bit_compute_dtype:** bfloat16
## How to use
### Install Required Libraries
```shell
pip install transformers peft
pip install -U bitsandbytes
```
### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit")

messages = [
    {"role": "system", "content": "You are a Coder."},
    {"role": "user", "content": "How to create a list in Python?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the standard EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,  # greedy decoding
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
### Output
```
In Python, you can create a list in several ways:
1. Using the `list()` function:
my_list = list()
This creates an empty list.
2. Using square brackets `[]`:
my_list = []
This also creates an empty list.
3. Using the `list()` function with an iterable (such as a string or a tuple):
my_list = list("hello")
print(my_list) # Output: ['h', 'e', 'l', 'l', 'o']
4. Using the `list()` function with a range of numbers:
my_list = list(range(1, 6))
print(my_list) # Output: [1, 2, 3, 4, 5]
5. Using the `list()` function with a dictionary:
my_dict = {"a": 1, "b": 2, "c": 3}
my_list = list(my_dict.keys())
print(my_list) # Output: ['a', 'b', 'c']
Note that in Python, lists are mutable, meaning you can add, remove, or modify elements after creating the list.
```
## Size Comparison
The table below compares the VRAM required to load (and fine-tune with PEFT) the FP16 base model
and the 4-bit bitsandbytes-quantized model.
The value for the base model is taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).

| Model           | Total Size |
|-----------------|------------|
| Base Model      | 28 GB      |
| 4-bit Quantized | 5.21 GB    |
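As a rough back-of-the-envelope check (a sketch only; it ignores non-quantized modules, quantization constants, and activation memory, so it will not match the table exactly), weight storage scales with the number of bits per parameter:

```python
def approx_weight_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB: parameters x bits, converted to bytes."""
    return n_params * bits_per_param / 8 / 1e9

# Llama 3 8B has roughly 8e9 parameters (round figure, for illustration).
print(approx_weight_size_gb(8e9, 16))  # FP16  -> 16.0 GB
print(approx_weight_size_gb(8e9, 4))   # 4-bit -> 4.0 GB
```

Moving from 16 bits to 4 bits per weight cuts storage by about 4x, which is consistent with the roughly 5x reduction in the table once loading overhead is accounted for.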
## Acknowledgment
- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [Hugging Face Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog post on 4-bit transformers with bitsandbytes.
- Thanks to [@Meta](https://huggingface.co/meta-llama) for the Open Source Model.
## Model Card Authors
Swastik Maiti