---
library_name: transformers
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-8B-Instruct-4bit

A 4-bit quantized version of Meta-Llama-3-8B-Instruct, produced with bitsandbytes.

## Quantization Configuration

- **load_in_4bit:** True
- **llm_int8_threshold:** 6.0
- **bnb_4bit_quant_type:** nf4
- **bnb_4bit_use_double_quant:** True
- **bnb_4bit_compute_dtype:** bfloat16
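
The settings above map directly onto a `BitsAndBytesConfig` in transformers. As a sketch (not necessarily the exact script used to produce this checkpoint), this is how the base model would be quantized on the fly with this configuration; it requires a CUDA GPU and access to the gated base repo:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Reconstruction of the quantization settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the FP16 base model while loading it.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```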
## How to use

### Install the required libraries

```bash
pip install transformers peft
pip install -U bitsandbytes
```

### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",  # place the 4-bit weights on the available GPU
)

messages = [
    {"role": "system", "content": "You are a Coder."},
    {"role": "user", "content": "How to create a list in Python?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
)

# Strip the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

### Output

```
In Python, you can create a list in several ways:

1. Using the `list()` function:

my_list = list()

This creates an empty list.

2. Using square brackets `[]`:

my_list = []

This also creates an empty list.

3. Using the `list()` function with an iterable (such as a string or a tuple):

my_list = list("hello")
print(my_list) # Output: ['h', 'e', 'l', 'l', 'o']

4. Using the `list()` function with a range of numbers:

my_list = list(range(1, 6))
print(my_list) # Output: [1, 2, 3, 4, 5]

5. Using the `list()` function with a dictionary:

my_dict = {"a": 1, "b": 2, "c": 3}
my_list = list(my_dict.keys())
print(my_list) # Output: ['a', 'b', 'c']

Note that in Python, lists are mutable, meaning you can add, remove, or modify elements after creating the list.
```

## Size Comparison

The table below compares the VRAM required to load and train the FP16
base model and the 4-bit bitsandbytes-quantized model with PEFT.
The base-model figure is taken from Hugging Face's
[Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).

| Model         | Total Size |
|---------------|------------|
| Base Model    | 28 GB      |
| 4bitQuantized | 5.21 GB    |
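
As a rough sanity check on these figures, the weights-only memory can be estimated from the parameter count and bit width. The helper below is illustrative (not from this card) and deliberately ignores training overhead such as gradients and optimizer state, non-quantized layers, and double-quantization constants, so it is a lower bound on the table's totals:

```python
def weight_size_gb(n_params: float, bits_per_param: int) -> float:
    """Weights-only memory in GiB: parameters x bits, converted to bytes."""
    return n_params * bits_per_param / 8 / 1024**3

n = 8.03e9  # approximate parameter count of Llama-3-8B

print(f"FP16 weights: {weight_size_gb(n, 16):.2f} GiB")  # roughly 15 GiB
print(f"NF4 weights:  {weight_size_gb(n, 4):.2f} GiB")   # roughly 3.7 GiB
```

For an exact measurement of a loaded model, transformers also exposes `model.get_memory_footprint()`, which reports the in-memory size in bytes.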
## Acknowledgment

- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [Hugging Face team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog post on 4-bit transformers with bitsandbytes.
- Thanks to [Meta](https://huggingface.co/meta-llama) for the open-source model.

## Model Card Authors

Swastik Maiti