---
library_name: transformers
language:
- en
pipeline_tag: text-generation
---
# Meta-Llama-3-8B-Instruct-4bit
A 4-bit quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), produced with bitsandbytes.
## Quantization Configuration
- **load_in_4bit:** True
- **llm_int8_threshold:** 6.0
- **bnb_4bit_quant_type:** nf4
- **bnb_4bit_use_double_quant:** True
- **bnb_4bit_compute_dtype:** bfloat16
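The settings above correspond to the following `BitsAndBytesConfig`. As a sketch, this is how you could reproduce the quantization yourself by loading the base model with this config (the hosted checkpoint already has it baked in; on-the-fly quantization downloads the full FP16 weights first and requires a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the quantization settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the base model on the fly.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```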
## How to use
### Install Required Libraries
```python
!pip install transformers
!pip install peft
!pip install accelerate  # required by transformers for loading quantized checkpoints
!pip install -U bitsandbytes
```
### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer comes from the base model; the weights are the 4-bit quantized checkpoint.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit")

messages = [
    {"role": "system", "content": "You are a Coder."},
    {"role": "user", "content": "How to create a list in Python?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the standard EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
)

# Strip the prompt tokens and decode only the generated response.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
### Output
```
In Python, you can create a list in several ways:
1. Using the `list()` function:
my_list = list()
This creates an empty list.
2. Using square brackets `[]`:
my_list = []
This also creates an empty list.
3. Using the `list()` function with an iterable (such as a string or a tuple):
my_list = list("hello")
print(my_list) # Output: ['h', 'e', 'l', 'l', 'o']
4. Using the `list()` function with a range of numbers:
my_list = list(range(1, 6))
print(my_list) # Output: [1, 2, 3, 4, 5]
5. Using the `list()` function with a dictionary:
my_dict = {"a": 1, "b": 2, "c": 3}
my_list = list(my_dict.keys())
print(my_list) # Output: ['a', 'b', 'c']
Note that in Python, lists are mutable, meaning you can add, remove, or modify elements after creating the list.
```
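The snippets in the model's answer are all valid Python; collected into one runnable script:

```python
# 1. & 2. Empty lists via list() or []
a = list()
b = []
assert a == b == []

# 3. From an iterable such as a string
assert list("hello") == ['h', 'e', 'l', 'l', 'o']

# 4. From a range of numbers
assert list(range(1, 6)) == [1, 2, 3, 4, 5]

# 5. From a dictionary's keys
my_dict = {"a": 1, "b": 2, "c": 3}
assert list(my_dict.keys()) == ['a', 'b', 'c']

# Lists are mutable: elements can be added, removed, or modified in place.
a.append(42)
assert a == [42]
```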
## Size Comparison
The table compares the VRAM required for the FP16 base model and the 4-bit bitsandbytes-quantized model, which matters both for inference and for PEFT fine-tuning.
The base-model value is taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).
| Model | Total Size |
|-------------------------|-------------|
| Base Model | 28 GB |
| 4-bit Quantized         | 5.21 GB     |
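As a rough back-of-envelope check (weights only, ignoring activation memory), the per-parameter bit width explains most of the gap. The parameter count below, roughly 8 billion, is an assumption based on the model name:

```python
def weight_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 8.03e9  # approximate parameter count of Llama-3-8B

print(weight_size_gb(n_params, 16))  # FP16: ~16 GB of weights
print(weight_size_gb(n_params, 4))   # NF4: ~4 GB of weights
```

The measured 5.21 GB is somewhat larger than the weights-only estimate because quantization constants and modules kept in higher precision add overhead.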
## Acknowledgment
- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [Hugging Face Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog post on 4-bit transformers with bitsandbytes.
- Thanks to [@Meta](https://huggingface.co/meta-llama) for the Open Source Model.
## Model Card Authors
Swastik Maiti