# 4-bit Quantized Llama 3 Model

## Description

This repository hosts the 4-bit quantized version of the Llama 3 model. Optimized for reduced memory usage and faster inference, this model is suitable for deployment in environments where computational resources are limited.
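To make the idea concrete, here is a minimal absmax round-trip sketch of what 4-bit weight quantization does numerically. This is illustrative only: the function names are hypothetical, and the actual model uses a more elaborate blockwise scheme (e.g. NF4) rather than this simple symmetric mapping.

```python
def quantize_absmax_4bit(weights):
    """Map floats to signed 4-bit integers in [-7, 7] via an absmax scale."""
    scale = max(abs(w) for w in weights) / 7.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 0.7, -0.21]
quantized, scale = quantize_absmax_4bit(weights)
restored = dequantize(quantized, scale)

print(quantized)                       # each value fits in 4 bits
print([round(w, 2) for w in restored]) # approximate reconstruction
```

Each original float is replaced by a small integer plus one shared scale, which is where the memory saving comes from; the price is a bounded rounding error per weight.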
## Model Details
- **Model Type**: Transformer-based language model.
- **Quantization**: 4-bit precision.
- **Advantages**:
  - **Memory Efficiency**: Stores weights at roughly a quarter of the float16 footprint, allowing deployment on devices with limited RAM.
  - **Inference Speed**: Can speed up inference, depending on the hardware's support for low-bit computation.
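As a back-of-the-envelope illustration of the memory advantage (assuming the 8B-parameter variant and counting weights only, not activations, KV cache, or runtime overhead):

```python
# Rough weight-only memory estimate; real usage adds activations and overhead.
params = 8_000_000_000

fp16_gib = params * 2 / 1024**3    # float16: 2 bytes per parameter
q4_gib = params * 0.5 / 1024**3    # 4-bit: half a byte per parameter

print(f"float16 weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:   ~{q4_gib:.1f} GiB")
```

The roughly 15 GiB versus 4 GiB difference is what moves an 8B model from datacenter GPUs into consumer-GPU territory.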
## How to Use
To use this model, follow the steps below:
### Loading the Quantized Model
Load the model with the following parameters so that it is loaded in 4-bit precision (this requires the `bitsandbytes` and `accelerate` packages):
```python
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", device_map="auto", load_in_4bit=True)
```
### Adjusting Precision of Components
Components other than the quantized linear layers are converted to `torch.float16` by default; pass `torch_dtype` to keep them in a different precision:
```python
import torch
from transformers import AutoModelForCausalLM
model_4bit = AutoModelForCausalLM.from_pretrained("SweatyCrayfish/llama-3-8b-quantized", load_in_4bit=True, torch_dtype=torch.float32)
print(model_4bit.model.norm.weight.dtype)  # Llama keeps its final norm at model.norm (it has no model.decoder)
```
## Citation
Original repository and citation:

```bibtex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```