How to use maderix/llama-65b-4bit with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("maderix/llama-65b-4bit", dtype="auto")
```
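Step 7 of the README change, shown in the diff below, points to the huggingface_hub download guide for fetching these weights file by file. The following is a minimal sketch of that route. It assumes the checkpoint files can be recognized by the suffixes in the hypothetical `WEIGHT_SUFFIXES` tuple and are saved into a hypothetical `LOCAL_DIR`. It uses `list_repo_files` and `hf_hub_download` from `huggingface_hub`.

```python
"""Sketch: fetch this repo's weight files one at a time with huggingface_hub.

Assumptions (not from the README): weight files end in one of WEIGHT_SUFFIXES,
and they are saved under LOCAL_DIR.
"""

REPO_ID = "maderix/llama-65b-4bit"
# Hypothetical: suffixes assumed for checkpoint files; check the repo's file list.
WEIGHT_SUFFIXES = (".safetensors", ".pt", ".bin")
LOCAL_DIR = "llama-65b-4bit"  # hypothetical target directory


def select_weight_files(filenames, suffixes=WEIGHT_SUFFIXES):
    """Return the filenames that look like weight checkpoints, sorted."""
    return sorted(name for name in filenames if name.endswith(suffixes))


def download_weights(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download each matching file with its own request and return the local paths."""
    # Imported here so select_weight_files works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    paths = []
    for name in select_weight_files(list_repo_files(repo_id)):
        paths.append(hf_hub_download(repo_id=repo_id, filename=name, local_dir=local_dir))
    return paths
```

Calling `download_weights()` lists the repository's files, keeps the ones matching the assumed suffixes, and downloads each with a separate `hf_hub_download` call. Config and tokenizer files are not matched by these suffixes, so extend the tuple if your loader needs them.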
Update README.md
README.md
@@ -14,4 +14,5 @@ Installation instructions as mentioned in above repo:
 5. Run python cuda_setup.py install in venv
 6. You can either convert the llama models yourself with the instructions from GPTQ-for-llama repo
 7. or directly use these weights by individually downloading them following these instructions (https://huggingface.co/docs/huggingface_hub/guides/download)
-8. Profit!
+8. Profit!
+9. Best results are obtained by putting a repetition_penalty(~1/0.85),temperature=0.7 in model.generate() for most LLaMA models
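Step 9 recommends `repetition_penalty` of about 1/0.85 (~1.176) and `temperature=0.7` for `model.generate()`. The pure-Python sketch below shows what those two settings do to next-token logits. The penalty follows the CTRL-style rule used by transformers' `RepetitionPenaltyLogitsProcessor`: negative scores of already-seen tokens are multiplied by the penalty and positive ones are divided by it. Temperature divides every logit. Writing the penalty as 1/0.85 means a seen token's positive logit is scaled by exactly 0.85. `do_sample=True` is an assumption added here, because transformers only applies temperature when sampling is enabled. The helper names are hypothetical.

```python
"""Sketch of the two sampling settings from step 9 of the README.

Pure-Python illustration of the logit transforms; transformers applies the
same arithmetic to tensors inside model.generate().
"""

import math

# Values from the README; do_sample=True is an added assumption, since
# temperature has no effect under greedy decoding.
GENERATE_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "repetition_penalty": 1 / 0.85,  # ~1.176: positive logits of seen tokens become 0.85x
}


def apply_repetition_penalty(logits, previous_token_ids, penalty):
    """CTRL-style penalty: push logits of already-seen tokens toward 'less likely'."""
    out = list(logits)
    for token_id in set(previous_token_ids):
        score = out[token_id]
        # Negative scores grow more negative; positive scores shrink.
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


def apply_temperature(logits, temperature):
    """Divide every logit by the temperature; values below 1 sharpen the distribution."""
    return [score / temperature for score in logits]


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(logits, previous_token_ids,
                     temperature=GENERATE_KWARGS["temperature"],
                     penalty=GENERATE_KWARGS["repetition_penalty"]):
    """Apply the repetition penalty, then temperature, then softmax."""
    penalized = apply_repetition_penalty(logits, previous_token_ids, penalty)
    return softmax(apply_temperature(penalized, temperature))
```

With a loaded causal-LM model and tokenized `inputs`, the same values would be passed as `model.generate(**inputs, **GENERATE_KWARGS)`. Because the penalty never flips a logit's sign and temperature is a positive scale, applying the two in either order gives the same result.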