This repository aims to explore the extreme compression ratio of the model, so only low-bit quantized models are provided. They are all quantized from F16.
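For context, here is a minimal sketch of how such quants are typically produced with llama.cpp's `llama-quantize` tool. This is not this repository's exact command; the binary path and file names are placeholders, and the sub-3-bpw i-quant types generally require an importance-matrix file:

```python
# Hedged sketch (placeholder paths, not this repo's actual commands):
# quantize an F16 GGUF down to IQ2_M with llama.cpp's llama-quantize binary.
import subprocess

subprocess.run(
    [
        "./llama-quantize",                     # assumed path to the llama.cpp binary
        "--imatrix", "imatrix.dat",             # importance matrix; IQ2 types need one
        "Meta-Llama-3-8B-Instruct-F16.gguf",    # F16 source (assumed file name)
        "Meta-Llama-3-8B-Instruct-IQ2_M.gguf",  # quantized output (assumed file name)
        "IQ2_M",                                # target type from the table below
    ],
    check=True,
)
```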
| Model | Size | Perplexity (ppl ± std. error) |
|---|---|---|
| F16 | 15G | 8.3662 +/- 0.06216 |
| IQ2_M | 2.8G | 10.2360 +/- 0.07470 |
| IQ2_S | 2.6G | 11.3735 +/- 0.08396 |
| IQ2_XS | 2.5G | 12.3081 +/- 0.08961 |
| IQ2_XXS | 2.3G | 15.9081 +/- 0.11701 |
| IQ1_M | 2.1G | 26.5610 +/- 0.19391 |
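Lower perplexity is better: it is the exponential of the mean per-token negative log-likelihood, and the ± value is a standard-error estimate of that statistic. A minimal sketch of the computation, assuming per-token log-probabilities are available (the exact estimator llama.cpp uses may differ in detail):

```python
import math

# Perplexity = exp(mean negative log-likelihood per token).
# The standard error uses the delta method: se(exp(m)) ~= exp(m) * se(m).
def perplexity(token_logprobs):
    n = len(token_logprobs)
    nll = [-lp for lp in token_logprobs]           # per-token negative log-likelihoods
    mean_nll = sum(nll) / n
    ppl = math.exp(mean_nll)
    var = sum((x - mean_nll) ** 2 for x in nll) / (n - 1)
    se = ppl * math.sqrt(var / n)                  # delta-method standard error
    return ppl, se

# Example with made-up logprobs:
print(perplexity([-2.1, -1.8, -2.4, -2.0]))
```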
To run one of these quants with llama-cpp-python:

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the model from the Hub and load it. Set filename to the GGUF
# file of the quant you want (see the table above); it is left blank here
# because the model card does not specify one.
llm = Llama.from_pretrained(
    repo_id="DavidZyy/Meta-Llama-3-8B-Instruct",
    filename="",
)

llm.create_chat_completion(
    messages=[
        # The model card defines no example prompt; any chat messages work.
        {"role": "user", "content": "Hello!"}
    ]
)
```
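`Llama.from_pretrained` downloads through huggingface_hub, so files are cached locally across runs; recent llama-cpp-python releases also accept a glob pattern for `filename` (for example `*IQ2_M.gguf`) to select a specific quant without spelling out the full file name.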