Pythia Deduped Series GGML

For use with frontends that support GGML quantized GPT-NeoX models, such as KoboldCpp and Oobabooga (with the CTransformers loader).

Last updated on 2023-05-25.

For other versions of the models, see here:

GGMLv1 q4_3 (70M to 12B)
GGMLv1 q5_0 / q5_1 / q8_0 (70M to 2.8B)
GGMLv1 q4_0 / q4_2 (70M to 2.8B)
GGMLv2 q4_0 / q5_1 (70M to 2.8B)
GGMLv3 q4_0 / q5_1 (70M to 2.8B)

Description:

These quantizations were motivated by the fact that the LLaMA series lacks sizes below 7B, whereas older model families were commonly available in sizes as small as ~125M parameters. This makes LLaMA models uncomfortable to run on hardware with less than 4GB of RAM, even with 2-bit quantization.
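To make the size argument concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at a given bit width. It is an estimate under stated assumptions, not a measurement: the parameter counts are the nominal ones from the model names, and the helper names `weight_bytes` and `to_gib` are made up for this illustration. Real GGML files are somewhat larger because each quantized block also stores scale data, and inference needs extra memory for the context and scratch buffers.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate bytes needed to hold the weights alone.

    Assumption: every parameter takes exactly `bits_per_weight` bits.
    Per-block scale overhead in real quantized formats is ignored.
    """
    return n_params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names (assumed, not exact).
    models = [("70M", 70_000_000), ("2.8B", 2_800_000_000), ("7B", 7_000_000_000)]
    for name, n_params in models:
        for bits in (2, 4, 8):
            size = to_gib(weight_bytes(n_params, bits))
            print(f"{name:>5} at {bits}-bit: ~{size:.2f} GiB (weights only)")
```

Under these assumptions a 7B model at 2 bits per weight already needs roughly 1.6 GiB for weights before any runtime overhead, while a 2.8B model at 4 bits needs about 1.3 GiB.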

RAM USAGE

Tested on KoboldCpp with OpenBLAS enabled.
