Nous-Hermes-13B ggml

From: https://huggingface.co/NousResearch/Nous-Hermes-13b

Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`

Quantized using an older version of llama.cpp and compatible with llama.cpp from May 19, commit 2d5db48.

k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K`

These quantization methods are compatible with the latest llama.cpp as of June 6, commit 2d43387.

Provided Files

| Name | Quant method | Bits | Size | Max RAM required (no GPU offloading) | Use case |
|------|--------------|-----:|-----:|-----:|----------|
| nous-hermes-13b.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| nous-hermes-13b.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
| nous-hermes-13b.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
| nous-hermes-13b.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors. |
| nous-hermes-13b.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. |
| nous-hermes-13b.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However, it has quicker inference than the q5 models. |
| nous-hermes-13b.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K. |
| nous-hermes-13b.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
| nous-hermes-13b.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. |
| nous-hermes-13b.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy and resource usage, and slower inference. |
| nous-hermes-13b.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K. |
| nous-hermes-13b.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors. |
| nous-hermes-13b.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q6_K (6-bit quantization) for all tensors. |
| nous-hermes-13b.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
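
In every row of the table, the "Max RAM required" figure equals the file size plus 2.50 GB. The sketch below uses that observation to pick a file for a given RAM budget. The names `QUANT_SIZES_GB`, `RAM_OVERHEAD_GB`, `max_ram_gb`, and `largest_quant_that_fits` are hypothetical and are not part of llama.cpp. Treating file size as a proxy for quality is an assumption made for illustration only.

```python
from typing import Optional

# File sizes in GB, copied from the Provided Files table.
QUANT_SIZES_GB = {
    "q2_K": 5.43, "q3_K_S": 5.59, "q3_K_M": 6.25, "q3_K_L": 6.87,
    "q4_0": 7.32, "q4_K_S": 7.32, "q4_K_M": 7.82, "q4_1": 8.14,
    "q5_0": 8.95, "q5_K_S": 8.95, "q5_K_M": 9.21, "q5_1": 9.76,
    "q6_K": 10.68, "q8_0": 13.83,
}

# Observed from the table: Max RAM = file size + 2.50 GB (no GPU offloading).
RAM_OVERHEAD_GB = 2.50


def max_ram_gb(quant: str) -> float:
    """Reproduce the table's 'Max RAM required' column for one quant method."""
    return round(QUANT_SIZES_GB[quant] + RAM_OVERHEAD_GB, 2)


def largest_quant_that_fits(ram_budget_gb: float) -> Optional[str]:
    """Return the quant method with the largest file that fits the budget.

    Assumption: a larger file is used as a rough stand-in for higher
    accuracy; the table's Use case column gives the real trade-offs.
    """
    fitting = [q for q in QUANT_SIZES_GB if max_ram_gb(q) <= ram_budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: QUANT_SIZES_GB[q])


if __name__ == "__main__":
    for budget in (8.0, 12.0, 16.0):
        print(budget, "GB ->", largest_quant_that_fits(budget))
```

A size-based choice is only a starting point: for example, q4_1 is larger than q4_K_M, but the table does not say which of the two is more accurate.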

Model Card: Nous-Hermes-13b

Model Description

Nous-Hermes-13b is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors. The result is an enhanced Llama 13b model that rivals GPT-3.5-turbo in performance across a variety of tasks.

This model stands out for its long responses, low hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuning was performed with a sequence length of 2000 on an 8x A100 80GB DGX machine for over 50 hours.

Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (the general, roleplay v1 & v2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.

Additional data inputs came from Camel-AI's Biology/Physics/Chemistry and Math Datasets, Airoboros' GPT-4 Dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.

Collaborators

The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Nous Research, Huemin Art, and Redmond AI.

A huge shoutout and acknowledgement go to all the dataset creators who generously share their datasets openly.

Special mention goes to @winglian, @erhartford, and @main_horse for assisting in some of the training issues.

Among the contributors of datasets, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
The GPT4-LLM and Unnatural Instructions were provided by Microsoft, Airoboros dataset by jondurbin, Camel-AI datasets are from Camel-AI, and CodeAlpaca dataset by Sahil 2801. If anyone was left out, please open a thread in the community tab.

Prompt Format

The model follows the Alpaca prompt format. For an instruction with no additional input:

### Instruction:

### Response:

For an instruction accompanied by additional input:

### Instruction:

### Input:

### Response:
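
As a minimal sketch of how a prompt in this format might be assembled, the helper below fills in the section headers shown above. The function name `build_prompt` and the exact blank-line placement are assumptions made for illustration; the model card only specifies the `### Instruction:`, `### Input:`, and `### Response:` headers.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper).

    The "### Input:" section is emitted only when input_text is provided.
    The prompt ends right after "### Response:" so the model writes the reply.
    """
    parts = [f"### Instruction:\n{instruction.strip()}\n"]
    if input_text:
        parts.append(f"### Input:\n{input_text.strip()}\n")
    parts.append("### Response:\n")
    # Joining with "\n" leaves one blank line between sections.
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Summarize the text.", "llama.cpp runs GGML models."))
```

The resulting string can be passed as the prompt to whichever llama.cpp front end is in use.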

Benchmark Results

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.4915|±  |0.0146|
|             |       |acc_norm|0.5085|±  |0.0146|
|arc_easy     |      0|acc     |0.7769|±  |0.0085|
|             |       |acc_norm|0.7424|±  |0.0090|
|boolq        |      1|acc     |0.7948|±  |0.0071|
|hellaswag    |      0|acc     |0.6143|±  |0.0049|
|             |       |acc_norm|0.8000|±  |0.0040|
|openbookqa   |      0|acc     |0.3560|±  |0.0214|
|             |       |acc_norm|0.4640|±  |0.0223|
|piqa         |      0|acc     |0.7965|±  |0.0094|
|             |       |acc_norm|0.7889|±  |0.0095|
|winogrande   |      0|acc     |0.7190|±  |0.0126|
