Update README.md
README.md
CHANGED
|
@@ -15,7 +15,7 @@ inference: false
|
|
| 15 |
# Koala: A Dialogue Model for Academic Research
|
| 16 |
This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.
|
| 17 |
|
| 18 |
-
This version has then been quantized to 4-bit
|
| 19 |
|
| 20 |
## My Koala repos
|
| 21 |
I have the following Koala model repositories available:
|
|
@@ -23,13 +23,21 @@ I have the following Koala model repositories available:
|
|
| 23 |
**13B models:**
|
| 24 |
* [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
|
| 25 |
* [GPTQ quantized 4bit 13B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g)
|
| 26 |
-
* [
|
| 27 |
|
| 28 |
**7B models:**
|
| 29 |
* [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
|
| 30 |
* [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
|
| 31 |
* [GPTQ quantized 4bit 7B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g)
|
| 32 |
-
* [
|
| 33 |
|
| 34 |
## How to run in `llama.cpp`
|
| 35 |
|
|
|
|
| 15 |
# Koala: A Dialogue Model for Academic Research
|
| 16 |
This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.
|
| 17 |
|
| 18 |
+
This version has been quantized to 4-bit and 5-bit GGML for use with [llama.cpp](https://github.com/ggerganov/llama.cpp).
|
| 19 |
|
| 20 |
## My Koala repos
|
| 21 |
I have the following Koala model repositories available:
|
|
|
|
| 23 |
**13B models:**
|
| 24 |
* [Unquantized 13B model in HF format](https://huggingface.co/TheBloke/koala-13B-HF)
|
| 25 |
* [GPTQ quantized 4bit 13B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g)
|
| 26 |
+
* [4bit and 5bit models in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-13B-GGML)
|
| 27 |
|
| 28 |
**7B models:**
|
| 29 |
* [Unquantized 7B model in HF format](https://huggingface.co/TheBloke/koala-7B-HF)
|
| 30 |
* [Unquantized 7B model in GGML format for llama.cpp](https://huggingface.co/TheBloke/koala-7b-ggml-unquantized)
|
| 31 |
* [GPTQ quantized 4bit 7B model in `pt` and `safetensors` formats](https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g)
|
| 32 |
+
* [4bit and 5bit models in GGML format for `llama.cpp`](https://huggingface.co/TheBloke/koala-7B-GGML)
|
| 33 |
+
|
| 34 |
+
## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
|
| 35 |
+
|
| 36 |
+
llama.cpp recently made a breaking change to its quantisation methods.
|
| 37 |
+
|
| 38 |
+
I have re-quantised the GGML files in this repo, so you will need llama.cpp compiled on May 12th 2023 or later (commit `b9fd7ee` or later) to use them.
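If you are not sure whether your local llama.cpp checkout is new enough, one way to check is to ask git whether commit `b9fd7ee` is an ancestor of your current `HEAD`. The sketch below is only an illustration: the helper names and the `repo_dir` argument are hypothetical, and it assumes `git` is installed and that you built llama.cpp from a git clone.

```python
import subprocess

REQUIRED_COMMIT = "b9fd7ee"  # commit named in this README


def ancestor_check_command(commit=REQUIRED_COMMIT):
    """Build the git command that tests whether `commit` is an ancestor of HEAD."""
    return ["git", "merge-base", "--is-ancestor", commit, "HEAD"]


def checkout_includes_commit(repo_dir, commit=REQUIRED_COMMIT):
    """Return True if the checkout in `repo_dir` already contains `commit`.

    `git merge-base --is-ancestor` exits with 0 when the commit is an ancestor
    of HEAD and with a non-zero status otherwise. Any non-zero status is
    treated as "not included" here, which also covers the case where the
    commit is unknown to the local clone.
    """
    result = subprocess.run(
        ancestor_check_command(commit),
        cwd=repo_dir,
        capture_output=True,
    )
    return result.returncode == 0
```

For example, `checkout_includes_commit("path/to/llama.cpp")` should return `True` for a clone that has been updated past the breaking change. If it returns `False`, pull the latest source and rebuild before loading the files in this repo.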
|
| 39 |
+
|
| 40 |
+
The previous files, which will still work in older versions of llama.cpp, can be found in branch `previous_llama`.
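If you need one of the older files, a direct download link for a specific branch can be built from the repo id, the branch name and the file name. This is a minimal sketch, not an official tool: it assumes this repo's id is `TheBloke/koala-13B-GGML` (inferred from the 13B GGML link in the list above), and the file name in it is a hypothetical placeholder, so check the file list of the `previous_llama` branch for the real names.

```python
from urllib.parse import quote

# Assumption: this repo's id, inferred from the 13B GGML link in the list above.
REPO_ID = "TheBloke/koala-13B-GGML"
PREVIOUS_BRANCH = "previous_llama"  # branch name given in this README


def resolve_url(filename, revision=PREVIOUS_BRANCH, repo_id=REPO_ID):
    """Build a direct download URL for one file at a given branch."""
    return "https://huggingface.co/{}/resolve/{}/{}".format(
        repo_id,
        quote(revision, safe=""),  # escape anything unusual in the branch name
        quote(filename),           # keep "/" so files in subfolders still work
    )


# "example-model.bin" is a hypothetical file name, used only for illustration.
url = resolve_url("example-model.bin")
print(url)
```

The printed URL can then be fetched with a browser or any download tool. Passing `revision="main"` instead would point at the re-quantised files that need the newer llama.cpp.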
|
| 41 |
|
| 42 |
## How to run in `llama.cpp`
|
| 43 |
|