### Running the LLaMA 65B ggml weights with alpaca.cpp
### How the 65B ggml weights were made
#### 1. Clone the 65B model data
```shell
# the checkpoints on Hugging Face are stored with Git LFS;
# without it you get small pointer files instead of the weights
git lfs install
git clone https://huggingface.co/datasets/nyanko7/LLaMA-65B/
```
#### 2. Clone alpaca.cpp
```shell
git clone https://github.com/antimatter15/alpaca.cpp
```
#### 3. Convert and quantize the weights
```shell
# the convert script looks for tokenizer.model in the model dir's parent
mv LLaMA-65B/tokenizer.model ./
cd alpaca.cpp
# 1 = write the converted weights as f16
python convert-pth-to-ggml.py ../LLaMA-65B/ 1
mkdir -p models/65B
mv ../LLaMA-65B/ggml-model-f16.bin models/65B/
mv ../LLaMA-65B/ggml-model-f16.bin.* models/65B/
bash quantize.sh 65B
```
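For reference, the quantization step rewrites each f16 part into a matching q4_0 file. A minimal illustration of that naming scheme, using empty placeholder files rather than real weights (the actual internals of `quantize.sh` may differ):

```shell
# fake part files, only to show the naming; the real ones come from step 3
mkdir -p demo/65B
touch demo/65B/ggml-model-f16.bin demo/65B/ggml-model-f16.bin.1
for f in demo/65B/ggml-model-f16.bin*; do
    # quantize.sh runs the quantizer on each part; the output swaps f16 for q4_0
    echo "$f -> $(echo "$f" | sed 's/f16/q4_0/')"
done
```

This is why the run step below loads `ggml-model-q4_0.bin` rather than the f16 files.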
#### 4. Upload the weight files
Uploading the quantized weights directly was slow (it would have taken almost 2 days), so I worked around it:
- I used https://tmp.link/ as a temporary store
- I then uploaded to Hugging Face from Colab via the Hugging Face API
### Run
```shell
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat
# alpaca.cpp_65b_ggml/ holds the quantized weights uploaded in step 4
./chat -m ../alpaca.cpp_65b_ggml/ggml-model-q4_0.bin
```
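The `chat` invocation assumes the quantized weights have already been downloaded into `alpaca.cpp_65b_ggml/` (the path from the run step). A small guard like this sketch fails early with a clear message when they are missing, instead of letting `chat` error out:

```shell
# illustrative guard; MODEL is the weights path used in the run step
MODEL=alpaca.cpp_65b_ggml/ggml-model-q4_0.bin
if [ ! -f "$MODEL" ]; then
    echo "missing $MODEL: download the quantized weights first"
else
    ./chat -m "$MODEL"
fi
```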