4-Bit quantizing of this model

#2
by Jdo300 - opened

Hello,

I recently purchased a license and downloaded a copy of this model to run with llama.cpp. In my case, I need to quantize the model so it will run fast enough on my GPU, but when I use convert.py to perform the conversion, I get this error:

FileNotFoundError: Could not find tokenizer.model in /home/.../models/7B/Mistral-7B-Instruct-v0.1-function-calling-v2 or its parent; if it's in another directory, pass the directory as --vocab-dir

I cloned the entire repo to the folder containing the model files and saw that there is a "tokenizer.json" file there. Is this compatible with the tokenizer.model file that the convert script is looking for? If not, what should I use to properly quantize this model?
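(For reference: `tokenizer.json` is the Hugging Face fast-tokenizer format, while convert.py looks for the SentencePiece `tokenizer.model`, so the two are not drop-in replacements. As a quick sanity check before converting, a small sketch like this, with a hypothetical helper name, can report which tokenizer artifacts a model folder actually contains:)

```python
from pathlib import Path

# Hypothetical helper: report which tokenizer artifacts a model folder
# contains. llama.cpp's convert.py expects the SentencePiece
# `tokenizer.model`; the `tokenizer.json` written by Hugging Face fast
# tokenizers is a different format and will not satisfy that check.
def tokenizer_files(model_dir):
    names = ["tokenizer.model", "tokenizer.json"]
    return {name: (Path(model_dir) / name).is_file() for name in names}
```

Running `tokenizer_files("path/to/model")` on the downloaded folder makes it obvious whether `tokenizer.model` is missing before convert.py raises the FileNotFoundError.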

Howdy @Jdo300.

I just added tokenizer.model to this repo (from the base mistral repo), that should fix things for you when quantizing.

I also made a gguf file and added it to main, to save you the trouble. lmk if that works well for you.

btw, there's a detailed quantization guide here, if it's ever of help.

Thank you! I was able to load and run the quantized version of the model with llama.cpp and it works great!

RonanMcGovern changed discussion status to closed
