4-Bit quantizing of this model

#2
by Jdo300 - opened

Hello,

I recently purchased a license and downloaded a copy of this model to run with llama.cpp. In my case, I need to quantize the model so it will run fast enough on my GPU, but when I use convert.py to perform the conversion, I get this error:

FileNotFoundError: Could not find tokenizer.model in /home/.../models/7B/Mistral-7B-Instruct-v0.1-function-calling-v2 or its parent; if it's in another directory, pass the directory as --vocab-dir

I cloned the entire repo to the folder containing the model files and saw that there is a "tokenizer.json" file there. Is this compatible with the tokenizer.model file that the convert script is looking for? If not, what should I use to properly quantize this model?
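(For reference: `tokenizer.json` is the Hugging Face fast-tokenizer format, while convert.py looks for the SentencePiece `tokenizer.model`, so the two are not drop-in replacements. As a quick sanity check before converting, a small sketch like this, with a hypothetical helper name, can report which tokenizer artifacts a model folder actually contains:)

```python
from pathlib import Path

# Hypothetical helper: report which tokenizer artifacts a model folder
# contains. llama.cpp's convert.py expects the SentencePiece
# `tokenizer.model`; the `tokenizer.json` written by Hugging Face fast
# tokenizers is a different format and will not satisfy that check.
def tokenizer_files(model_dir):
    names = ["tokenizer.model", "tokenizer.json"]
    return {name: (Path(model_dir) / name).is_file() for name in names}
```

Running `tokenizer_files("path/to/model")` on the downloaded folder makes it obvious whether `tokenizer.model` is missing before convert.py raises the FileNotFoundError.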

Howdy @Jdo300.

I just added tokenizer.model to this repo (from the base mistral repo), that should fix things for you when quantizing.

I also made a gguf file and added it to main, to save you the trouble. lmk if that works well for you.

btw, there's a detailed quantization guide here, if it's ever of help.

Thank you! I was able to load and run the quantized version of the model with llama.cpp and it works great!

RonanMcGovern changed discussion status to closed
