Prose2Tags 4B
#2256
by redaihf - opened
Weird, but converting Phi4-Mini-Prose2Tags-4B fails with:
`ValueError: Error: Missing Phi4-Mini-Prose2Tags-4B/tokenizer.model`
5 days ago
llama.cpp HF-to-GGUF conversion error?
The `Phi3MiniModel` class in the conversion script is hardcoded to look for a `tokenizer_class` string of exactly `'GPT2Tokenizer'`.
In Phi-4, that field in `tokenizer_config.json` might instead be `LlamaTokenizer`, `TiktokenTokenizer`, or even just `PreTrainedTokenizer`, so the script skips the `_set_vocab_gpt2()` branch and falls straight through to the `ValueError`.
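To see why the strict equality check falls through, here is a minimal standalone sketch (the function name and JSON snippets are hypothetical, but the comparison mirrors the script's logic):

```python
import json

def picks_gpt2_branch(config_text: str) -> bool:
    """Mirror of the script's check: only the exact string 'GPT2Tokenizer' matches."""
    tokenizer_class = json.loads(config_text)['tokenizer_class']
    return tokenizer_class == 'GPT2Tokenizer'

print(picks_gpt2_branch('{"tokenizer_class": "GPT2Tokenizer"}'))   # True  -> GPT2 branch taken
print(picks_gpt2_branch('{"tokenizer_class": "LlamaTokenizer"}'))  # False -> falls into the ValueError path
```

Any other class name, however close, fails the comparison and sends the converter hunting for a `tokenizer.model` that Phi-4 repos don't ship.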
Fix
In the script, replace this:

```python
class Phi3MiniModel(TextModel):
    model_arch = gguf.MODEL_ARCH.PHI3

    def set_vocab(self):
        # Phi-4 model uses GPT2Tokenizer
        tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
        if tokenizer_config_file.is_file():
            with open(tokenizer_config_file, "r", encoding="utf-8") as f:
                tokenizer_config_json = json.load(f)
                tokenizer_class = tokenizer_config_json['tokenizer_class']
                if tokenizer_class == 'GPT2Tokenizer':
                    return self._set_vocab_gpt2()

        from sentencepiece import SentencePieceProcessor
```
with this:

```python
    def set_vocab(self):
        # NEW LOGIC: If we have a tokenizer.json, use GPT2/HFFT logic immediately.
        # This bypasses the need for the elusive tokenizer.model.
        if (self.dir_model / 'tokenizer.json').is_file():
            print("Detected tokenizer.json. Using GPT2/BPE vocabulary logic for Phi-4.")
            return self._set_vocab_gpt2()

        # Fallback for older Phi-3 models that actually use SentencePiece
        tokenizer_path = self.dir_model / 'tokenizer.model'
        if not tokenizer_path.is_file():
            raise ValueError(f'Error: Missing {tokenizer_path}. If this is a Phi-4 model, ensure tokenizer.json is in the folder.')

        from sentencepiece import SentencePieceProcessor
```
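The file-presence priority the patch introduces can be exercised in isolation like this (a sketch; `vocab_strategy` is a hypothetical stand-in for `set_vocab`, returning a label instead of building the vocab):

```python
from pathlib import Path
import tempfile

def vocab_strategy(dir_model: Path) -> str:
    # tokenizer.json present -> take the GPT2/BPE route, no tokenizer.model needed
    if (dir_model / 'tokenizer.json').is_file():
        return 'gpt2-bpe'
    # legacy Phi-3 route: SentencePiece model file required
    if (dir_model / 'tokenizer.model').is_file():
        return 'sentencepiece'
    raise ValueError(f"Error: Missing {dir_model / 'tokenizer.model'}")

# A Phi-4-style folder (tokenizer.json only) now converts cleanly:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / 'tokenizer.json').write_text('{}')
    print(vocab_strategy(Path(d)))  # gpt2-bpe
```

Checking `tokenizer.json` first means the converter never even looks for `tokenizer.model` on Phi-4 repos, while older Phi-3 folders still go through SentencePiece.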
Sorry, we work only with mainline llama.cpp and do not allow modifications/forks =(
Static quants are available here: USS-Inferprise/Phi4-Mini-Prose2Tags-4B-GGUF