KeyError: 'mixtral' even though transformers is 4.36.2
Hi, I'm a beginner. I tried to run a model and got an error. I've already read some posts and upgraded transformers and pip as they suggested, but it still doesn't work.
// check transformers version //
C:**>pip show transformers
Name: transformers
Version: 4.36.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: C:******\AppData\Roaming\Python\Python311\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: sentence-transformers
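One thing worth ruling out: pip show reports the package for whichever Python that pip belongs to, and the program loading the model may be running a different interpreter with its own site-packages. The sketch below is meant to be run with the same Python that loads the model; it prints the interpreter path and the transformers version that interpreter can see. The 4.36.0 threshold is my assumption about when Mixtral support first shipped, and the helper names are made up for illustration.

```python
import sys
from importlib import metadata

# Assumption: Mixtral support first shipped in transformers 4.36.0.
MIN_MIXTRAL_VERSION = (4, 36, 0)


def parse_version(text):
    """Convert '4.36.2' or '4.36.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)  # pad '4.36' to (4, 36, 0) so comparisons behave
    return tuple(parts)


def describe_environment(package="transformers"):
    """Report the running interpreter and the package version it can see."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,
        "version": version,
        "new_enough": version is not None
        and parse_version(version) >= MIN_MIXTRAL_VERSION,
    }


if __name__ == "__main__":
    print(describe_environment())
```

If the printed interpreter path or version differs from what pip show reports, the upgrade probably went into a different environment than the one that loads the model.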
// config file //
config.json
{
"_name_or_path": "/workspace/models/Mixtral-8x7B-v0.1",
"architectures": [
"MixtralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 32000,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mixtral",
"num_attention_heads": 32,
"num_experts_per_tok": 2,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"num_local_experts": 8,
"output_router_logits": false,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"router_aux_loss_coef": 0.02,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.36.0.dev0",
"use_cache": false,
"vocab_size": 32002
}
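As far as I understand it, the auto classes take "model_type" from config.json and look it up in a registry of known architectures, so KeyError: 'mixtral' would mean the transformers that is actually running has no entry for it. Here is a rough sketch of that lookup. SAMPLE_CONFIG is a trimmed copy of the config above, the helper names are hypothetical, and the only transformers name used is CONFIG_MAPPING.

```python
import json

# Trimmed copy of the config.json fields that matter for this check.
SAMPLE_CONFIG = '{"architectures": ["MixtralForCausalLM"], "model_type": "mixtral"}'


def read_model_type(config_text):
    """Extract the model_type field from the text of a config.json."""
    return json.loads(config_text)["model_type"]


def installed_model_types():
    """Return the model types known to the importable transformers, or None."""
    try:
        from transformers import CONFIG_MAPPING
    except ImportError:
        return None
    return set(CONFIG_MAPPING.keys())


def check(config_text, known_types):
    """Say whether the config's model_type appears in the given registry."""
    model_type = read_model_type(config_text)
    if model_type in known_types:
        return f"'{model_type}' is registered"
    return f"'{model_type}' is missing, so loading would raise KeyError: '{model_type}'"


if __name__ == "__main__":
    known = installed_model_types()
    if known is None:
        print("transformers is not importable from this interpreter")
    else:
        print(check(SAMPLE_CONFIG, known))
```

If this reports 'mixtral' as missing, the interpreter running it is importing a transformers build that predates Mixtral, whatever pip show says.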
I would appreciate your help if there's another way to fix it. I'm using this model with text generation web UI, but when I click to load the model it shows me this error: KeyError: 'mixtral'
Hey there, I got the same error. After updating transformers to 4.36.2 I can load the Mixtral-8x7B-v0.1 tokenizer with AutoTokenizer, but loading the model with AutoModelForCausalLM still fails with the KeyError.
Hi @ReneRockerz, you might find Impulse AI (https://www.impulselabs.ai/) useful. We make it easy to fine-tune and deploy open source models. I know it's not relevant to your problem above, but it might be easier to use us to fine-tune and deploy.
docs: https://docs.impulselabs.ai/introduction
python sdk: https://pypi.org/project/impulse-api-sdk-python/