LM Studio
I have tried to run this model in LM Studio, but I got an error:
π₯² Failed to load the model
Failed to load model
error loading model: error loading model architecture: unknown model architecture: 'instella'
Is there a way to fix it, or do we have to wait until they fix it?
I'm actually working on it; Instella support needs to be added to the llama.cpp binary. Adding it to the conversion script took a while, but AMD actually provided everything we need to get all the weights correct. I'm not sure where to post my changes, however.
If things go well, I will likely package my adjustments to the llama.cpp source code here in a tar archive. If anyone wishes to push this to their GitHub, they're welcome to, though I'm unsure whether I'm allowed to do that myself. Given that the conversion script is a perfect 1:1 translation to GGUF using AMD's own weights, the loading has to be equally 1:1, which is what I'm working on. As far as I can tell, I just need to figure out where to plug the values into the C code, which is what I'm currently hunting for. Once compiled, it should "just work".
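To illustrate the conversion-script side of this, here is a minimal sketch of the kind of HF-to-GGUF tensor-name mapping that adding a new architecture involves. The per-layer names below follow the common Llama-style conventions and GGUF's blk.N.* scheme; they are illustrative assumptions, not Instella's actual layout as AMD defined it.

```python
import re

# Top-level (non-layer) tensors; illustrative subset.
TOP_LEVEL = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

# Per-layer suffix mapping to GGUF names; illustrative subset.
LAYER_SUFFIX = {
    "self_attn.q_proj.weight": "attn_q.weight",
    "self_attn.k_proj.weight": "attn_k.weight",
    "self_attn.v_proj.weight": "attn_v.weight",
    "self_attn.o_proj.weight": "attn_output.weight",
    "mlp.gate_proj.weight": "ffn_gate.weight",
    "mlp.up_proj.weight": "ffn_up.weight",
    "mlp.down_proj.weight": "ffn_down.weight",
    "input_layernorm.weight": "attn_norm.weight",
    "post_attention_layernorm.weight": "ffn_norm.weight",
}

def hf_to_gguf(name: str) -> str:
    """Map an HF checkpoint tensor name to its GGUF equivalent."""
    if name in TOP_LEVEL:
        return TOP_LEVEL[name]
    m = re.match(r"model\.layers\.(\d+)\.(.+)", name)
    if m and m.group(2) in LAYER_SUFFIX:
        return f"blk.{m.group(1)}.{LAYER_SUFFIX[m.group(2)]}"
    raise KeyError(f"unmapped tensor: {name}")
```

In the real tree this mapping lives in gguf-py/gguf/tensor_mapping.py, which is one of the files that has to learn about any new architecture.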
Okay, I've uploaded what I changed for the conversion. If someone, perhaps yourself, can figure out how to implement the changes into the actual loader itself, you could simply run ollama serve and access it from LM Studio, open-webui, etc. I'll keep trying, but I'm having difficulties.
Thanks for trying to convert Instella from safetensors to GGUF, because I failed badly.
You could file a Pull Request (or an Issue, if you're less familiar with Pull Requests) against llama.cpp to get the code in. :) Eager for this, curious to try it out.
This was forever ago. IIRC, the problem was that a lot of different weights, constraints, etc. were needed for this to be added to Ollama. I think I sent them a note about it, but I moved on to other things, so I honestly have no idea anymore.
Thanks for the follow-up reply. I haven't seen any Issues filed for the architecture to be added to Ollama; I might query them about it.
Edit: I found a close match for the release you based your work on, b4856. I'll look at making a PR after I figure out how to use llama.cpp to begin with. :]
Edit: Does the following look alright? I took the most recent llama.cpp and added your changes (what I could find by differential comparison): https://github.com/OdinVex/llama.cpp (three files were updated: convert_hf_to_gguf.py, gguf-py/gguf/constants.py, and gguf-py/gguf/tensor_mapping.py). It may need updating/tweaking; I made some slight modernization changes (Model -> ModelBase, etc.).
I've got it somewhat loading in llama.cpp, but the tokenizer doesn't seem to have been merged. Was that forgotten, or?
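For debugging a missing-tokenizer symptom like this, one thing worth checking is whether the conversion wrote the tokenizer metadata keys into the GGUF at all. A minimal sketch, using key names from the GGUF spec and treating the metadata as a plain dict rather than reading a real file (exact required keys vary by tokenizer type, so this list is an assumption):

```python
# GGUF tokenizer metadata keys that loaders generally expect to find
# (per the GGUF spec; requirements differ between BPE and SPM tokenizers).
REQUIRED_TOKENIZER_KEYS = [
    "tokenizer.ggml.model",
    "tokenizer.ggml.tokens",
]

def missing_tokenizer_keys(metadata: dict) -> list:
    """Return the expected tokenizer keys absent from GGUF metadata."""
    return [k for k in REQUIRED_TOKENIZER_KEYS if k not in metadata]
```

If a converted file comes back with keys missing here, the conversion script's tokenizer handling for the new architecture is the place to look.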