any successes with using this model?

#1
by case-ai - opened

This model doesn't seem to output anything other than generic responses, regardless of input. I have fairly context-heavy roleplay prompts, ~4k-6k on average, and they're run through an API that's been set up to work with more or less every model syntax out there. Am I missing something? I'm using the mradermacher/MS3.2-24B-Penumbra-Aether-i1-GGUF quants.

Example outputs:

{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"-7\n\n## Mistral v7 for Tekken 7\n\nThis model is a fine-tuned version of mistralai/Mistral-7B-v0.1. It was trained on various conversation datasets, including Tekken 7 and gaming, to provide more accurate and helpful responses in those domains.\n\nView API docs View model on Mistral AI Console\n\nThis model is a fine-tuned version of mistralai/Mistral-7B-v0.1. It was trained on various conversation datasets, including Tekken 7 and gaming, to provide more accurate and helpful responses in those domains. It is designed to help with conversations related to the game, including character strategies, move sets, and general gameplay tips.\n\n## Use Cases\n\n- Character Strategies: Get insights into the best strategies for playing specific Tekken 7 characters.\n- Move Sets: Learn about the move sets and combos for different characters.\n- Gameplay Tips: Receive tips and advice to improve your gameplay.\n- Community Discussions: Engage in discussions about the game, including updates, patches, and community events.\n\n## Prompt Examples\n\n- \"What are the best strategies for playing as Kazumi Mishima in Tekken 7?\"\n- \"Can you explain the move set for Paul Phoenix?\"\n- \"What are some tips for improving my defense in Tekken 7?\"\n- \"What are the latest updates and patches for Tekken 7?\""}}
{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"[English (en)]\n# Mistral AI\n\nMistral AI is a cutting-edge AI lab based in France, focused on developing advanced large language models and AI tools. Our team combines world-class researchers, engineers, and entrepreneurs from leading organizations such as Meta Platforms, Google DeepMind, and Google Brain, as well as top academic institutions like NYU, Cambridge, and Oxford.\n\n## Core Values\n\n- **Open & Honest**: We believe in transparency and honesty in all our actions.\n- **Collaborative & Kind**: We foster a culture of collaboration and kindness, where everyone feels valued and respected.\n- **Ambitious & Pragmatic**: We set high goals but remain grounded in practicality.\n- **Curious & Creative**: We encourage curiosity and creativity to drive innovation.\n\n## Projects\n\n- **Mistral Large**: A large-scale language model designed for high performance and efficiency.\n- **Le Chat**: An open-source conversational AI model.\n- **Mixtral**: An open-weight 8x7B sparse mixture of experts model.\n- **Codium**: An open-source AI model for code generation.\n- **Distil**: An open-source 40B parameter language model.\n\n## Research\n\nOur research spans a wide range of topics, including:\n\n- **Natural Language Processing**: Developing models that understand and generate human language.\n- **Machine Learning**: Advancing techniques for training and deploying AI models.\n- **AI Safety**: Ensuring that AI systems are safe, reliable, and trustworthy.\n\n## Community\n\nWe are committed to building a diverse and inclusive community of researchers, developers, and AI enthusiasts. We regularly host events, workshops, and hackathons to foster collaboration and learning.\n\n## Careers\n\nWe are always looking for talented individuals to join our team. If you are passionate about AI and want to make a difference, we encourage you to explore our career opportunities.\n\nFor more information, visit our website at [mistral.ai](https://mistral.ai)."}}

Which API?

> Which API?

It's not a standardized web API, it's one I rolled as an interface for an experimental game - in this use case the API is talking to an HF endpoint, but it does work consistently with dozens of other models and several hosts. The endpoint runs the model on llama.cpp (standard settings across the board).

It works fine in SillyTavern. I run llama.cpp too, and the model doesn't work in its web UI. I'm guessing it is missing a baked-in chat template?

That makes sense to me; I'm still just a consumer of models and haven't gotten hands-on with finetunes, etc, so I don't know all the steps of what's involved for them to work across platforms/engines. SillyTavern seems to handle a lot under the hood and might supply a default ChatML template. I'm not sure if there's a way to slot a template into the endpoint config, but I'll dig into it tonight, thanks!

No luck with any of the expected llama.cpp --chat-template <template> options based on the token config. That's pretty much all I could find as a potential solution without direct access to the environment (which would allow loading a template file manually).
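If the endpoint exposes a raw text-completion route, one possible workaround is to apply the template on the client side and send the finished prompt string. The sketch below is only an illustration: the function name is hypothetical, and the tag layout is an assumption based on the Mistral V7-Tekken instruct format, so it should be checked against the model card before relying on it.

```python
def build_mistral_v7_prompt(messages, add_bos=False):
    """Flatten chat messages into one prompt string.

    Assumes the Mistral V7-Tekken layout (no spaces around the tags).
    add_bos defaults to False because llama.cpp usually prepends the
    BOS token itself; set it to True only if your backend does not.
    """
    parts = ["<s>"] if add_bos else []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            # Completed assistant turns are closed with the EOS token.
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "".join(parts)


prompt = build_mistral_v7_prompt([
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Describe the tavern."},
])
```

The resulting string ends right after `[/INST]`, so the model continues with the assistant turn. This only helps if the request bypasses the server's own chat templating.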

> No luck with any of the expected llama.cpp --chat-template <template> options based on the token config. That's pretty much all I could find as a potential solution without direct access to the environment (which would allow loading a template file manually).

I have tried this as well in the past. The 12B models seem to work fine in the llama.cpp web UI, but for 24B models, based on my testing, those without a baked-in chat template did not work, while those with one did. I could be wrong. For the next merge, I will set a chat template, as that seems to be the issue, and it couldn't hurt anyway. I am glad you commented, as I think I have figured it out. It is hard to find good info on AI sometimes.
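For anyone who wants to check whether a particular quant has a baked-in template, the GGUF metadata can be scanned for the `tokenizer.chat_template` key. This is a minimal sketch, assuming a GGUF v2 or v3 file (little-endian, 64-bit counts); the function names are hypothetical and it is not a full GGUF parser.

```python
import struct

# GGUF scalar value types mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    # Strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def find_chat_template(f):
    """Return the embedded chat template string, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _read(f, "<I")             # format version (v2/v3 assumed)
    _read(f, "<Q")             # tensor count, unused here
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "tokenizer.chat_template":
            return value
    return None
```

Usage would look like `with open("model.gguf", "rb") as fh: print(find_chat_template(fh) is not None)`, where the file name is a placeholder. Only the header is read, so the tensor data is never loaded, though walking the vocabulary arrays in pure Python can take a few seconds.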

Awesome, I will keep an eye open!

Vortex5 changed discussion status to closed
