Instructions to use Naphula/Magistaroth-24B-v1-MPOA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Naphula/Magistaroth-24B-v1-MPOA with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Naphula/Magistaroth-24B-v1-MPOA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Naphula/Magistaroth-24B-v1-MPOA")
model = AutoModelForCausalLM.from_pretrained("Naphula/Magistaroth-24B-v1-MPOA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the model's chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Naphula/Magistaroth-24B-v1-MPOA with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Naphula/Magistaroth-24B-v1-MPOA"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/Magistaroth-24B-v1-MPOA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker
```bash
docker model run hf.co/Naphula/Magistaroth-24B-v1-MPOA
```
- SGLang
How to use Naphula/Magistaroth-24B-v1-MPOA with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/Magistaroth-24B-v1-MPOA" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/Magistaroth-24B-v1-MPOA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/Magistaroth-24B-v1-MPOA" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Naphula/Magistaroth-24B-v1-MPOA",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use Naphula/Magistaroth-24B-v1-MPOA with Docker Model Runner:
```bash
docker model run hf.co/Naphula/Magistaroth-24B-v1-MPOA
```
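The curl calls shown for vLLM and SGLang both target an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default base URL assumes the vLLM port (8000) used above. It has not been run against this particular model.

```python
import json
import urllib.request

MODEL_ID = "Naphula/Magistaroth-24B-v1-MPOA"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request mirroring the curl examples."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(chat("What is the capital of France?"))
#   print(chat("What is the capital of France?", base_url="http://localhost:30000"))  # SGLang port
```

Separating request construction from sending keeps the payload easy to inspect before a server is up.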
Smartest RP Mistral 24B I've tested
It checks all the boxes! ✅
Why is it dumb? :\ Tried Q8, Q6, Q6-imat, but still...
it's creative but dumb.
Did u compare to the non-MPOA version? The scale used was 1.3, so maybe ablation trimmed some brain cells. But I think in general 24B has limits, and anything this small is going to seem dumb in some ways.
If u notice that finetunes work better, it could just be the way della handles vectors and scrambles some things it shouldn't.
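For readers wondering what a "scale" of 1.3 could mean here: in generic directional ablation, a scale factor multiplies the component that gets removed along a chosen direction, so values above 1 over-subtract it. The sketch below illustrates only that generic idea with made-up 2D vectors. It is an assumption about the meaning of "scale", not the MPOA procedure itself, which this thread does not describe; the function name `ablate` is hypothetical.

```python
import math


def ablate(hidden, direction, scale=1.0):
    """Remove `scale` times the projection of `hidden` onto `direction`.

    Generic illustration only (assumed meaning of "scale"):
      scale = 1.0 removes the component along `direction` exactly,
      scale > 1.0 pushes the vector past zero along that direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - scale * projection * u for h, u in zip(hidden, unit)]


# Made-up example vectors, not taken from any real model
hidden = [3.0, 4.0]
direction = [1.0, 0.0]
print(ablate(hidden, direction, scale=1.0))
print(ablate(hidden, direction, scale=1.3))
```

With a scale above 1 the first coordinate ends up negative rather than zero, which is one way a stronger setting could change more than intended.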
What's the KL divergence? One thing: I used static Q6 and idk why, but it's better than the imat Q6 or the static Q8 lol. It's intelligent enough and the creativity is peak. Also, thanks for the MPOA; it's the first model that gives the right answer in my NSFW RP without any refusals. Other 24B models just refuse (while still staying in character); this one gets the job done easily. Peak.
Not sure, I have to set up some tools to check KL divergence since Heretic is too slow on my PC. Glad the MPOA works for you. v1.1 is also abliterated and may have a baked in 'higher temperature' like effect. I usually test with Q6 static and either IQ4_XS or IQ4_NL depending on the model arch.
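For reference, the KL divergence mentioned above compares the next-token distribution of the original model with that of the modified one; lower values mean the ablated model stays closer to the original. A minimal sketch of the calculation for a single token position follows. The logits are made up for illustration, and the helper names are hypothetical; a real measurement would average over many positions and prompts.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Made-up next-token logits for one position (not real model output)
base_logits = [2.0, 1.0, 0.1]
ablated_logits = [1.8, 1.2, 0.1]

kl = kl_divergence(softmax(base_logits), softmax(ablated_logits))
print(f"KL(base || ablated) = {kl:.4f} nats")
```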
> Did u compare to the non-MPOA version? The scale used was 1.3, so maybe ablation trimmed some brain cells. But I think in general 24B has limits, and anything this small is going to seem dumb in some ways.
I actually did compare the two and repeated the test three times, because I couldn’t believe the results… MPOA was smarter, at least on my private test suite. Though I later found out that the original Magidonia 4.3 is also quite capable; I just hadn’t tested it before. Overall, I like the MPOA better, the writing style seems more… unusual in a good way. I’ll be back in action soon, meanwhile keep up the good work 👍