Instructions to use Naphula/Magistaroth-24B-v1-MPOA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Naphula/Magistaroth-24B-v1-MPOA with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Naphula/Magistaroth-24B-v1-MPOA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Naphula/Magistaroth-24B-v1-MPOA")
model = AutoModelForCausalLM.from_pretrained("Naphula/Magistaroth-24B-v1-MPOA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
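
Before loading the model directly, it can help to estimate how much memory the weights alone will need. The following is a rough sketch, not a measurement: the 24e9 parameter count is an assumption taken from the "24B" in the model name, and the helper name `weight_memory_gb` is made up for illustration. It ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate for a dense model at different dtypes.
# ASSUMPTION: 24e9 parameters, read off the "24B" in the model name.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16", "int8"):
    print(f"{dtype}: ~{weight_memory_gb(24e9, dtype):.0f} GB")
```

If the float32 figure exceeds your hardware, `from_pretrained` accepts a `torch_dtype` argument (for example `torch.bfloat16`), which roughly halves the weight footprint; whether that fits still depends on your GPU and context length.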

Local Apps

vLLM

How to use Naphula/Magistaroth-24B-v1-MPOA with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/Magistaroth-24B-v1-MPOA"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Magistaroth-24B-v1-MPOA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
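
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "Naphula/Magistaroth-24B-v1-MPOA"


def build_chat_request(prompt, model=MODEL_ID, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` sends the same payload as the curl command. Passing `base_url="http://localhost:30000"` should point the same helper at a server on port 30000 instead.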

Use Docker

docker model run hf.co/Naphula/Magistaroth-24B-v1-MPOA

SGLang

How to use Naphula/Magistaroth-24B-v1-MPOA with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/Magistaroth-24B-v1-MPOA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Magistaroth-24B-v1-MPOA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/Magistaroth-24B-v1-MPOA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/Magistaroth-24B-v1-MPOA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Naphula/Magistaroth-24B-v1-MPOA with Docker Model Runner:
```
docker model run hf.co/Naphula/Magistaroth-24B-v1-MPOA
```

Smartest RP Mistral 24B I've tested

by McG-221 - opened Feb 26

Discussion

McG-221

Feb 26

It checks all the boxes! ✅

Darkknight535

Mar 3

Why is it dumb :\ Tried Q8, Q6, and Q6-imat, but still...
It's creative but dumb.

Naphula

Owner Mar 4

Did u compare to the non-MPOA version? The scale used was 1.3, so maybe ablation trimmed some brain cells. But I think in general 24B has limits, and anything this small is going to seem dumb in some ways.

If u notice that finetunes work better, it could just be the way della handles vectors and scrambles some things it shouldn't.

Darkknight535

Mar 4

What's the KL rate? One thing: I used the static Q6, and idk why, but it's better than the imat Q6 or the static Q8 lol. It's intelligent enough and the creativity is peak. Also, thanks for the MPOA: it's the first model that gives the right answer in my NSFW RP without any refusals. Other 24B models just refuse (while still being in character); this one gets the job done easily. Peak.

Naphula

Owner Mar 5

Not sure, I have to set up some tools to check KL divergence since Heretic is too slow on my PC. Glad the MPOA works for you. v1.1 is also abliterated and may have a baked in 'higher temperature' like effect. I usually test with Q6 static and either IQ4_XS or IQ4_NL depending on the model arch.
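
For readers unfamiliar with the KL divergence check mentioned above, here is a minimal sketch of the underlying arithmetic. It is not how Heretic or any other tool computes it; the function names are made up for illustration, and the logits are toy values. A real comparison would take next-token logits from the original and the ablated model on the same prompts and average the per-position results.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats for a single next-token distribution."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(p_rows, q_rows):
    """Average KL over several token positions (one row of logits each)."""
    values = [kl_divergence(p, q) for p, q in zip(p_rows, q_rows)]
    return sum(values) / len(values)


# Toy example: P stands in for the original model, Q for the modified one.
original = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
modified = [[1.8, 1.1, 0.3], [0.4, 0.9, 2.5]]
print(f"mean KL: {mean_kl(original, modified):.4f} nats")
```

Identical logits give a KL of zero, and larger values mean the modified model's token distribution has drifted further from the original.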

McG-221

Mar 5

> Did u compare to the non-MPOA version? The scale used was 1.3, so maybe ablation trimmed some brain cells. But I think in general 24B has limits, and anything this small is going to seem dumb in some ways.

I actually did compare the two and repeated the test three times, because I couldn’t believe the results… MPOA was smarter, at least on my private test suite. Though I later found out that the original Magidonia 4.3 is also quite capable; I just hadn’t tested it before. Overall, I like the MPOA better; the writing style seems more… unusual in a good way. I’ll be back in action soon; meanwhile, keep up the good work 👍
