Instructions to use inflatebot/MN-12B-Mag-Mell-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use inflatebot/MN-12B-Mag-Mell-R1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="inflatebot/MN-12B-Mag-Mell-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("inflatebot/MN-12B-Mag-Mell-R1")
model = AutoModelForCausalLM.from_pretrained("inflatebot/MN-12B-Mag-Mell-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use inflatebot/MN-12B-Mag-Mell-R1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "inflatebot/MN-12B-Mag-Mell-R1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inflatebot/MN-12B-Mag-Mell-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/inflatebot/MN-12B-Mag-Mell-R1

SGLang

How to use inflatebot/MN-12B-Mag-Mell-R1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "inflatebot/MN-12B-Mag-Mell-R1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inflatebot/MN-12B-Mag-Mell-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "inflatebot/MN-12B-Mag-Mell-R1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inflatebot/MN-12B-Mag-Mell-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use inflatebot/MN-12B-Mag-Mell-R1 with Docker Model Runner:
```
docker model run hf.co/inflatebot/MN-12B-Mag-Mell-R1
```

Recommended temperature for use with Top nsigma?

#15

by unbalancedmercy - opened Mar 15, 2025

Discussion

unbalancedmercy

Mar 15, 2025

Top nsigma is a sampler designed to replace Min P and maintain coherence at higher temperatures. Is there a recommended temperature for when this sampler is turned on?
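For readers unfamiliar with the sampler, here is a minimal sketch of the idea as it is commonly described: keep only the tokens whose logit lies within n standard deviations of the largest logit, then sample from what remains. The function names, the use of the population standard deviation, and the toy logits are assumptions made for illustration; real backends may differ in these details.

```python
import math
import random
import statistics


def top_nsigma_filter(logits, n=1.0):
    """Return indices of tokens whose logit is >= max(logits) - n * std(logits)."""
    threshold = max(logits) - n * statistics.pstdev(logits)
    return [i for i, logit in enumerate(logits) if logit >= threshold]


def sample_top_nsigma(logits, temperature=1.0, n=1.0, rng=random):
    """Sample one token index from the tokens that survive the filter."""
    keep = top_nsigma_filter(logits, n)
    scaled = [logits[i] / temperature for i in keep]
    peak = max(scaled)
    # Unnormalised softmax weights; random.choices normalises them.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(keep, weights=weights, k=1)[0]


# Toy logits (hypothetical): two strong candidates and a long tail.
toy_logits = [10.0, 9.5, 2.0, 1.0, 0.0, -1.0]
kept = top_nsigma_filter(toy_logits, n=1.0)
print(kept)  # -> [0, 1]
```

Only the two high-logit tokens survive here, so raising the temperature redistributes probability between them without letting the tail back in.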

inflatebot

Owner Mar 16, 2025

I haven't gotten a chance to play with it and learn what good settings for it look like, because honestly MinP+temp+DRY tends to work well enough for me these days.
Mag Mell holds together decently well up to a temperature of 1.25 with MinP though, which is remarkably high for Nemo, although the stability issues we had with it at long context were more pronounced at that temp. Maybe that's a good place to start? Literal guess.

unbalancedmercy

Mar 16, 2025

I would expect it to be higher, since Top nsigma is specifically intended to fix the problem Min P has at high temperatures.
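The difference being discussed can be illustrated with a small sketch. It assumes temperature is applied to the logits before either filter runs (sampler order varies between backends, so this may not match a given setup), and the helper names and toy logits are hypothetical. Under that assumption, the set of tokens Min P keeps grows as temperature rises, while the set kept by a max-minus-n-sigma cutoff stays the same, because dividing the logits by the temperature scales the maximum and the standard deviation equally.

```python
import math
import statistics


def min_p_keep(logits, min_p, temperature):
    """Indices with p_i >= min_p * p_max, temperature applied first (assumption)."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    # p_i / p_max = exp(scaled_i - peak); the softmax normaliser cancels out.
    return [i for i, s in enumerate(scaled) if math.exp(s - peak) >= min_p]


def nsigma_keep(logits, n, temperature):
    """Indices with scaled logit >= max - n * std of the scaled logits."""
    scaled = [logit / temperature for logit in logits]
    threshold = max(scaled) - n * statistics.pstdev(scaled)
    return [i for i, s in enumerate(scaled) if s >= threshold]


toy_logits = [10.0, 9.5, 8.0, 7.0, 2.0, 0.0]  # hypothetical values
for temperature in (1.0, 1.5, 4.0):
    print(
        temperature,
        min_p_keep(toy_logits, 0.1, temperature),
        nsigma_keep(toy_logits, 1.0, temperature),
    )
```

With these toy values, Min P at 0.1 keeps three tokens at temperature 1.0, four at 1.5, and five at 4.0, while the n-sigma cutoff keeps the same four tokens throughout. This says nothing about which temperature works well for this model; it only shows why the two samplers respond differently to the setting.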

inflatebot

Owner Mar 16, 2025 (edited Mar 16, 2025)

As would I; but again, I'm not experienced with this sampler and I don't feel comfortable throwing numbers around when I don't know they're going to work for you. I would rather give no advice than bad advice that wastes your time.

Darkknight535

Apr 16, 2025

I use a temperature of 1.5 with MinP 0.1 and a repetition penalty of 1.1 without any issues.
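As a rough sketch of how the settings reported above could be passed to an OpenAI-compatible server such as the vLLM one started earlier on this page: the snippet below only builds the request and does not send it. `min_p` and `repetition_penalty` are not part of the standard OpenAI schema; vLLM accepts them as extra sampling parameters, but other servers may use different names or ignore them, so treat the field names as an assumption to check against your backend. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a chat completion request with the reported settings."""
    payload = {
        "model": "inflatebot/MN-12B-Mag-Mell-R1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.5,         # value reported in the comment above
        "min_p": 0.1,               # extra (non-OpenAI) sampling parameter
        "repetition_penalty": 1.1,  # extra (non-OpenAI) sampling parameter
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Who are you?")
# With a server running, the request could be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

These are one user's reported values, not a recommendation from the model owner, and they do not enable Top nsigma itself, whose parameter name depends on the backend.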
