Recommended temperature for use with Top nsigma?
Top nsigma is a sampler designed to replace Min P and maintain coherence at higher temperatures. Is there a recommended temperature for when this sampler is turned on?
I haven't had a chance to play with it and learn what good settings for it look like, because honestly MinP+temp+DRY tends to work well enough for me these days.
Mag Mell holds it together decently well to 1.25 with MinP though, which is remarkably high for Nemo, although the stability issues we had with it at long context were more pronounced at that temp. Maybe that's a good place to start? Literal guess.
I would expect it to be higher, since top nsigma is specifically intended to fix the problem min p has at high temps.
As would I; but again, I'm not experienced with this sampler and I don't feel comfortable throwing numbers around when I don't know they're going to work for you. I would rather give no advice than bad advice that wastes your time.
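For anyone who wants to see why the expectation above is plausible, here is a rough sketch of the two filters in plain Python. It assumes top nsigma keeps every token whose logit is within `n` standard deviations of the highest logit, and that temperature is applied before either filter; the example logits, `n = 1.0`, and all function names are made up for illustration and are not taken from any particular backend.

```python
import math
import statistics


def apply_temperature(logits, temperature):
    """Divide every logit by the temperature (assumed to happen before filtering)."""
    return [x / temperature for x in logits]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_keep(logits, min_p):
    """Indices of tokens whose probability is at least min_p times the top probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def top_nsigma_keep(logits, n):
    """Indices of tokens whose logit is within n standard deviations of the top logit."""
    threshold = max(logits) - n * statistics.pstdev(logits)
    return [i for i, x in enumerate(logits) if x >= threshold]


# Hypothetical logits: three plausible tokens followed by a long tail.
logits = [10.0, 9.0, 8.5, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]

for temperature in (1.0, 3.0):
    scaled = apply_temperature(logits, temperature)
    print(
        f"temp={temperature}: "
        f"min_p keeps {len(min_p_keep(scaled, 0.1))} tokens, "
        f"top nsigma keeps {len(top_nsigma_keep(scaled, 1.0))} tokens"
    )
```

Under these assumptions, dividing the logits by the temperature shrinks the maximum and the standard deviation by the same factor, so the top nsigma candidate set does not change, while Min P lets more of the tail through as temperature rises. That says nothing about which temperature actually reads well with this model; it only illustrates the mechanism.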
I use 1.5 Temp with MinP 0.1 and rep penalty 1.1 without any issues.
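For reference, those settings could be written out as a request body along these lines. This is only a sketch: the post does not say which backend was used, and the field names `min_p` and `repetition_penalty` are an assumption based on what vLLM-style OpenAI-compatible servers accept as extra sampling parameters, so check your own frontend's names. The prompt text is a placeholder.

```python
import json

# Sampler settings quoted in the post above; field names are assumed, not confirmed.
payload = {
    "model": "inflatebot/MN-12B-Mag-Mell-R1",
    "messages": [{"role": "user", "content": "Hello!"}],  # placeholder prompt
    "temperature": 1.5,
    "min_p": 0.1,
    "repetition_penalty": 1.1,
}

# Serialize to the JSON body a chat-completions style endpoint would receive.
body = json.dumps(payload, indent=2)
print(body)
```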