## Use with the Transformers library

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="togethercomputer/M1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
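Generation settings can be passed directly to the pipeline call (a minimal sketch; the sampling values below are illustrative, not tuned for M1-3B):

```python
# Generation kwargs pass straight through to the underlying generate() call.
out = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
# With chat-style input, generated_text holds the full message list;
# the last entry is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```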
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/M1-3B")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/M1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Build the prompt with the model's chat template
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens that follow the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
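For interactive use, tokens can be printed as they are generated with the library's `TextStreamer` (a minimal sketch reusing the `model`, `tokenizer`, and `inputs` objects from the snippet above; `max_new_tokens` is illustrative):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced,
# skipping the echoed prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```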
## Quick Links

This model was trained as described in the paper [M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models](https://arxiv.org/abs/2504.10449). Results on math reasoning benchmarks:

| Model | AIME 2025 | AIME 2024 | MATH 500 | AMC 2023 | OlympiadBench |
|---|---|---|---|---|---|
| Qwen2.5-Math-7B-Instruct (Transformer) | – | 13.3 | 79.8 | 50.6 | 40.7 |
| rStar-Math-7B (Transformer) | – | 26.7 | 78.4 | 47.5 | 47.1 |
| Eurus-2-7B-PRIME (Transformer) | – | 26.7 | 79.2 | 57.8 | 42.1 |
| Qwen2.5-7B-SimpleRL (Transformer) | – | 26.7 | 82.4 | 62.5 | 43.3 |
| DeepSeek-R1-Distill-Qwen-1.5B (Transformer) | 23.0 | 28.8 | 82.8 | 62.9 | 43.3 |
| M1-3B (Mamba Hybrid) | 23.5 | 28.5 | 84.0 | 62.8 | 47.3 |

Code: https://github.com/jxiw/M1
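
The paper studies scaling test-time compute with a Mamba-based reasoning model. One standard way to spend extra inference compute is self-consistency: sample several reasoning traces and take a majority vote over the extracted final answers. The sketch below is illustrative only and not the paper's exact evaluation pipeline; it reuses the `model` and `tokenizer` loaded above, the sampling values are arbitrary, and `extract_answer` is a hypothetical helper.

```python
import re
from collections import Counter

question = [{"role": "user", "content": "What is 17 * 24?"}]
batch = tokenizer.apply_chat_template(
    question,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sample several independent reasoning traces in one batched call.
samples = model.generate(
    **batch,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=8,
)

def extract_answer(text):
    # Hypothetical helper: take the last number in a completion
    # as its candidate final answer.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None

prompt_len = batch["input_ids"].shape[-1]
answers = [
    extract_answer(tokenizer.decode(s[prompt_len:], skip_special_tokens=True))
    for s in samples
]
# Majority vote over the extracted answers.
print(Counter(a for a in answers if a is not None).most_common(1))
```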

If you find this model useful, please cite the paper:

```bibtex
@article{wang2025m1scalabletesttimecompute,
  title={M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models},
  author={Junxiong Wang and Wen-Ding Li and Daniele Paliotta and Daniel Ritter and Alexander M. Rush and Tri Dao},
  journal={arXiv preprint arXiv:2504.10449},
  year={2025},
  url={https://arxiv.org/abs/2504.10449},
}
```