Instructions for using skilledu/Mellum2-12B-A2.5B-Base with libraries and local apps.

Libraries

How to use skilledu/Mellum2-12B-A2.5B-Base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="skilledu/Mellum2-12B-A2.5B-Base")

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("skilledu/Mellum2-12B-A2.5B-Base")
model = AutoModelForCausalLM.from_pretrained("skilledu/Mellum2-12B-A2.5B-Base", device_map="auto")
```


vLLM

How to use skilledu/Mellum2-12B-A2.5B-Base with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "skilledu/Mellum2-12B-A2.5B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "skilledu/Mellum2-12B-A2.5B-Base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
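You can send the same request from Python without extra dependencies. The following is a minimal sketch that uses only the standard library. `build_completion_request` and `complete` are hypothetical helper names. The response parsing assumes the OpenAI-style completions schema (`choices[0].text`) that the OpenAI-compatible server returns.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call above (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(req: urllib.request.Request) -> str:
    """Send the request and return the generated text (assumes OpenAI-style JSON)."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


req = build_completion_request(
    "http://localhost:8000", "skilledu/Mellum2-12B-A2.5B-Base", "Once upon a time,"
)
# print(complete(req))  # requires the vLLM server started above
```

Building the request separately from sending it lets you inspect or reuse the payload. It also lets you point the same helper at the SGLang server below by changing only the base URL (port 30000).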

Use Docker

```shell
docker model run hf.co/skilledu/Mellum2-12B-A2.5B-Base
```

SGLang

How to use skilledu/Mellum2-12B-A2.5B-Base with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "skilledu/Mellum2-12B-A2.5B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "skilledu/Mellum2-12B-A2.5B-Base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "skilledu/Mellum2-12B-A2.5B-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "skilledu/Mellum2-12B-A2.5B-Base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use skilledu/Mellum2-12B-A2.5B-Base with Docker Model Runner:
```
docker model run hf.co/skilledu/Mellum2-12B-A2.5B-Base
```

Mellum2 Base

Use this checkpoint as the starting point for your own fine-tuning, alignment, or domain adaptation on top of the long-context base. For instruction following or reasoning tasks out of the box, use the Instruct or Thinking variants instead.

Mellum2 Base Highlights

Mellum2 Base is a long-context pretrained causal language model trained by JetBrains.

The model uses a Mixture-of-Experts (MoE) architecture with 64 experts, of which 8 are activated per token. It interleaves sliding-window and full-attention layers and supports a context length of 131,072 tokens.
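"64 experts, 8 active per token" means each token runs through only 8 of the 64 expert networks. The router decides which 8, and their outputs are mixed by the router weights. The following is a minimal sketch of generic top-k routing, not Mellum2's actual implementation. It makes several assumptions the card does not state:

- The router applies a softmax over all experts.
- The 8 selected weights are renormalized to sum to 1.
- Router logits are passed in directly rather than computed from the hidden state by a learned layer.
- Toy scaling functions stand in for the real expert FFNs.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(router_logits: Sequence[float], k: int = 8) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize the selected weights so they sum to 1.
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]


def moe_forward(x: Vector,
                experts: Sequence[Callable[[Vector], Vector]],
                router_logits: Sequence[float],
                k: int = 8) -> Vector:
    """Weighted sum of the outputs of the k routed experts."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # only k of the experts are evaluated for this token
        for j, v in enumerate(y):
            out[j] += weight * v
    return out


# Toy usage: 64 experts, expert i scales its input by (i + 1).
random.seed(0)
toy_experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(64)]
toy_logits = [random.gauss(0.0, 1.0) for _ in range(64)]
print(moe_forward([1.0, 2.0], toy_experts, toy_logits, k=8))
```

Only 8 of the 64 expert calls run per token. This is why the active parameter count ("A2.5B") is far below the total ("12B").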

This is the long-context base. It was produced from Mellum2-12B-A2.5B-Base-Pretrain by a layer-selective YaRN extension stage that remaps RoPE frequencies only on the global (full-attention) layers. It is the shared starting point for the released Instruct and Thinking variants.
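"Layer-selective" means that only the global-attention layers receive rescaled RoPE frequencies; the sliding-window layers keep the original ones. The following is a minimal sketch using the per-dimension ramp from the YaRN paper (α = 1, β = 32) and its recommended attention factor 0.1·ln(s) + 1. The card does not publish Mellum2's extension parameters, so the following values are assumptions:

- `head_dim = 72`, taken as 2304 / 32.
- A RoPE base of 10,000.
- An original context of 8,192 tokens.
- A 3-sliding : 1-full layer pattern.

The `layer_types` labels are also hypothetical.

```python
import math
from typing import List, Sequence


def rope_inv_freq(head_dim: int, base: float) -> List[float]:
    # Standard RoPE frequencies: theta_i = base^(-2i / d) for i = 0 .. d/2 - 1.
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def yarn_inv_freq(head_dim: int, base: float, scale: float, orig_ctx: int,
                  alpha: float = 1.0, beta: float = 32.0) -> List[float]:
    # YaRN ramp: interpolate low-frequency dims by 1/scale, keep high-frequency
    # dims unchanged, and blend linearly in between.
    scaled = []
    for theta in rope_inv_freq(head_dim, base):
        rotations = orig_ctx * theta / (2.0 * math.pi)  # full turns within the original context
        if rotations < alpha:
            gamma = 0.0
        elif rotations > beta:
            gamma = 1.0
        else:
            gamma = (rotations - alpha) / (beta - alpha)
        scaled.append((1.0 - gamma) * theta / scale + gamma * theta)
    return scaled


def yarn_attention_factor(scale: float) -> float:
    # YaRN's recommended attention scaling: sqrt(1/t) = 0.1 * ln(s) + 1.
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0


def per_layer_inv_freq(layer_types: Sequence[str], head_dim: int, base: float,
                       scale: float, orig_ctx: int) -> List[List[float]]:
    # Layer-selective extension: only full-attention layers get YaRN frequencies.
    plain = rope_inv_freq(head_dim, base)
    yarn = yarn_inv_freq(head_dim, base, scale, orig_ctx)
    return [yarn if kind == "full_attention" else plain for kind in layer_types]


# Hypothetical configuration (not published in the card).
HEAD_DIM = 72            # assumed: hidden size 2304 / 32 query heads
ROPE_BASE = 10_000.0
ORIG_CTX = 8_192
TARGET_CTX = 131_072
layer_types = (["sliding_attention"] * 3 + ["full_attention"]) * 7  # 28 layers
freqs = per_layer_inv_freq(layer_types, HEAD_DIM, ROPE_BASE, TARGET_CTX / ORIG_CTX, ORIG_CTX)
```

With a scale of 16, the highest-frequency dimension of a full-attention layer is left untouched. The lowest-frequency dimension is divided by 16. The sliding-window layers keep the original frequencies, since a 1,024-token window never leaves the range seen in pretraining.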

Mellum2 Model Family

This repository contains one checkpoint from the Mellum2 family.

Checkpoint	Description
Base Pretrain	Base checkpoint before long-context extension
Base	Final base model
Instruct SFT	Supervised instruction-tuned checkpoint
Thinking SFT	Supervised thinking checkpoint
Instruct	RL-tuned instruction model
Thinking	RL-tuned thinking model

Model Overview

Mellum2 Base has the following features:

Number of Layers: 28
Hidden Size: 2304
Intermediate Size: 7168
MoE Intermediate Size: 896
Number of Experts: 64
Number of Activated Experts: 8
Number of Attention Heads (GQA): 32 for Q and 4 for KV
Context Length: 131,072
Sliding Window: 1,024
Vocabulary Size: 98,304
Precision: bfloat16
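The figures above are enough to roughly estimate KV-cache memory. They also show why mixing sliding-window and full-attention layers matters at 131,072 tokens: a sliding-window layer only needs to keep the last 1,024 tokens. The following is a back-of-the-envelope sketch that makes two assumptions the card does not state:

- `head_dim = 72`, taken as 2304 / 32. The card does not list the head dimension.
- A hypothetical split of 7 full-attention and 21 sliding-window layers. The card does not give the layer pattern.

```python
def kv_cache_bytes(seq_len: int,
                   n_full_layers: int,
                   n_sliding_layers: int,
                   window: int = 1024,
                   n_kv_heads: int = 4,
                   head_dim: int = 72,       # assumption: 2304 hidden / 32 query heads
                   bytes_per_elem: int = 2,  # bfloat16
                   ) -> int:
    """KV-cache size in bytes for one sequence."""
    # K and V each store n_kv_heads * head_dim values per token per layer.
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    full = n_full_layers * seq_len * per_token_per_layer
    # Sliding-window layers never need more than `window` cached tokens.
    sliding = n_sliding_layers * min(seq_len, window) * per_token_per_layer
    return full + sliding


N_LAYERS = 28
N_FULL = 7  # hypothetical split; not stated in the card
mixed = kv_cache_bytes(131_072, N_FULL, N_LAYERS - N_FULL)
all_full = kv_cache_bytes(131_072, N_LAYERS, 0)
print(f"mixed: {mixed / 2**20:.1f} MiB, all full attention: {all_full / 2**20:.1f} MiB")
```

Under these assumptions, one 131,072-token sequence needs about 1,032 MiB of KV cache. The same sequence would need 4,032 MiB if all 28 layers used full attention. GQA with 4 KV heads already makes each per-layer term 8 times smaller than it would be with 32 KV heads.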

Serving with vLLM

```shell
vllm serve JetBrains/Mellum2-12B-A2.5B-Base --max-model-len 131072
```

Quickstart

Text-only input (this is a base model, so use the completions endpoint, not the chat endpoint)

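```python
from openai import OpenAI

# The client reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment.
# Point OPENAI_BASE_URL at the vLLM server (e.g. http://localhost:8000/v1);
# any placeholder key works if the server was started without --api-key.
client = OpenAI()

completion = client.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Base",
    prompt="def fibonacci(n):\n    ",
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "top_k": 20,  # vLLM sampling parameter outside the OpenAI schema
    },
)
print("Completion:", completion.choices[0].text)
```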

Evaluation

Mellum2 Base pretraining results compared with similarly sized open base models. All values are self-reported by JetBrains.

Benchmark	Mellum2 (12B-A2.5B)	OLMo-3 (7B)	Qwen2.5 (7B)	Qwen3 (4B)	Qwen3.5 (4B)
Code Generation
HumanEval	41.5	45.1	55.5	57.3	50.0
HumanEval+	37.2	39.6	47.0	51.2	43.9
MBPP	62.4	50.6	63.6	67.0	52.2
MBPP+	61.4	52.9	64.0	64.5	55.0
MultiPL-E (7 langs)	21.0	10.0	19.2	26.0	12.1
CRUXEval-I	45.4	38.8	44.0	44.6	49.1
CRUXEval-O	43.9	36.6	42.9	43.5	43.2
Knowledge & Reasoning
MMLU	70.9	62.1	71.8	71.1	74.2
MMLU-Pro	59.3	34.5	48.6	51.5	52.4
BBH	74.9	63.6	69.0	71.3	80.2
ARC-Challenge	53.5	53.6	51.3	51.2	54.9
HellaSwag	73.7	74.2	78.9	73.7	75.3
WinoGrande	65.5	69.5	73.3	71.2	70.8
TruthfulQA MC2	44.5	47.0	56.4	53.5	52.1
Math & Science
GSM8K	81.7	73.5	81.9	82.0	80.1
MATH	10.0	18.7	24.6	27.7	25.3
GPQA Diamond	31.3	28.8	32.8	36.9	41.4
GPQA Main	35.0	27.9	34.2	36.8	40.2

For more details, see the Mellum2 Technical Report.

License

Released under the Apache 2.0 license.


Model size: 12B params (Safetensors, tensor type BF16)

Paper: Mellum2 Technical Report (arXiv:2605.31268, published May 29)

Evaluation results (self-reported)

pass@1 on HumanEval: 41.460
pass@1 on HumanEval+: 37.200
pass@1 on MBPP: 62.400
pass@1 on MBPP+: 78.310
pass@1 on MultiPL-E HumanEval, 7 languages: 20.970
pass@1 on CRUXEval-I: 45.380
pass@1 on CRUXEval-O: 43.880
accuracy on MMLU: 70.870