# Marco-Nano-Base
Marco-Nano-Base is a compact, highly sparse Mixture-of-Experts (MoE) multilingual language model from the Marco-MoE family, developed by Alibaba International Digital Commerce. It activates only 0.6B out of 8B total parameters (7.5% activation ratio) per token, achieving strong English and multilingual performance across 29 languages while requiring significantly less compute than comparable dense models.
## Model Description
Marco-Nano is built on a decoder-only Transformer architecture with sparse MoE layers replacing standard FFN layers. It is upcycled from Qwen3-0.6B-Base using a fine-grained sub-matrix splitting strategy combined with Drop-Upcycling to promote expert diversification.
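The splitting step can be illustrated with a minimal numpy sketch: slice a dense FFN up-projection along its intermediate dimension into expert-sized sub-matrices, then re-initialize a random fraction of each expert's columns in the spirit of Drop-Upcycling. This is an illustration under assumptions, not the actual recipe: the weight values, the `REINIT_FRAC` ratio, and the restriction to a single up-projection (ignoring gating/down projections and the replication needed to reach 232 experts) are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FFN, D_EXPERT = 1024, 3072, 384   # dimensions from the configuration below
N_SPLITS = D_FFN // D_EXPERT                 # 8 sub-matrices per dense FFN
REINIT_FRAC = 0.5                            # hypothetical Drop-Upcycling ratio

# Dense FFN up-projection from the source checkpoint (stand-in values).
W_up_dense = rng.standard_normal((D_MODEL, D_FFN)) * 0.02

# Fine-grained splitting: each expert inherits a contiguous expert-sized
# block of the dense FFN's intermediate dimension.
experts = [W_up_dense[:, i * D_EXPERT:(i + 1) * D_EXPERT].copy()
           for i in range(N_SPLITS)]

# Drop-Upcycling (sketch): re-initialize a random subset of each expert's
# columns so otherwise-identical experts diverge during training.
for W in experts:
    drop = rng.random(D_EXPERT) < REINIT_FRAC
    W[:, drop] = rng.standard_normal((D_MODEL, drop.sum())) * 0.02

print(len(experts), experts[0].shape)  # 8 (1024, 384)
```

In the real model, these splits would be further replicated (and diversified by the re-initialization) to populate the full 232-expert pool.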
| Configuration | Value |
|---|---|
| Total Parameters | 8B |
| Activated Parameters | 0.6B |
| Activation Ratio | 7.5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 384 |
| Total Experts | 232 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.40 \times 10^{23}$ |
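To make the sparsity concrete, here is a minimal numpy sketch of per-token top-k expert routing with the dimensions from the table above (232 experts, 8 activated, model dimension 1024, expert dimension 384). All weights are random stand-ins, the plain ReLU two-layer expert FFN is an assumption for brevity, and this is not the model's actual routing implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 1024      # model dimension
N_EXPERTS = 232     # total experts
TOP_K = 8           # activated experts per token
D_EXPERT = 384      # expert intermediate dimension

# Router: a linear map from the hidden state to one logit per expert.
W_router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

# Stand-in expert weights (a simple two-layer FFN per expert).
W_up = rng.standard_normal((N_EXPERTS, D_MODEL, D_EXPERT)) * 0.02
W_down = rng.standard_normal((N_EXPERTS, D_EXPERT, D_MODEL)) * 0.02

def moe_layer(x):
    """Route one token's hidden state x of shape (D_MODEL,) through top-k experts."""
    logits = x @ W_router                       # (N_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]           # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the selected experts only
    out = np.zeros(D_MODEL)
    for g, e in zip(gates, top):
        h = np.maximum(x @ W_up[e], 0.0)        # expert FFN with ReLU
        out += g * (h @ W_down[e])              # gate-weighted combination
    return out

y = moe_layer(rng.standard_normal(D_MODEL))
print(y.shape)  # (1024,)
```

Since only 8 of the 232 expert FFNs run per token, the compute per token tracks the 0.6B activated parameters rather than the 8B total.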
## Training Details
Marco-Nano was pre-trained on 5.1 trillion tokens using a four-stage curriculum:
- Stage 1 (0 - 2.4T tokens): Foundational Training — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
- Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
- Stage 3 (4.1T - 4.6T tokens): Language Expansion — Added 9 new languages (Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani) and upsampled medium-resource languages.
- Stage 4 (4.6T - 5.1T tokens): Synthetic Data Integration — Curated multilingual synthetic data including cultural content (Fineweb2-Culture) and synthetic regional MCQs.
## Supported Languages
English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
## Evaluation
We compare Marco-Nano against size-matched baselines: Qwen3-1.7B (1.7B activated), Trinity Nano (1.09B activated), and Granite4-Tiny (1.47B activated). Marco-Nano uses only 0.6B activated parameters — the smallest among all baselines.
### English
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| MMLU (Acc) | 5-shot | 65.1 | 64.7 | 69.1 | 64.7 |
| MMLU-Redux (Acc) | 0-shot | 61.2 | 60.1 | 65.8 | 62.9 |
| MMLU-Pro (Acc) | 5-shot | 33.2 | 32.0 | 32.1 | 35.9 |
| AGIEval (Acc) | 0-shot | 35.9 | 31.4 | 36.1 | 38.4 |
| BBH (EM) | 3-shot | 54.5 | 49.3 | 59.9 | 53.5 |
| ARC-Easy (Acc) | 0-shot | 69.3 | 77.9 | 78.5 | 75.3 |
| ARC-Challenge (Acc) | 0-shot | 42.8 | 53.5 | 52.3 | 49.4 |
| HellaSwag (Acc) | 0-shot | 66.6 | 77.4 | 77.9 | 69.2 |
| WinoGrande (Acc) | 0-shot | 57.1 | 57.1 | 58.6 | 53.4 |
| BoolQ (Acc) | 0-shot | 74.6 | 71.5 | 63.5 | 71.2 |
| CommonsenseQA (Acc) | 0-shot | 49.5 | 54.1 | 55.9 | 55.7 |
| OpenBookQA (Acc) | 0-shot | 36.4 | 42.0 | 43.6 | 39.4 |
| PIQA (Acc) | 0-shot | 75.5 | 69.6 | 80.6 | 76.5 |
| SIQA (Acc) | 0-shot | 47.8 | 52.7 | 53.0 | 46.0 |
| GSM8K (EM) | 5-shot | 69.1 | 57.8 | 70.7 | 69.7 |
| Average | - | 55.9 | 56.7 | 59.8 | 57.5 |
### Multilingual — General
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| GlobalMMLU (Acc) | 5-shot | 49.6 | 43.6 | 54.8 | 52.2 |
| MMMLU (Acc) | 0-shot | 48.6 | 41.2 | 52.3 | 52.6 |
| MMLU-ProX-Lite (Acc) | 5-shot | 27.2 | 20.3 | 30.1 | 28.9 |
| BELEBELE (Acc) | 0-shot | 67.5 | 54.5 | 61.2 | 73.8 |
| mHellaSwag (Acc_norm) | 0-shot | 43.9 | 42.5 | 53.2 | 48.8 |
| mARC-Challenge (Acc_norm) | 0-shot | 34.7 | 30.9 | 39.9 | 36.9 |
| FLORES-200 En→Xx (BLEU) | 5-shot | 18.6 | 15.1 | 25.4 | 24.7 |
| FLORES-200 Xx→En (BLEU) | 5-shot | 31.5 | 31.1 | 36.7 | 33.6 |
| WMT24++ En→Xx (BLEU) | 5-shot | 18.3 | 15.0 | 21.9 | 20.7 |
| WMT24++ Xx→En (BLEU) | 5-shot | 28.3 | 28.0 | 30.7 | 28.1 |
| MGSM (EM) | 8-shot | 58.8 | 40.6 | 56.7 | 65.3 |
| Average | - | 38.8 | 33.0 | 42.1 | 42.3 |
### Multilingual — Cultural & Regional
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| INCLUDE (Acc) | 5-shot | 51.2 | 43.9 | 52.1 | 53.2 |
| Global-PIQA (Acc_norm) | 0-shot | 60.3 | 52.3 | 64.0 | 64.3 |
| CMMLU (Acc) | 5-shot | 66.1 | 49.6 | 53.5 | 55.5 |
| C-Eval (Acc) | 5-shot | 65.1 | 47.6 | 50.9 | 56.0 |
| ArabicMMLU (Acc) | 3-shot | 57.6 | 44.0 | 60.5 | 55.8 |
| TurkishMMLU (Acc) | 5-shot | 47.9 | 29.6 | 41.8 | 48.9 |
| GreekMMLU (Acc) | 5-shot | 58.1 | 52.2 | 62.3 | 64.1 |
| KazakhMMLU (Acc) | 5-shot | 52.1 | 43.1 | 52.6 | 53.1 |
| IndoMMLU (Acc) | 0-shot | 51.0 | 41.5 | 49.0 | 51.0 |
| IndoCareer (Acc) | 3-shot | 53.9 | 46.7 | 53.0 | 52.1 |
| IndoCulture (Acc) | 0-shot | 51.6 | 49.8 | 51.3 | 57.4 |
| Average | - | 55.9 | 45.5 | 53.7 | 55.6 |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Nano-Base"

# Load the tokenizer and model; device_map="auto" places weights on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# This is a base (non-instruct) model, so prompt it for plain text completion.
input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation

```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```
## License
This model is released under the Apache 2.0 License.