# Marco-Mini-Global-Base
Marco-Mini-Global-Base is an extended variant of Marco-Mini-Base that scales linguistic coverage from 29 to 64 languages. It is a highly sparse Mixture-of-Experts (MoE) multilingual language model from the Marco-MoE family, developed by Alibaba International Digital Commerce. It activates only 0.86B out of 17.3B total parameters (5% activation ratio) per token while supporting 64 languages — demonstrating that the MoE architecture enables scalable language expansion without the interference typical of dense models.
## Model Description
Marco-Mini-Global-Base shares the same architecture as Marco-Mini-Base: a decoder-only Transformer in which sparse MoE layers replace the standard FFN layers, upcycled from Qwen3-0.6B-Base using fine-grained sub-matrix splitting combined with Drop-Upcycling.
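The upcycling step can be pictured with a short sketch: slice the dense FFN's intermediate dimension into 768-wide sub-matrices to seed each expert, then re-initialize part of each expert in the spirit of Drop-Upcycling. This is an illustrative approximation only; the function name, re-initialization ratio, and the omission of the gate projection are assumptions, not the actual Marco-MoE recipe.

```python
import torch

def upcycle_ffn_to_experts(w_up, w_down, num_experts=256, expert_dim=768, reinit_ratio=0.5):
    """Illustrative upcycling sketch (not the official Marco-MoE recipe).

    w_up:   dense FFN up-projection,   shape (ffn_dim, d_model), e.g. (3072, 1024)
    w_down: dense FFN down-projection, shape (d_model, ffn_dim), e.g. (1024, 3072)
    """
    ffn_dim, d_model = w_up.shape
    num_slices = ffn_dim // expert_dim          # 3072 // 768 = 4 sub-matrices
    experts = []
    for e in range(num_experts):
        s = e % num_slices                      # fine-grained sub-matrix splitting
        up = w_up[s * expert_dim:(s + 1) * expert_dim].clone()
        down = w_down[:, s * expert_dim:(s + 1) * expert_dim].clone()
        # Drop-Upcycling-style diversification: re-initialize a random subset of
        # intermediate dimensions so the experts do not remain identical copies.
        drop = torch.rand(expert_dim) < reinit_ratio
        up[drop] = torch.randn(int(drop.sum()), d_model) * 0.02
        down[:, drop] = torch.randn(d_model, int(drop.sum())) * 0.02
        experts.append({"up": up, "down": down})
    return experts

# Example with FFN shapes matching the configuration table (illustrative values only).
experts = upcycle_ffn_to_experts(torch.randn(3072, 1024), torch.randn(1024, 3072), num_experts=8)
```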
| Configuration | Value |
|---|---|
| Total Parameters | 17.3B |
| Activated Parameters | 0.86B |
| Activation Ratio | 5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 768 |
| Total Experts | 256 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.584 \times 10^{23}$ |
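To make the configuration above concrete, the following is a minimal sketch of a top-8-of-256 sparse MoE FFN layer with the listed dimensions (model dim 1024, expert dim 768). It is an illustrative approximation, not the model's actual implementation; the class name, router details, and SwiGLU expert structure are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    """Illustrative top-k MoE FFN using the dimensions from the table above."""

    def __init__(self, d_model=1024, d_expert=768, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small SwiGLU-style FFN (gate / up / down projections).
        self.gate = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * 0.02)
        self.up = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * 0.02)
        self.down = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * 0.02)

    def forward(self, x):                                   # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)              # (tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)       # route each token to k experts
        weights = weights / weights.sum(-1, keepdim=True)   # renormalize over the top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            e = idx[:, k]                                   # chosen expert id per token
            g = torch.einsum("td,tde->te", x, self.gate[e])
            u = torch.einsum("td,tde->te", x, self.up[e])
            h = F.silu(g) * u                               # SwiGLU activation
            out += weights[:, k:k + 1] * torch.einsum("te,ted->td", h, self.down[e])
        return out

# Only 8 of 256 experts run for each token, which is how roughly 0.86B of the
# 17.3B parameters are activated. A smaller expert count keeps this demo lightweight.
moe = SparseMoEFFN(n_experts=16)
y = moe(torch.randn(4, 1024))
print(y.shape)  # torch.Size([4, 1024])
```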
## Training Details
Marco-Mini-Global-Base branches from the Stage-2 checkpoint of Marco-Mini-Base and recalibrates the data mixtures in Stages 3 and 4 to integrate pre-training corpora for 35 newly introduced languages. In total, it was trained on 5.5T tokens.
The four-stage curriculum follows the same structure as Marco-Mini-Base (a schematic sketch of the token schedule follows the list):
- Stage 1 (0 - 2.4T tokens): Foundational Training — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
- Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
- Stage 3 (4.1T - 5T tokens): Language Expansion — Recalibrated data mixtures to integrate 35 new languages alongside the original 29.
- Stage 4 (5T - 5.5T tokens): Synthetic Data Integration — Curated multilingual synthetic data including cultural content and synthetic regional MCQs for all 64 languages.
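Purely as a schematic, the stage boundaries above can be written down as a small schedule; the structure and field names below are assumptions for illustration, not the actual training configuration.

```python
# Hypothetical schematic of the four-stage curriculum (token budgets in trillions);
# the field names are illustrative and not taken from the training code.
CURRICULUM = [
    {"stage": 1, "tokens": (0.0, 2.4), "focus": "foundational English, reasoning, multilingual web/QA"},
    {"stage": 2, "tokens": (2.4, 4.1), "focus": "upsampled reasoning, downsampled English web, LR decay"},
    {"stage": 3, "tokens": (4.1, 5.0), "focus": "language expansion from 29 to 64 languages"},
    {"stage": 4, "tokens": (5.0, 5.5), "focus": "multilingual synthetic data and regional MCQs"},
]

def stage_for(tokens_seen_trillions):
    """Return which stage a given point in training (tokens seen, in trillions) falls into."""
    for s in CURRICULUM:
        lo, hi = s["tokens"]
        if lo <= tokens_seen_trillions < hi:
            return s["stage"]
    return CURRICULUM[-1]["stage"]

assert stage_for(3.0) == 2  # 3.0T tokens falls in Stage 2 (2.4T - 4.1T)
```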
## Supported Languages
Original 29 languages: English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
35 newly introduced languages: Danish, Swedish, Norwegian, Catalan, Galician, Welsh, Irish, Basque, Croatian, Latvian, Lithuanian, Slovak, Slovenian, Estonian, Finnish, Serbian, Bulgarian, Persian, Maltese, Hindi, Marathi, Gujarati, Punjabi, Tamil, Telugu, Tagalog, Javanese, Khmer, Lao, Burmese, Amharic, Swahili, Yoruba, Igbo, Zulu
## Evaluation
We compare Marco-Mini-Global-Base against strong multilingual baselines: Gemma3-4B (4B activated), Tiny-Aya-3.35B (3.35B activated), and Qwen3-4B (4B activated). All benchmarks are evaluated across the full 64-language set. With only 0.86B activated parameters, Marco-Mini-Global-Base preserves robust English proficiency (63.6 average vs. 63.7 for the 29-language Marco-Mini-Base) and widens its average multilingual advantage over Qwen3-4B from +2.6 to +3.6 points.
### English
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Marco-Mini-Global |
|---|---|---|---|---|---|
| MMLU (Acc) | 5-shot | 61.1 | 58.6 | 75.2 | 72.9 |
| MMLU-Redux (Acc) | 0-shot | 57.7 | 51.7 | 71.3 | 68.9 |
| MMLU-Pro (Acc) | 5-shot | 28.8 | 26.9 | 45.9 | 44.5 |
| AGIEval (Acc) | 0-shot | 32.6 | 29.0 | 44.0 | 41.0 |
| BBH (EM) | 3-shot | 52.2 | 46.8 | 72.3 | 65.0 |
| ARC-Easy (Acc) | 0-shot | 82.6 | 76.5 | 75.0 | 82.4 |
| ARC-Challenge (Acc) | 0-shot | 54.1 | 47.4 | 49.9 | 57.0 |
| HellaSwag (Acc) | 0-shot | 76.7 | 71.0 | 74.4 | 77.2 |
| WinoGrande (Acc) | 0-shot | 61.4 | 56.6 | 59.6 | 58.3 |
| BoolQ (Acc) | 0-shot | 76.6 | 74.6 | 74.2 | 75.6 |
| CommonsenseQA (Acc) | 0-shot | 61.1 | 60.4 | 52.9 | 61.2 |
| OpenBookQA (Acc) | 0-shot | 42.6 | 40.4 | 42.6 | 45.0 |
| PIQA (Acc) | 0-shot | 80.3 | 76.9 | 77.4 | 80.7 |
| SIQA (Acc) | 0-shot | 50.4 | 49.9 | 53.0 | 48.4 |
| GSM8K (EM) | 5-shot | 39.3 | 58.0 | 81.7 | 76.4 |
| Average | - | 57.2 | 55.5 | 63.3 | 63.6 |
### Multilingual — General
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Marco-Mini-Global |
|---|---|---|---|---|---|
| GlobalMMLU (Acc) | 5-shot | 49.1 | 48.4 | 57.8 | 60.9 |
| MMMLU (Acc) | 0-shot | 45.0 | 42.8 | 54.8 | 58.2 |
| MMLU-ProX-Lite (Acc) | 5-shot | 23.3 | 23.5 | 35.6 | 36.2 |
| BELEBELE (Acc) | 0-shot | 62.3 | 62.5 | 74.0 | 76.0 |
| mHellaSwag (Acc_norm) | 0-shot | 51.9 | 50.3 | 48.5 | 54.4 |
| mARC-Challenge (Acc_norm) | 0-shot | 39.3 | 35.7 | 39.3 | 41.2 |
| FLORES-200 En→Xx (BLEU) | 5-shot | 27.9 | 25.6 | 25.8 | 29.5 |
| FLORES-200 Xx→En (BLEU) | 5-shot | 39.2 | 37.2 | 33.4 | 40.2 |
| WMT24++ En→Xx (BLEU) | 5-shot | 26.0 | 24.4 | 19.6 | 26.0 |
| WMT24++ Xx→En (BLEU) | 5-shot | 34.4 | 32.9 | 31.2 | 34.5 |
| MGSM (EM) | 8-shot | 35.7 | 36.6 | 69.1 | 71.7 |
| Average | - | 39.5 | 37.3 | 44.5 | 48.1 |
### Multilingual — Cultural & Regional
| Benchmark | # Shots | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Marco-Mini-Global |
|---|---|---|---|---|---|
| INCLUDE (Acc) | 5-shot | 52.3 | 53.5 | 60.0 | 61.1 |
| Global-PIQA (Acc_norm) | 0-shot | 67.8 | 66.7 | 61.8 | 70.2 |
| CMMLU (Acc) | 5-shot | 50.2 | 58.8 | 76.2 | 67.9 |
| C-Eval (Acc) | 5-shot | 48.5 | 57.6 | 76.6 | 66.2 |
| ArabicMMLU (Acc) | 3-shot | 61.6 | 63.2 | 67.0 | 66.6 |
| TurkishMMLU (Acc) | 5-shot | 43.7 | 45.2 | 60.6 | 63.1 |
| GreekMMLU (Acc) | 5-shot | 63.4 | 66.3 | 69.4 | 70.4 |
| KazakhMMLU (Acc) | 5-shot | 52.1 | 47.1 | 62.3 | 61.8 |
| IndoMMLU (Acc) | 0-shot | 48.5 | 52.0 | 60.1 | 59.5 |
| IndoCareer (Acc) | 3-shot | 53.4 | 56.6 | 61.5 | 61.8 |
| IndoCulture (Acc) | 0-shot | 59.1 | 58.5 | 61.1 | 62.5 |
| Average | - | 54.6 | 56.9 | 65.1 | 64.7 |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AIDC-AI/Marco-Mini-Global-Base"

# Load the tokenizer and the model (device_map="auto" places weights on available devices).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Base-model completion: provide a prefix and let the model continue it.
input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation
```bibtex
@article{marco-moe,
  title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
  author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
  year={2026}
}
```
## License
This model is released under the Apache 2.0 License.