Instructions for using ATH-MaaS/Marco-Mini-Base with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ATH-MaaS/Marco-Mini-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ATH-MaaS/Marco-Mini-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ATH-MaaS/Marco-Mini-Base")
model = AutoModelForCausalLM.from_pretrained("ATH-MaaS/Marco-Mini-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use ATH-MaaS/Marco-Mini-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ATH-MaaS/Marco-Mini-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ATH-MaaS/Marco-Mini-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ATH-MaaS/Marco-Mini-Base

SGLang

How to use ATH-MaaS/Marco-Mini-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ATH-MaaS/Marco-Mini-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ATH-MaaS/Marco-Mini-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ATH-MaaS/Marco-Mini-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ATH-MaaS/Marco-Mini-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ATH-MaaS/Marco-Mini-Base with Docker Model Runner:
```
docker model run hf.co/ATH-MaaS/Marco-Mini-Base
```

	---
	license: apache-2.0
	language:
	- en
	- zh
	- ar
	- de
	- es
	- fr
	- ko
	- ja
	- pt
	- tr
	- id
	- it
	- nl
	- pl
	- ru
	- vi
	- th
	- he
	- uk
	- ms
	- bn
	- cs
	- ur
	- kk
	- el
	- ro
	- hu
	- ne
	- az
	library_name: transformers
	tags:
	- moe
	- mixture-of-experts
	- multilingual
	- upcycling
	datasets:
	- nvidia/Nemotron-CC-v2
	- nvidia/Nemotron-Pretraining-SFT-v1
	- nvidia/Nemotron-Pretraining-Specialized-v1
	- nvidia/Nemotron-CC-v2.1
	- allenai/dolmino-mix-1124
	- nvidia/Nemotron-CC-Math-v1
	- nvidia/OpenMathInstruct-2
	- HuggingFaceTB/finemath
	- LLM360/MegaMath
	- open-thoughts/OpenThoughts3-1.2M
	- opencsg/Fineweb-Edu-Chinese-V2.1
	- HuggingFaceFW/fineweb-2
	- allenai/dolma3_dolmino_mix-100B-1125
	---

	# Marco-Mini-Base

	Marco-Mini-Base is a compact, highly sparse Mixture-of-Experts (MoE) multilingual language model from the [Marco-MoE](https://github.com/AIDC-AI/Marco-LLM) family, developed by Alibaba International Digital Commerce. It activates only 0.86B out of 17.3B total parameters (5% activation ratio) per token, matching or surpassing dense models with up to 4B parameters on English and multilingual benchmarks across 29 languages — while using 5.5x fewer training FLOPs than Qwen3-4B.

	## Model Description

	Marco-Mini is built on a decoder-only Transformer architecture with sparse MoE layers replacing standard FFN layers. It is upcycled from [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) using a fine-grained sub-matrix splitting strategy combined with Drop-Upcycling to promote expert diversification.
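To make the idea of a sparse MoE layer concrete, the following is a minimal sketch of top-k expert routing for a single token, using the expert counts listed in the configuration (256 experts, 8 active). The routing scheme shown (softmax over all experts, keep the top 8, renormalise) is a common generic choice and an assumption here; this card does not state how Marco-Mini computes or normalises its routing weights. The function name `route_token` is hypothetical.

```python
import math
import random

NUM_EXPERTS = 256  # "Total Experts" in the model card
TOP_K = 8          # "Activated Experts" in the model card


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Generic softmax-then-top-k routing with renormalised weights. This is an
    illustrative assumption, not necessarily what Marco-Mini implements.
    """
    # Numerically stable softmax over all expert logits.
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the indices of the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]

    # Renormalise so the selected weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selection = route_token(logits)
print(len(selection), "experts selected out of", NUM_EXPERTS)
print("weights sum to", round(sum(w for _, w in selection), 6))
```

Only the selected experts' feed-forward blocks would run for that token, which is why the activated parameter count stays far below the total.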

	| Configuration | Value |
	|:---|:---:|
	| Total Parameters | 17.3B |
	| Activated Parameters | 0.86B |
	| Activation Ratio | 5% |
	| Num Layers | 28 |
	| Model Dimension | 1024 |
	| FFN Intermediate Dimension | 3072 |
	| Q-Heads | 16 |
	| KV-Heads | 8 |
	| Head Dimension | 128 |
	| Expert Dimension | 768 |
	| Total Experts | 256 |
	| Activated Experts | 8 |
	| Tie Embeddings | True |
	| Training FLOPs | $1.56 \times 10^{23}$ |
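As a quick sanity check on the configuration, the short script below recomputes a few ratios from the table values. It is only arithmetic on the published numbers; the variable names are illustrative.

```python
# Values copied from the configuration table.
total_params_b = 17.3    # total parameters, in billions
active_params_b = 0.86   # activated parameters per token, in billions
total_experts = 256
active_experts = 8
q_heads, kv_heads, head_dim = 16, 8, 128

param_ratio = active_params_b / total_params_b
expert_ratio = active_experts / total_experts

print(f"activated parameter ratio: {param_ratio:.1%}")
print(f"activated expert ratio:    {expert_ratio:.3%}")
print("query projection width:    ", q_heads * head_dim)
print("key/value projection width:", kv_heads * head_dim)
print("query heads per KV head:   ", q_heads // kv_heads)
```

The parameter ratio (about 5%) is higher than the expert ratio (8 of 256), presumably because non-expert weights such as attention and embeddings are used for every token.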

	## Training Details

	Marco-Mini was pre-trained on 5.1 trillion tokens using a four-stage curriculum:

	1. Stage 1 (0 - 2.4T tokens): Foundational Training — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
	2. Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
	3. Stage 3 (4.1T - 4.6T tokens): Language Expansion — Added 9 new languages (Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani) and upsampled medium-resource languages.
	4. Stage 4 (4.6T - 5.1T tokens): Synthetic Data Integration — Curated multilingual synthetic data including cultural content (Fineweb2-Culture) and synthetic regional MCQs.
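The stage boundaries above can be written down as a small lookup table, which makes the per-stage token budgets explicit. This is a sketch for illustration only: the `stage_for` helper is hypothetical, and treating each stage as a half-open interval (start inclusive, end exclusive) is an assumption.

```python
# Stage boundaries in billions of tokens, taken from the curriculum list.
STAGES = [
    ("Foundational Training", 0, 2400),
    ("Optimization & Upsampling", 2400, 4100),
    ("Language Expansion", 4100, 4600),
    ("Synthetic Data Integration", 4600, 5100),
]


def stage_for(tokens_seen_b):
    """Return (1-based stage index, stage name) for a token count in billions."""
    for index, (name, start, end) in enumerate(STAGES, start=1):
        if start <= tokens_seen_b < end:
            return index, name
    raise ValueError("token count outside the 5.1T-token schedule")


# Token budget of each stage and its share of the full run.
budgets = {name: end - start for name, start, end in STAGES}
total = sum(budgets.values())
for name, size in budgets.items():
    print(f"{name}: {size / 1000:.1f}T tokens ({size / total:.0%} of training)")

print(stage_for(3000))
```

Stage 1 accounts for 2.4T of the 5.1T tokens, while the last two stages cover 0.5T tokens each.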

	## Supported Languages

	English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani

	## Evaluation

	We compare Marco-Mini against strong baselines: Qwen3-4B (4B activated), Trinity Mini (3.85B activated), Gemma3-4B (4B activated), SmolLM3-3B (3B activated), Llama3.2-3B (3B activated), and Tiny-Aya-3.35B (3.35B activated). Marco-Mini uses only 0.86B activated parameters — far fewer than all baselines.

	### English

	| Benchmark | # Shots | Llama3.2-3B | SmolLM3-3B | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Trinity Mini | Marco-Mini |
	|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| MMLU _(Acc)_ | 5-shot | 57.6 | 62.6 | 61.1 | 58.6 | 75.2 | 71.4 | 72.8 |
	| MMLU-Redux _(Acc)_ | 0-shot | 56.9 | 58.4 | 57.7 | 51.7 | 71.3 | 68.2 | 68.8 |
	| MMLU-Pro _(Acc)_ | 5-shot | 26.0 | 35.1 | 28.8 | 26.9 | 45.9 | 41.3 | 45.3 |
	| AGIEval _(Acc)_ | 0-shot | 31.2 | 34.5 | 32.6 | 29.0 | 44.0 | 39.7 | 41.9 |
	| BBH _(EM)_ | 3-shot | 47.1 | 60.0 | 52.2 | 46.8 | 72.3 | 57.6 | 65.1 |
	| ARC-Easy _(Acc)_ | 0-shot | 71.8 | 78.5 | 82.6 | 76.5 | 75.0 | 80.6 | 82.4 |
	| ARC-Challenge _(Acc)_ | 0-shot | 46.0 | 52.6 | 54.1 | 47.4 | 49.9 | 57.8 | 56.3 |
	| HellaSwag _(Acc)_ | 0-shot | 75.6 | 76.1 | 76.7 | 71.0 | 74.4 | 82.8 | 77.4 |
	| WinoGrande _(Acc)_ | 0-shot | 58.6 | 58.9 | 61.4 | 56.6 | 59.6 | 60.8 | 57.7 |
	| BoolQ _(Acc)_ | 0-shot | 75.2 | 79.3 | 76.6 | 74.6 | 74.2 | 72.5 | 74.2 |
	| CommonsenseQA _(Acc)_ | 0-shot | 60.4 | 55.4 | 61.1 | 60.4 | 52.9 | 57.7 | 61.5 |
	| OpenBookQA _(Acc)_ | 0-shot | 42.2 | 40.4 | 42.6 | 40.4 | 42.6 | 44.8 | 44.6 |
	| PIQA _(Acc)_ | 0-shot | 78.2 | 79.1 | 80.3 | 76.9 | 77.4 | 71.7 | 81.1 |
	| SIQA _(Acc)_ | 0-shot | 51.0 | 49.8 | 50.4 | 49.9 | 53.0 | 52.5 | 49.4 |
	| GSM8K _(EM)_ | 5-shot | 27.3 | 67.4 | 39.3 | 58.0 | 81.7 | 57.5 | 76.4 |
	| Average | - | 53.7 | 59.2 | 57.2 | 55.5 | 63.3 | 61.1 | 63.7 |
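The Average row for English appears to be the unweighted mean of the 15 benchmark rows; that reading is an assumption, since the card does not say how the averages are computed. The snippet below checks it for the Qwen3-4B and Marco-Mini columns and counts how many individual benchmarks Marco-Mini wins outright.

```python
# Per-benchmark scores copied from the English table, in row order (MMLU ... GSM8K).
scores = {
    "Qwen3-4B": [75.2, 71.3, 45.9, 44.0, 72.3, 75.0, 49.9, 74.4,
                 59.6, 74.2, 52.9, 42.6, 77.4, 53.0, 81.7],
    "Marco-Mini": [72.8, 68.8, 45.3, 41.9, 65.1, 82.4, 56.3, 77.4,
                   57.7, 74.2, 61.5, 44.6, 81.1, 49.4, 76.4],
}

# Unweighted mean per model, rounded to one decimal like the table.
averages = {model: round(sum(vals) / len(vals), 1) for model, vals in scores.items()}

# Number of benchmarks where Marco-Mini scores strictly higher than Qwen3-4B.
wins = sum(m > q for m, q in zip(scores["Marco-Mini"], scores["Qwen3-4B"]))

print(averages)
print("Marco-Mini strictly ahead on", wins, "of", len(scores["Marco-Mini"]), "benchmarks")
```

Under this reading the two averages match the table, even though Marco-Mini leads on fewer than half of the individual English benchmarks; its gains are concentrated in the commonsense and ARC tasks.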

	### Multilingual — General

	| Benchmark | # Shots | Llama3.2-3B | SmolLM3-3B | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Trinity Mini | Marco-Mini |
	|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| GlobalMMLU _(Acc)_ | 5-shot | 43.2 | 46.7 | 50.8 | 50.0 | 61.6 | 52.6 | 64.2 |
	| MMMLU _(Acc)_ | 0-shot | 44.0 | 47.3 | 47.4 | 44.5 | 59.3 | 50.9 | 62.0 |
	| MMLU-ProX-Lite _(Acc)_ | 5-shot | 22.4 | 28.3 | 24.3 | 24.3 | 38.5 | 32.2 | 39.2 |
	| BELEBELE _(Acc)_ | 0-shot | 60.1 | 54.3 | 65.7 | 65.4 | 81.5 | 67.6 | 79.8 |
	| mHellaSwag _(Acc_norm)_ | 0-shot | 49.0 | 49.6 | 55.2 | 53.5 | 53.2 | 51.5 | 58.6 |
	| mARC-Challenge _(Acc_norm)_ | 0-shot | 34.2 | 36.1 | 41.5 | 37.2 | 42.5 | 37.5 | 45.4 |
	| FLORES-200 En→Xx _(BLEU)_ | 5-shot | 23.5 | 19.7 | 32.1 | 30.2 | 25.4 | 13.7 | 32.3 |
	| FLORES-200 Xx→En _(BLEU)_ | 5-shot | 34.6 | 30.3 | 39.7 | 37.3 | 36.8 | 24.1 | 40.1 |
	| WMT24++ En→Xx _(BLEU)_ | 5-shot | 16.4 | 17.8 | 27.7 | 26.1 | 23.9 | 7.5 | 28.1 |
	| WMT24++ Xx→En _(BLEU)_ | 5-shot | 28.9 | 27.4 | 34.0 | 32.7 | 32.9 | 10.6 | 34.4 |
	| MGSM _(EM)_ | 8-shot | 22.4 | 50.8 | 36.6 | 38.4 | 76.0 | 57.2 | 75.6 |
	| Average | - | 34.4 | 37.1 | 41.4 | 39.9 | 48.3 | 36.9 | 50.9 |

	### Multilingual — Cultural & Regional

	| Benchmark | # Shots | Llama3.2-3B | SmolLM3-3B | Gemma3-4B | Tiny-Aya-3.35B | Qwen3-4B | Trinity Mini | Marco-Mini |
	|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| INCLUDE _(Acc)_ | 5-shot | 45.5 | 46.2 | 52.6 | 53.9 | 61.4 | 51.9 | 61.7 |
	| Global-PIQA _(Acc_norm)_ | 0-shot | 62.2 | 60.9 | 69.4 | 67.9 | 65.4 | 57.2 | 72.3 |
	| CMMLU _(Acc)_ | 5-shot | 44.1 | 50.1 | 50.2 | 58.8 | 76.2 | 58.6 | 68.0 |
	| C-Eval _(Acc)_ | 5-shot | 43.1 | 47.9 | 48.5 | 57.6 | 76.6 | 57.1 | 66.0 |
	| ArabicMMLU _(Acc)_ | 3-shot | 48.9 | 60.6 | 61.6 | 63.2 | 67.0 | 57.1 | 67.1 |
	| TurkishMMLU _(Acc)_ | 5-shot | 36.7 | 28.4 | 43.7 | 45.2 | 60.6 | 43.0 | 62.7 |
	| GreekMMLU _(Acc)_ | 5-shot | 56.4 | 64.0 | 63.4 | 66.3 | 69.4 | 59.7 | 70.3 |
	| KazakhMMLU _(Acc)_ | 5-shot | 44.7 | 47.4 | 52.1 | 47.1 | 62.3 | 49.6 | 62.6 |
	| IndoMMLU _(Acc)_ | 0-shot | 47.0 | 43.7 | 48.5 | 52.0 | 60.1 | 51.0 | 59.9 |
	| IndoCareer _(Acc)_ | 3-shot | 48.6 | 47.7 | 53.4 | 56.6 | 61.5 | 55.2 | 61.5 |
	| IndoCulture _(Acc)_ | 0-shot | 50.1 | 44.5 | 59.1 | 58.5 | 61.1 | 57.6 | 62.3 |
	| Average | - | 47.9 | 49.2 | 54.8 | 57.0 | 65.6 | 54.4 | 65.0 |

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "AIDC-AI/Marco-Mini-Base"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

	input_text = "The capital of France is"
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=50)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
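Before loading the model, it can help to estimate how much memory the weights alone would need. The sketch below assumes all 17.3B parameters are kept resident (the usual case unless experts are offloaded) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 17.3e9  # total parameters from the model card

# Bytes per parameter for a few common storage precisions.
for dtype, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):.1f} GB")
```

Although only 0.86B parameters are activated per token, the memory footprint is governed by the total parameter count, so sparsity reduces compute per token rather than weight storage.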

	## Citation

	```bibtex
	@article{marco-moe,
	title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
	author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
	year={2026}
	}
	```

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).