Marco-Nano-Base
Marco-Nano-Base is a compact, highly sparse Mixture-of-Experts (MoE) multilingual language model from the Marco-MoE family, developed by Alibaba International Digital Commerce. It activates only 0.6B of its 8B total parameters per token (a 7.5% activation ratio), achieving strong English and multilingual performance across 29 languages while requiring significantly less compute than comparable dense models.
Model Description
Marco-Nano is built on a decoder-only Transformer architecture with sparse MoE layers replacing standard FFN layers. It is upcycled from Qwen3-0.6B-Base using a fine-grained sub-matrix splitting strategy combined with Drop-Upcycling to promote expert diversification.
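The configuration table lists an FFN intermediate dimension of 3072 and an expert dimension of 384, which is consistent with cutting a dense FFN into 8 slices. The sketch below illustrates one possible reading of "sub-matrix splitting" at toy sizes: a SwiGLU FFN is sliced along its intermediate dimension, and the slice outputs sum back to the dense output. The slicing scheme, the helper names (`swiglu_ffn`, `split_into_experts`), and the toy dimensions are assumptions for illustration; the card does not spell out the exact procedure, and the Drop-Upcycling re-initialization step and the expansion to 232 experts are not modeled here.

```python
import math
import random


def silu(x):
    return x / (1.0 + math.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN. w_gate and w_up are [d_ff][d_model]; w_down is [d_model][d_ff]."""
    hidden = [
        silu(sum(g * xi for g, xi in zip(w_gate[j], x)))
        * sum(u * xi for u, xi in zip(w_up[j], x))
        for j in range(len(w_gate))
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]


def split_into_experts(w_gate, w_up, w_down, num_slices):
    """Hypothetical helper: slice the FFN along its intermediate dimension."""
    d_ff = len(w_gate)
    assert d_ff % num_slices == 0, "intermediate dimension must divide evenly"
    width = d_ff // num_slices
    experts = []
    for s in range(num_slices):
        lo, hi = s * width, (s + 1) * width
        experts.append((w_gate[lo:hi], w_up[lo:hi], [row[lo:hi] for row in w_down]))
    return experts


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
# Toy sizes chosen so that D_FF / NUM_SLICES mirrors the card's 3072 / 384 = 8.
D_MODEL, D_FF, NUM_SLICES = 4, 24, 8
w_gate = random_matrix(D_FF, D_MODEL)
w_up = random_matrix(D_FF, D_MODEL)
w_down = random_matrix(D_MODEL, D_FF)
x = [random.gauss(0.0, 1.0) for _ in range(D_MODEL)]

dense_out = swiglu_ffn(x, w_gate, w_up, w_down)
experts = split_into_experts(w_gate, w_up, w_down, NUM_SLICES)
expert_outs = [swiglu_ffn(x, *expert) for expert in experts]
summed = [sum(vals) for vals in zip(*expert_outs)]

# The slices partition the hidden units, so their outputs add up to the dense output.
max_err = max(abs(a - b) for a, b in zip(dense_out, summed))
print(f"experts: {len(experts)}, max abs difference: {max_err:.2e}")
```

Because the slices partition the hidden units, activating all of them with unit weight reproduces the original FFN up to floating-point error. This is one reason such a split is a natural starting point for upcycling a dense checkpoint.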
| Configuration | Value |
|---|---|
| Total Parameters | 8B |
| Activated Parameters | 0.6B |
| Activation Ratio | 7.5% |
| Num Layers | 28 |
| Model Dimension | 1024 |
| FFN Intermediate Dimension | 3072 |
| Q-Heads | 16 |
| KV-Heads | 8 |
| Head Dimension | 128 |
| Expert Dimension | 384 |
| Total Experts | 232 |
| Activated Experts | 8 |
| Tie Embeddings | True |
| Training FLOPs | $1.40 \times 10^{23}$ |
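As a rough sanity check, the totals in the table can be approximated from the per-layer dimensions. This is a back-of-the-envelope sketch, not the authors' accounting. It assumes three weight matrices per expert (gate, up, down, as in a SwiGLU FFN), one linear router per layer, no shared experts, and a vocabulary of 151,936 tokens inherited from Qwen3; none of these is stated in the table. Normalization weights are ignored.

```python
# Values taken from the configuration table.
D_MODEL = 1024
NUM_LAYERS = 28
Q_HEADS, KV_HEADS, HEAD_DIM = 16, 8, 128
EXPERT_DIM = 384
TOTAL_EXPERTS, ACTIVE_EXPERTS = 232, 8

# ASSUMPTION: Qwen3 vocabulary size; not given in the card.
VOCAB_SIZE = 151_936


def count_params(experts_per_layer):
    """Approximate parameter count when `experts_per_layer` experts are counted."""
    # Q and O projections use all query heads; K and V use the (fewer) KV heads.
    attention = 2 * D_MODEL * Q_HEADS * HEAD_DIM + 2 * D_MODEL * KV_HEADS * HEAD_DIM
    # ASSUMPTION: each expert is a SwiGLU FFN with gate, up, and down matrices.
    expert = 3 * D_MODEL * EXPERT_DIM
    # ASSUMPTION: one linear router per layer scoring every expert.
    router = D_MODEL * TOTAL_EXPERTS
    per_layer = attention + router + experts_per_layer * expert
    # Tied embeddings: the input and output matrices are counted once.
    embeddings = VOCAB_SIZE * D_MODEL
    return NUM_LAYERS * per_layer + embeddings


total = count_params(TOTAL_EXPERTS)
active = count_params(ACTIVE_EXPERTS)
print(f"total  ~ {total / 1e9:.2f}B")
print(f"active ~ {active / 1e9:.2f}B")
print(f"ratio  ~ {100 * active / total:.1f}%")
```

Under these assumptions the estimate comes to roughly 8.0B total and 0.6B activated parameters, in line with the table. The 8 active experts of width 384 together match the 3072-wide dense FFN.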
Training Details
Marco-Nano was pre-trained on 5.1 trillion tokens using a four-stage curriculum:
- Stage 1 (0 - 2.4T tokens): Foundational Training — High-quality English data (Nemotron-CC-v2), reasoning and instruction data, and multilingual web/QA data for 19 languages.
- Stage 2 (2.4T - 4.1T tokens): Optimization & Upsampling — Upsampled reasoning corpora, downsampled English web data, and upsampled Chinese data with learning rate decay.
- Stage 3 (4.1T - 4.6T tokens): Language Expansion — Added 9 new languages (Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani) and upsampled medium-resource languages.
- Stage 4 (4.6T - 5.1T tokens): Synthetic Data Integration — Curated multilingual synthetic data including cultural content (Fineweb2-Culture) and synthetic regional MCQs.
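The stage boundaries above imply the following per-stage token budgets. The short sketch below only restates the arithmetic from the list; the variable names are illustrative.

```python
# Cumulative token counts (in trillions) at each stage boundary, from the list above.
STAGE_BOUNDARIES_T = [0.0, 2.4, 4.1, 4.6, 5.1]
STAGE_NAMES = [
    "Foundational Training",
    "Optimization & Upsampling",
    "Language Expansion",
    "Synthetic Data Integration",
]

# Tokens consumed in each stage; rounding removes floating-point noise.
stage_tokens = [
    round(end - start, 1)
    for start, end in zip(STAGE_BOUNDARIES_T, STAGE_BOUNDARIES_T[1:])
]
total_tokens = STAGE_BOUNDARIES_T[-1] - STAGE_BOUNDARIES_T[0]
shares = [tokens / total_tokens for tokens in stage_tokens]

for name, tokens, share in zip(STAGE_NAMES, stage_tokens, shares):
    print(f"{name:<28} {tokens:.1f}T  ({share:.0%} of pre-training)")
```

Nearly half of the budget goes to Stage 1, while the two final stages (language expansion and synthetic data) each use about a tenth of the tokens.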
Supported Languages
English, Chinese, Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Turkish, Indonesian, Italian, Dutch, Polish, Russian, Vietnamese, Thai, Hebrew, Ukrainian, Malay, Bengali, Czech, Urdu, Kazakh, Greek, Romanian, Hungarian, Nepali, Azerbaijani
Evaluation
We compare Marco-Nano against similarly sized baselines: Qwen3-1.7B (1.7B activated), Trinity Nano (1.09B activated), and Granite4-Tiny (1.47B activated). Marco-Nano activates only 0.6B parameters, fewer than any of these baselines.
English
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| MMLU (Acc) | 5-shot | 65.1 | 64.7 | 69.1 | 64.7 |
| MMLU-Redux (Acc) | 0-shot | 61.2 | 60.1 | 65.8 | 62.9 |
| MMLU-Pro (Acc) | 5-shot | 33.2 | 32.0 | 32.1 | 35.9 |
| AGIEval (Acc) | 0-shot | 35.9 | 31.4 | 36.1 | 38.4 |
| BBH (EM) | 3-shot | 54.5 | 49.3 | 59.9 | 53.5 |
| ARC-Easy (Acc) | 0-shot | 69.3 | 77.9 | 78.5 | 75.3 |
| ARC-Challenge (Acc) | 0-shot | 42.8 | 53.5 | 52.3 | 49.4 |
| HellaSwag (Acc) | 0-shot | 66.6 | 77.4 | 77.9 | 69.2 |
| WinoGrande (Acc) | 0-shot | 57.1 | 57.1 | 58.6 | 53.4 |
| BoolQ (Acc) | 0-shot | 74.6 | 71.5 | 63.5 | 71.2 |
| CommonsenseQA (Acc) | 0-shot | 49.5 | 54.1 | 55.9 | 55.7 |
| OpenBookQA (Acc) | 0-shot | 36.4 | 42.0 | 43.6 | 39.4 |
| PIQA (Acc) | 0-shot | 75.5 | 69.6 | 80.6 | 76.5 |
| SIQA (Acc) | 0-shot | 47.8 | 52.7 | 53.0 | 46.0 |
| GSM8K (EM) | 5-shot | 69.1 | 57.8 | 70.7 | 69.7 |
| Average | - | 55.9 | 56.7 | 59.8 | 57.5 |
Multilingual — General
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| GlobalMMLU (Acc) | 5-shot | 49.6 | 43.6 | 54.8 | 52.2 |
| MMMLU (Acc) | 0-shot | 48.6 | 41.2 | 52.3 | 52.6 |
| MMLU-ProX-Lite (Acc) | 5-shot | 27.2 | 20.3 | 30.1 | 28.9 |
| BELEBELE (Acc) | 0-shot | 67.5 | 54.5 | 61.2 | 73.8 |
| mHellaSwag (Acc_norm) | 0-shot | 43.9 | 42.5 | 53.2 | 48.8 |
| mARC-Challenge (Acc_norm) | 0-shot | 34.7 | 30.9 | 39.9 | 36.9 |
| FLORES-200 En→Xx (BLEU) | 5-shot | 18.6 | 15.1 | 25.4 | 24.7 |
| FLORES-200 Xx→En (BLEU) | 5-shot | 31.5 | 31.1 | 36.7 | 33.6 |
| WMT24++ En→Xx (BLEU) | 5-shot | 18.3 | 15.0 | 21.9 | 20.7 |
| WMT24++ Xx→En (BLEU) | 5-shot | 28.3 | 28.0 | 30.7 | 28.1 |
| MGSM (EM) | 8-shot | 58.8 | 40.6 | 56.7 | 65.3 |
| Average | - | 38.8 | 33.0 | 42.1 | 42.3 |
Multilingual — Cultural & Regional
| Benchmark | # Shots | Qwen3-1.7B | Trinity Nano | Granite4-Tiny | Marco-Nano |
|---|---|---|---|---|---|
| INCLUDE (Acc) | 5-shot | 51.2 | 43.9 | 52.1 | 53.2 |
| Global-PIQA (Acc_norm) | 0-shot | 60.3 | 52.3 | 64.0 | 64.3 |
| CMMLU (Acc) | 5-shot | 66.1 | 49.6 | 53.5 | 55.5 |
| C-Eval (Acc) | 5-shot | 65.1 | 47.6 | 50.9 | 56.0 |
| ArabicMMLU (Acc) | 3-shot | 57.6 | 44.0 | 60.5 | 55.8 |
| TurkishMMLU (Acc) | 5-shot | 47.9 | 29.6 | 41.8 | 48.9 |
| GreekMMLU (Acc) | 5-shot | 58.1 | 52.2 | 62.3 | 64.1 |
| KazakhMMLU (Acc) | 5-shot | 52.1 | 43.1 | 52.6 | 53.1 |
| IndoMMLU (Acc) | 0-shot | 51.0 | 41.5 | 49.0 | 51.0 |
| IndoCareer (Acc) | 3-shot | 53.9 | 46.7 | 53.0 | 52.1 |
| IndoCulture (Acc) | 0-shot | 51.6 | 49.8 | 51.3 | 57.4 |
| Average | - | 55.9 | 45.5 | 53.7 | 55.6 |
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "AIDC-AI/Marco-Nano-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
input_text = "The capital of France is"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
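Since this is a base checkpoint, it continues text rather than following chat-style instructions, so few-shot prompts in the spirit of the evaluation settings above are often a better fit than a bare question. The helper below is a minimal sketch of building such a prompt. The function name `build_few_shot_prompt` and the `Question:`/`Answer:` layout are hypothetical choices for illustration, not the format used in the reported benchmarks.

```python
def build_few_shot_prompt(examples, question):
    """Hypothetical helper: join solved examples and leave the last answer open."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The trailing "Answer:" cues the base model to continue with an answer.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Kazakhstan?")
print(prompt)
```

The resulting string can be passed as `input_text` in the snippet above. Keeping `max_new_tokens` small and cutting the decoded text at the first blank line is a simple way to stop the model from generating further question/answer pairs.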
Citation
@article{marco-moe,
title={Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling},
author={Fan Jiang and Yu Zhao and Chenyang Lyu and Tianqi Shi and Yichao Du and Feihu Jiang and Longyue Wang and Weihua Luo},
year={2026}
}
License
This model is released under the Apache 2.0 License.