ASTRAI Pluto Nano 0.5

Mixture-of-Experts language model — 1 B total / 50 M active per token (47 M exact).

Developed from scratch by ASTRAI Labs. v0.5 is a public preview release of the upcoming Pluto Nano 1.0.

⚠️ READ FIRST: experimental preview, not a working chatbot. This model does not chat fluently. Most of the time it produces grammatical English, but it goes off-topic, loops, or generates math-style "Question: ..." chains regardless of the input. Multilingual quality is worse: non-EN outputs often collapse into repetition or garbage characters. That is expected. v0.5 was built as a public experiment to validate the ASTRAI Pluto architecture (custom MoE, 5-language multilingual support, RTX 3060 trainability), not as a usable assistant. It is published so the community can poke at the architecture, reproduce the pipeline, and follow what the upcoming v1.0 (10× pretrain, 10 languages, 128 k vocab, Qwen2.5-1.5B warm-start) inherits from it. Don't deploy v0.5 anywhere; it'll embarrass everyone involved.

Architecture

| Spec | Value |
|---|---|
| Total parameters | 1 B |
| Active per token | ~50 M (47 M exact) |
| Experts | 35 (top-1 routing) |
| Attention | GQA: 6 query heads, 2 KV heads |
| Hidden size / layers | 384 / 16 |
| Expert intermediate size | 1536 |
| Tokenizer | Custom 32 k BPE |
| Max context | 4096 tokens |
| RoPE θ | 1e6 |
| MTP depth (pretrain only) | 2 |
| Languages | EN, PT, ES, ZH, HI |
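As a sanity check on the headline numbers, the sketch below recomputes total and active parameters from the table. It rests on assumptions the table does not state: SwiGLU experts (gate, up and down matrices), no bias terms, no shared expert, one RMSNorm weight vector before attention and one before the MoE block, input embeddings tied to the LM head, a vocabulary of exactly 32,000 tokens, a head dimension of 384 / 6 = 64, and the MTP heads left out because they are used only in pretraining. `PlutoNanoConfig` and `count_params` are hypothetical names, not part of the astraimoe package.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PlutoNanoConfig:
    hidden: int = 384
    layers: int = 16
    experts: int = 35
    expert_intermediate: int = 1536
    q_heads: int = 6
    kv_heads: int = 2
    vocab: int = 32_000  # assumption: "32 k" taken as exactly 32,000


def count_params(cfg: PlutoNanoConfig) -> Tuple[int, int]:
    """Return (total, active-per-token) parameter counts under the stated assumptions."""
    head_dim = cfg.hidden // cfg.q_heads  # 384 / 6 = 64
    attn = (
        cfg.hidden * cfg.q_heads * head_dim           # Q projection
        + 2 * cfg.hidden * cfg.kv_heads * head_dim    # K and V (GQA: each KV head serves 3 query heads)
        + cfg.q_heads * head_dim * cfg.hidden         # output projection
    )
    expert = 3 * cfg.hidden * cfg.expert_intermediate  # assumption: SwiGLU, no biases
    router = cfg.hidden * cfg.experts                  # one logit per expert
    norms = 2 * cfg.hidden                             # assumption: two RMSNorm weights per layer
    embed = cfg.vocab * cfg.hidden                     # assumption: tied with the LM head
    final_norm = cfg.hidden

    # Every expert counts toward the total; top-1 routing runs only one per token.
    total = cfg.layers * (attn + cfg.experts * expert + router + norms) + embed + final_norm
    active = cfg.layers * (attn + expert + router + norms) + embed + final_norm
    return total, active


total, active = count_params(PlutoNanoConfig())
print(f"total  ~ {total / 1e9:.2f} B")   # total  ~ 1.01 B
print(f"active ~ {active / 1e6:.1f} M")  # active ~ 47.1 M
```

Under those assumptions the counts land at about 1.01 B total and 47.1 M active, in line with the 1 B / 47 M figures above. The expert weights make up roughly 98% of all parameters, but with top-1 routing each layer runs only 1 of its 35 experts per token, which is why the active count is so small.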

Training

  • Pretrain: 13 B tokens of curated multilingual text
  • Post-training: SFT, ORPO, DPO, KTO + distillation from frontier models (Claude Opus 4.7/4.8, GPT-5.5, Gemini 3.x, Qwen3-235B, Grok 4.4, etc.)
  • Hardware: single consumer RTX 3060 (12 GB VRAM)
  • Training time for v0.5: ~2 weeks

Benchmarks

Honest reporting: the tables below cover 11 of the 12 standard small-LM benchmarks (PIQA is not measured in our harness). Pluto v0.5 is a public preview and loses on several knowledge-heavy tasks; the results were not cherry-picked.

Full results vs SupraLabs/Supra-50M-Reasoning

| Benchmark | Pluto Nano v0.5 | Supra-50M (target) | Result |
|---|---|---|---|
| HellaSwag | 30.00 | 29.16 | win |
| Winogrande | 55.00 | 51.07 | win |
| BoolQ | 61.67 | 46.06 | win |
| MMLU | 27.00 | 23.58 | win |
| WikiText PPL (lower is better) | 201.91 | 166.27 | loss |
| Lambada | 0.00 | 16.53 | loss (format shift, known limitation) |
| COPA | 43.00 | 59.00 | loss |
| ARC-Easy | 21.00 | 45.16 | loss |
| ARC-Challenge | 23.75 | 26.54 | loss |
| OpenBookQA | 26.00 | 28.80 | loss |
| SciQ | 58.00 | 64.10 | loss |

4 wins / 11 benches vs Supra-50M.
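The tally follows one rule per row: higher is better for the accuracy benchmarks, lower is better for WikiText perplexity. A small sketch of that rule, with the scores copied from the table above (`RESULTS` and `is_win` are illustrative names, not part of any evaluation harness):

```python
# (Pluto Nano v0.5, Supra-50M) scores, copied from the table above
RESULTS = {
    "HellaSwag": (30.00, 29.16),
    "Winogrande": (55.00, 51.07),
    "BoolQ": (61.67, 46.06),
    "MMLU": (27.00, 23.58),
    "WikiText PPL": (201.91, 166.27),
    "Lambada": (0.00, 16.53),
    "COPA": (43.00, 59.00),
    "ARC-Easy": (21.00, 45.16),
    "ARC-Challenge": (23.75, 26.54),
    "OpenBookQA": (26.00, 28.80),
    "SciQ": (58.00, 64.10),
}

# Perplexity is the only metric here where a smaller number is better
LOWER_IS_BETTER = {"WikiText PPL"}


def is_win(bench: str, pluto: float, ref: float) -> bool:
    return pluto < ref if bench in LOWER_IS_BETTER else pluto > ref


wins = [b for b, (pluto, ref) in RESULTS.items() if is_win(b, pluto, ref)]
print(f"{len(wins)} wins / {len(RESULTS)} benches:", ", ".join(wins))
# 4 wins / 11 benches: HellaSwag, Winogrande, BoolQ, MMLU
```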

Comparison vs other small models

| Bench | Pluto 0.5 | Supra-50M | SmolLM-135M | SmolLM2-135M | GPT-X-125M | GPT-X2-125M |
|---|---|---|---|---|---|---|
| HellaSwag | 30.00 | 29.22 | 42.70 | 42.10 | 36.57 | 40.55 |
| Winogrande | 55.00 | 51.54 | 50.43 | 51.30 | 50.83 | 49.01 |
| BoolQ | 61.67 | 42.05 | N/A | N/A | N/A | N/A |
| ARC (avg) | 22.38 | 35.90 | 43.17 | 43.90 | 38.84 | 39.90 |
| OpenBookQA | 26.00 | 28.60 | 34.00 | 34.60 | N/A | N/A |
| MMLU | 27.00 | 23.58 | 30.20 | 31.50 | N/A | N/A |
| SciQ | 58.00 | 64.10 | N/A | N/A | N/A | N/A |

How to read this: Pluto is competitive on commonsense and reading-comprehension benches (Winogrande, BoolQ) but lags on knowledge-dense ones (ARC, OpenBookQA, and MMLU against the 135M models). That is the expected trade-off for a 50M-active MoE trained on consumer hardware with a 13B-token pretrain split across 5 languages.

Usage

import torch
# Pluto uses a custom architecture; load_pluto handles config + weights
from astraimoe.pluto_arch import load_pluto
from transformers import PreTrainedTokenizerFast

tok = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="<|unk|>", pad_token="<|pad|>",
    bos_token="<|bos|>", eos_token="<|eos|>",
)
model = load_pluto(".", dtype=torch.bfloat16).cuda()
model.eval()

prompt = "<|lang_en|>\n<|user|>\nWhat is a Mixture of Experts?\n<|im_end|>\n<|assistant|>\n"
# The prompt already spells out its special tokens, so don't let the tokenizer add more
ids = tok(prompt, return_tensors="pt", add_special_tokens=False).input_ids.cuda()

# Stop on end-of-sequence or end-of-turn; look the ids up once, not every step
stop_ids = {tok.eos_token_id, tok.convert_tokens_to_ids("<|im_end|>")}
max_ctx = 4096        # model's maximum context length
max_new_tokens = 200

# Greedy generation loop: no cache is passed, so the full window is re-run each step
with torch.no_grad():
    cur = ids[0].tolist()
    for _ in range(max_new_tokens):
        inp = torch.tensor([cur[-max_ctx:]], device=ids.device, dtype=torch.long)
        logits = model(input_ids=inp)["logits"][0, -1]
        nxt = int(logits.argmax())
        if nxt in stop_ids:
            break
        cur.append(nxt)

# Decode only the newly generated tokens
print(tok.decode(cur[ids.size(1):], skip_special_tokens=True))

Chat template

<|lang_{en|pt|es|zh|hi}|>
<|user|>
...question...
<|im_end|>
<|assistant|>
...response...
<|im_end|>
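For scripting, the single-turn template can be filled in with a small helper. `build_prompt` and `extract_response` are hypothetical names; the sketch assumes the language tag appears once at the start, as in the template, and it does not attempt multi-turn history, whose layout this card does not specify.

```python
SUPPORTED_LANGS = ("en", "pt", "es", "zh", "hi")


def build_prompt(question: str, lang: str = "en") -> str:
    """Fill the single-turn chat template, leaving the assistant turn open."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language tag: {lang!r}")
    return f"<|lang_{lang}|>\n<|user|>\n{question}\n<|im_end|>\n<|assistant|>\n"


def extract_response(generated: str) -> str:
    """Keep only the text before the model closes its turn with <|im_end|>."""
    return generated.split("<|im_end|>", 1)[0].strip()


print(repr(build_prompt("What is a Mixture of Experts?")))
```

`build_prompt("What is a Mixture of Experts?")` reproduces the prompt string used in the Usage example above; passing `lang="pt"` swaps in the `<|lang_pt|>` tag.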

Limitations (please read carefully)

  • Chat quality is poor. The model produces grammatical English but drifts off-topic, often spirals into "Question: What is the value of X?" math-style loops, or emits random encyclopedia-style sentences regardless of the prompt. This is the actual ceiling of v0.5, not a bug in your inference code.
  • Multilingual is worse than English. PT/ES/ZH/HI prompts may produce language-correct outputs but with repetition loops or garbage characters. The pretrain budget (13 B tokens) is too thin for 5 languages.
  • No identity — the model doesn't know its own name. Multiple identity SFT attempts hurt other benchmarks too much, so the released checkpoint skips them. Ask "who are you?" and the model invents something.
  • Code: deliberately not trained. At 47 M active params there's no room for code knowledge; we filtered it out of every training stage to save capacity.
  • Pretrain corpus: 13 B tokens, small compared with the 1 T+ token corpora behind frontier models. This is the fundamental limitation. v1.0 adds 10 B more tokens of curated pretrain on top.

If you want to chat with a small LM today, use SmolLM2-135M or Qwen2.5-0.5B. This release exists to validate the architecture and the training pipeline, not to compete on usability.

GGUF / quantizations

Quantized GGUF builds (fp16 / Q8_0 / Q6_K / Q4_K_M) are available at ASTRAI-labs/pluto-nano-0.5-gguf.

They use a qwen2_moe-spoofing shim (the model is presented to llama.cpp as qwen2_moe) because astrai_pluto is not yet a native architecture in llama.cpp. Note: the shim currently produces garbled output because of a top-1 MoE routing incompatibility, so for now use the bf16 safetensors in this repo with the inference script above. Proper llama.cpp support is on the roadmap.

To reproduce, see GGUF.md and convert_pluto_to_qwen2moe.py in this repo.
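For anyone porting the model, "top-1 routing" means each token is sent to exactly one of the 35 experts. The pure-Python sketch below shows the idea on toy vectors. It is a generic illustration, not the Pluto kernel: whether Pluto scales the chosen expert's output by its softmax probability or by 1.0 is an assumption here, exposed as the hypothetical `renormalize` flag. It shows one way two MoE implementations can disagree under top-1 routing: renormalizing over a single selected expert always gives a weight of exactly 1.0, while using the full-softmax probability gives something below 1. This card does not say whether that is the specific mismatch in the qwen2_moe shim.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top1_moe(token: Vector, router_logits: Sequence[float],
             experts: Sequence[Callable[[Vector], Vector]],
             renormalize: bool) -> Vector:
    """Route one token to its single highest-scoring expert (top-1)."""
    probs = softmax(router_logits)
    k = max(range(len(probs)), key=probs.__getitem__)  # argmax; ties go to the lowest index
    # Renormalizing over a selection of size 1 always yields weight 1.0;
    # otherwise the output is scaled by the expert's full-softmax probability.
    gate = 1.0 if renormalize else probs[k]
    return [gate * v for v in experts[k](token)]  # only expert k is evaluated


# Toy experts standing in for the per-layer FFNs
experts = [
    lambda x: [2 * v for v in x],
    lambda x: [v + 1 for v in x],
    lambda x: [-v for v in x],
]
token = [1.0, 2.0]
logits = [0.0, 2.0, 0.0]  # expert 1 wins

print(top1_moe(token, logits, experts, renormalize=True))   # [2.0, 3.0]
print(top1_moe(token, logits, experts, renormalize=False))  # roughly [1.57, 2.36]
```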

License

ASTRAI Closed License — weights are made available for research and evaluation. Commercial use requires explicit agreement with ASTRAI Labs.

Citation

@misc{astrai_pluto_nano_2026,
  author = {ShinMK (Miguel) and ASTRAI Labs},
  title  = {ASTRAI Pluto Nano 0.5},
  year   = {2026},
  url    = {https://huggingface.co/ASTRAI-labs/pluto-nano-0.5},
}

Contact

ASTRAI Labs — founder: ShinMK (Miguel).
