qwen3.5-moe-cpt

A text-only Qwen3.5 MoE continual-pretraining (CPT) checkpoint produced from the Qwen/Qwen3.5-2B-Base base model.

Architecture

  • Number of experts: 4
  • Top-k routing: top-2 of 4 experts active per token
  • Dispatch: XLA-safe cumsum token dispatch, token dropping enabled
  • Sequence length: 2048
  • Router aux-loss coefficient: 0.01
  • Router z-loss coefficient: 0.001
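The routing scheme above (top-2 of 4 experts per token, cumsum-based dispatch with token dropping, plus auxiliary and z-losses) can be sketched roughly as follows. This is a minimal NumPy illustration, not the model's actual implementation: the capacity value and the exact loss formulations (Switch-style load-balancing aux loss, squared-logsumexp z-loss) are assumptions.

```python
import numpy as np

def top2_route(logits, capacity):
    """Sketch of top-2 routing with cumsum-based capacity dropping.

    logits:   (num_tokens, num_experts) router scores.
    capacity: max tokens each expert may receive; overflow is dropped.
    Returns a (num_tokens, num_experts) keep mask plus aux/z losses.
    """
    num_tokens, num_experts = logits.shape
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    # Top-2 expert assignment per token.
    top2 = np.argsort(-probs, axis=-1)[:, :2]
    assign = np.zeros_like(probs)
    assign[np.arange(num_tokens)[:, None], top2] = 1.0

    # Cumsum gives each token its position in its expert's queue;
    # tokens past the capacity are dropped via a mask (XLA-safe:
    # no data-dependent shapes, just fixed-size masking).
    position = np.cumsum(assign, axis=0) * assign
    keep = assign * (position <= capacity)

    # Load-balancing auxiliary loss and router z-loss (assumed forms).
    frac_tokens = assign.mean(axis=0)   # fraction of tokens routed per expert
    frac_probs = probs.mean(axis=0)     # mean router probability per expert
    aux_loss = num_experts * np.sum(frac_tokens * frac_probs)
    z_loss = np.mean(np.log(np.sum(np.exp(logits), axis=-1)) ** 2)
    return keep, aux_loss, z_loss
```

The coefficients listed above (0.01 and 0.001) would scale `aux_loss` and `z_loss` before adding them to the language-modeling loss.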

Training

  • Objective: continual pretraining
  • Total token target: 3,390,478,336
  • Router warmup: True
  • Router warmup tokens: 200,000,000
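One way to read the router-warmup settings above is a coefficient ramp over the first 200M tokens; the function below is a hypothetical linear schedule for illustration, since the card does not specify the exact schedule used.

```python
def router_aux_coef(tokens_seen, warmup_tokens=200_000_000, target=0.01):
    """Hypothetical linear warmup: ramp the router aux-loss coefficient
    from 0 to `target` over the first `warmup_tokens` training tokens,
    then hold it constant. The actual schedule is an assumption."""
    return target * min(tokens_seen / warmup_tokens, 1.0)
```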

Loading

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "Efe2898/qwen3.5-moe-cpt"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
Model size: 5B parameters · Tensor type: F32 (safetensors)
