# qwen3.5-moe-cpt

A text-only Qwen3.5 MoE continual-pretraining (CPT) checkpoint built on top of Qwen/Qwen3.5-2B-Base.
## Architecture

- Number of experts: 4
- Top-k routing: top-2 of 4 experts active per token
- Dispatch: XLA-safe cumsum token dispatch with token dropping enabled (see the sketch after this list)
- Sequence length: 2048
- Router auxiliary loss coefficient: 0.01
- Router z-loss coefficient: 0.001
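The sketch below shows one way top-2-of-4 routing with a capacity-limited cumsum dispatch and the two router regularizers above can be wired together. It is not this checkpoint's actual implementation; the function name, its arguments, and the `capacity_factor` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def route_and_losses(hidden, router_weight, num_experts=4, top_k=2,
                     capacity_factor=1.25, aux_coef=0.01, z_coef=0.001):
    # hidden: (num_tokens, d_model), router_weight: (d_model, num_experts)
    num_tokens = hidden.shape[0]
    logits = hidden @ router_weight                     # (T, E)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)    # (T, k)

    # Per-expert buffer size; assignments past this point are dropped.
    capacity = int(capacity_factor * num_tokens * top_k / num_experts)

    flat_idx = topk_idx.reshape(-1)                     # (T*k,)
    onehot = F.one_hot(flat_idx, num_experts).float()   # (T*k, E)
    # Cumulative count per expert gives each assignment a 1-based slot
    # index inside that expert's buffer -- the "cumsum dispatch". All
    # shapes stay static, which is what keeps it XLA-friendly.
    slot = (onehot.cumsum(dim=0) * onehot).sum(dim=-1)  # (T*k,)
    keep = (slot <= capacity).float().reshape(num_tokens, top_k)

    # Combine weights: assignments dropped for capacity contribute nothing.
    combine = topk_probs * keep                         # (T, k)

    # Load-balancing auxiliary loss: fraction of assignments per expert
    # times the mean router probability per expert.
    frac = onehot.mean(dim=0)                           # (E,)
    mean_probs = probs.mean(dim=0)                      # (E,)
    aux_loss = aux_coef * num_experts * (frac * mean_probs).sum()

    # Router z-loss keeps the router logits from growing without bound.
    z_loss = z_coef * (torch.logsumexp(logits, dim=-1) ** 2).mean()

    return topk_idx, combine, aux_loss + z_loss
```

Summing each expert's output for the kept assignments, weighted by `combine`, then gives the MoE layer output.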
## Training

- Objective: continual pretraining
- Total token target: 3,390,478,336
- Router warmup: enabled (see the sketch after this list)
- Router warmup tokens: 200,000,000
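The card does not spell out what router warmup does. One common recipe is to ramp the router auxiliary coefficient linearly from zero to its target over the warmup window; the sketch below assumes that recipe and is purely hypothetical.

```python
# Hypothetical warmup schedule: ramp the aux-loss coefficient from 0 to its
# target over the first 200M of ~3.39B training tokens.
ROUTER_WARMUP_TOKENS = 200_000_000
TARGET_AUX_COEF = 0.01

def router_aux_coef(tokens_seen: int) -> float:
    """Linearly scale the router aux coefficient during warmup."""
    if tokens_seen >= ROUTER_WARMUP_TOKENS:
        return TARGET_AUX_COEF
    return TARGET_AUX_COEF * tokens_seen / ROUTER_WARMUP_TOKENS
```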
## Loading
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "Efe2898/qwen3.5-moe-cpt"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
```
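A minimal text-completion check after loading might look like this (the prompt is only an example):

```python
prompt = "The largest moon of Saturn is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```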