# COMPLEXITY-DEEP Token-Routed MoE (187M) – Training Checkpoint (Step 954)
Resumable training checkpoint with full optimizer state. Use this to continue training from step 954 (500M tokens seen).
## Contents

- `checkpoint.pt` – model weights + training state
- `optimizer_rank0.pt` – AdamW optimizer state (GPU 0)
- `optimizer_rank1.pt` – AdamW optimizer state (GPU 1)
- `training_state.json` – step counter, learning rate, etc.
## Model Config
- Parameters: 187M
- Hidden: 768, Layers: 18, Heads: 12, KV Heads: 4
- Experts: 4, Intermediate: 2048 (512/expert), Shared: 512
- Training: 500M tokens, AdamW lr=3e-4 (auto-scaled to 6e-4), cosine schedule with 5% warmup
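The schedule above (cosine decay with a linear warmup over the first 5% of steps) can be sketched in plain Python. `peak_lr` and `total_steps` below are illustrative values, not read from the checkpoint:

```python
import math

def lr_at(step, total_steps, peak_lr=6e-4, min_lr=0.0):
    """Cosine decay with linear warmup over the first 5% of steps (sketch)."""
    warmup = max(1, int(0.05 * total_steps))
    if step < warmup:
        # Linear warmup from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup
    # Cosine decay from peak_lr down to min_lr over the remaining steps
    progress = (step - warmup) / max(1, total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Whether decay targets zero or a small floor (`min_lr`) is not stated in this card; adjust to match your training script.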
## Resume Training
```python
import torch

# Load model weights (model must be constructed with the config above)
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(checkpoint["model"])

# Load the AdamW optimizer state for this GPU rank (0 or 1)
rank = torch.distributed.get_rank()
optimizer_state = torch.load(f"optimizer_rank{rank}.pt", map_location="cpu")
optimizer.load_state_dict(optimizer_state)

# Resume the training loop from step 954
```
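The step counter and learning rate live in `training_state.json`. A minimal reader might look like this; the field names (`step`, `lr`) are assumptions, since the exact schema isn't documented here:

```python
import json

def read_training_state(path="training_state.json"):
    # Field names ("step", "lr") are assumed, not confirmed by the checkpoint docs.
    with open(path) as f:
        return json.load(f)

# Hypothetical usage once model and optimizer state are restored:
# state = read_training_state()
# start_step = state["step"]   # expected to be 954 for this checkpoint
```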
## Pretrained Weights (inference)
For inference, use the safetensors checkpoint in `../final/` instead.
## License
CC-BY-NC-4.0
Complexity-ML -- 2026