AETHER-166B-A8B (Phase 0.5: Oheng Calibrated)

166B-parameter Mixture-of-Experts model with a Wu Xing (Five Elements; Korean: oheng) expert architecture.

Model Details

  • Parameters: 166B total, ~8B active per token
  • Architecture: 25 layers, 5x5 MoE per layer, GDN+Mamba2+FullAttn hybrid
  • Parent: Qwen3.5-397B-A17B (weight transplant + oheng calibration)
  • Oheng: GenerateBoost (alpha[5]) + OvercomeGate
  • Calibration: v11, 3000 steps, sparse KLD, best_kld=14.28 @ step 1489
  • Precision: bfloat16
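
The card names GenerateBoost (alpha[5]) and OvercomeGate but does not specify their math. Below is a minimal, hypothetical sketch of how the five expert groups per layer might be coupled through the classical Wu Xing generating (sheng) and overcoming (ke) cycles: each element's router score boosts the element it generates and suppresses the element it overcomes. The `alpha` vector matches the card's `alpha[5]`; the `beta` gate strength and the additive form are assumptions.

```python
ELEMENTS = ["wood", "fire", "earth", "metal", "water"]

# Generating (sheng) cycle: each element feeds the next one.
GENERATES = {i: (i + 1) % 5 for i in range(5)}
# Overcoming (ke) cycle: each element suppresses the one two steps ahead.
OVERCOMES = {i: (i + 2) % 5 for i in range(5)}

def oheng_adjust(group_scores, alpha, beta=0.1):
    """Adjust the 5 expert-group router scores (hypothetical form).

    group_scores: raw router scores for the 5 element groups.
    alpha: learned per-element boost strengths (the card's alpha[5]).
    beta: overcome-gate strength (assumed scalar).
    """
    adjusted = list(group_scores)
    for src in range(5):
        adjusted[GENERATES[src]] += alpha[src] * group_scores[src]  # GenerateBoost
        adjusted[OVERCOMES[src]] -= beta * group_scores[src]        # OvercomeGate
    return adjusted
```

With a single active "wood" group, the boost flows to "fire" and the gate suppresses "earth", mirroring the classical cycle.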

Training

  • Phase 0: Weight transplant from Qwen3.5-397B → AETHER-166B
  • Phase 0.5: Oheng calibration (offline sparse KLD distillation, 0.41% params)
  • Phase 1: Full distillation (planned)
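
Phase 0.5 uses offline sparse KLD distillation. The card does not describe the loss in detail; a common construction, sketched here under that assumption, stores only the teacher's top-k probabilities per position offline, then computes KL(teacher || student) with both distributions renormalized over that sparse support. Function and argument names are illustrative, not from the card.

```python
import math

def sparse_kld(teacher_topk, student_logits, indices):
    """KL(teacher || student) restricted to the teacher's stored top-k support.

    teacher_topk: teacher probabilities for the stored top-k token ids.
    student_logits: full student logit vector for one position.
    indices: the top-k token ids the teacher probabilities refer to.
    """
    # Renormalize the stored teacher mass over the sparse support.
    t_sum = sum(teacher_topk)
    t = [p / t_sum for p in teacher_topk]
    # Student softmax restricted to the same support (numerically stable).
    s_logits = [student_logits[i] for i in indices]
    m = max(s_logits)
    exps = [math.exp(x - m) for x in s_logits]
    z = sum(exps)
    s = [e / z for e in exps]
    return sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)
```

Storing only top-k teacher probabilities is what makes the distillation cheap enough to run offline against a 397B teacher.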

Status

This is a Phase 0.5 checkpoint. The model is NOT yet suitable for inference. Full distillation (Phase 1) is required for usable generation quality.

Hardware

Trained on 8x NVIDIA H200 (143GB each) with pipeline parallel.
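
The card states pipeline parallelism over 8 GPUs but not the stage assignment. A near-even split of the 25 layers, shown here purely as an assumed scheme, would give one stage 4 layers and the rest 3:

```python
def pipeline_stages(n_layers=25, n_gpus=8):
    """Return a near-even per-GPU layer count for pipeline parallelism
    (hypothetical split; the actual assignment is not documented)."""
    base, extra = divmod(n_layers, n_gpus)
    # The first `extra` stages each take one additional layer.
    return [base + (1 if i < extra else 0) for i in range(n_gpus)]
```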
