Week 2 MoE Seq2Seq (hash routing)

  • Best validation loss: 5.6068
  • Top-k: 1 (experts per token)
  • Auxiliary loss coefficient: 0.0 (load-balancing loss disabled)
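The settings above can be captured in a small config object. The sketch below is a minimal illustration, assuming the auxiliary loss is added to the task loss as `task_loss + aux_loss_coef * aux_loss`, which is the usual convention. The class `MoEConfig`, its field names, and `total_loss` are hypothetical and not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    # Hypothetical field names; default values mirror the run listed above.
    routing: str = "hash"        # "hash" (deterministic) or "topk" (learned)
    top_k: int = 1               # experts each token is sent to
    aux_loss_coef: float = 0.0   # weight on the load-balancing term


def total_loss(task_loss: float, aux_loss: float, cfg: MoEConfig) -> float:
    """Combine the task loss with the weighted auxiliary loss (assumed form)."""
    # With aux_loss_coef = 0.0 the auxiliary term contributes nothing, which
    # fits hash routing: there is no learned router to balance.
    return task_loss + cfg.aux_loss_coef * aux_loss


cfg = MoEConfig()
print(total_loss(2.5, 0.37, cfg))  # 2.5
```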

Artifacts include the trained state dict (model.pt), metrics (metrics.json), per-epoch history (history.csv), and tokenizer files.
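As a sketch of how the per-epoch history might be inspected, the snippet below picks the epoch with the lowest validation loss using only the standard library. It assumes history.csv has a header row and a column named `val_loss`; that column name, the helper `best_epoch`, and the sample values are hypothetical, so check the real file's header before use.

```python
import csv
import io
from typing import Dict, TextIO


def best_epoch(f: TextIO, column: str = "val_loss") -> Dict[str, str]:
    """Return the history row with the lowest value in `column`.

    `column` is an assumed header name; adjust it to match history.csv.
    """
    rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("history has no data rows")
    return min(rows, key=lambda r: float(r[column]))


# In-memory stand-in for history.csv (made-up values, not real results).
sample = io.StringIO("epoch,train_loss,val_loss\n1,6.9,6.2\n2,6.1,5.8\n3,5.7,5.9\n")
print(best_epoch(sample)["epoch"])  # 2
```

With the real artifact, open it as `open("history.csv", newline="")` and pass the file object to `best_epoch`.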

Architecture

  • Encoder-decoder Transformer with sparse MoE layers
  • Hash-based routing (deterministic) or token-choice top-k routing (learned)
  • Load-balancing auxiliary loss for top-k routing
  • Trained from scratch on XSum for abstractive summarization
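The two routing modes and the auxiliary loss can be sketched in plain Python. This is an illustration under stated assumptions, not this repository's implementation: hash routing is modeled as `token_id % num_experts` (the actual hash may differ), and the load-balancing loss follows the Switch Transformer form N · Σᵢ fᵢ · Pᵢ, where fᵢ is the fraction of token-to-expert assignments sent to expert i and Pᵢ is the mean router probability for expert i. Normalizing fᵢ by total assignments when k > 1 is one possible convention. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hash_route(token_ids: Sequence[int], num_experts: int) -> List[int]:
    """Deterministic routing: a given token id always maps to the same expert.

    Assumes a modulo hash; no parameters are learned, so no balancing loss applies.
    """
    return [t % num_experts for t in token_ids]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits: Sequence[Sequence[float]],
                k: int) -> Tuple[List[List[int]], List[List[float]]]:
    """Token-choice top-k: each token picks its k highest-probability experts."""
    probs = [softmax(row) for row in router_logits]
    choices = [sorted(range(len(p)), key=lambda e: p[e], reverse=True)[:k]
               for p in probs]
    return choices, probs


def load_balancing_loss(choices: List[List[int]], probs: List[List[float]],
                        num_experts: int) -> float:
    """Switch-style auxiliary loss N * sum_i f_i * P_i.

    Equals 1.0 for perfectly uniform routing; minimizing it pushes the
    router toward spreading tokens evenly across experts.
    """
    counts = [0] * num_experts
    for experts in choices:
        for e in experts:
            counts[e] += 1
    total = sum(counts)
    f = [c / total for c in counts]  # fraction of assignments per expert
    P = [sum(p[i] for p in probs) / len(probs) for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


print(hash_route([5, 12, 7, 5, 3], num_experts=4))  # [1, 0, 3, 1, 3]

balanced, p_bal = top_k_route([[2.0, 0.0], [0.0, 2.0]] * 2, k=1)
print(round(load_balancing_loss(balanced, p_bal, 2), 4))  # 1.0

skewed, p_skew = top_k_route([[2.0, 0.0]] * 4, k=1)
print(round(load_balancing_loss(skewed, p_skew, 2), 4))  # 1.7616
```

In the skewed case every token is sent to expert 0, so the loss rises above 1.0. Hash routing has no router parameters for this loss to train, which fits the 0.0 auxiliary loss coefficient reported above.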