Week 2 MoE Seq2Seq (hash routing)
- Best validation loss: 5.6068
- Top-k: 1
- Aux loss coef: 0.0
Artifacts include the trained state dict (model.pt), metrics (metrics.json), per-epoch history (history.csv), and tokenizer files.
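As a rough sketch of how the non-weight artifacts might be inspected, the snippet below reads metrics.json and history.csv with the standard library only. The helper name `load_run_artifacts` and the idea that all files sit in one local directory are assumptions for illustration; the keys and column names inside the files are not documented here, so the code does not rely on any.

```python
import csv
import json
from pathlib import Path


def load_run_artifacts(run_dir):
    """Read metrics.json and history.csv from a local copy of the repo.

    Hypothetical helper: the file names come from the model card, but the
    directory layout and file contents are assumptions.
    """
    run_dir = Path(run_dir)
    # metrics.json: parsed as-is, no particular keys assumed.
    metrics = json.loads((run_dir / "metrics.json").read_text(encoding="utf-8"))
    # history.csv: one dict per row (per epoch), keyed by the header row.
    with open(run_dir / "history.csv", newline="", encoding="utf-8") as f:
        history = list(csv.DictReader(f))
    return metrics, history
```

The weights in model.pt are a PyTorch state dict, so they would typically be read with `torch.load(path, map_location="cpu")` and passed to `load_state_dict` on a matching model instance; the model class itself is not part of this card.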
Architecture
- Encoder-Decoder Transformer with Sparse MoE layers
- Routing: hash-based (deterministic) or token-choice top-k (learned); this checkpoint uses hash routing with top-k 1
- Load-balancing auxiliary loss available for top-k routing (coefficient 0.0 for this run)
- Trained from scratch on XSum for abstractive summarization
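To make the two routing modes concrete, here is a minimal pure-Python sketch, not the code used to train this model. It assumes a simple modulo hash over token ids for hash routing, and a Switch-Transformer-style balancing term (number of experts times the sum over experts of dispatch fraction times mean router probability) for the auxiliary loss. All function names are hypothetical.

```python
import math
from collections import Counter


def hash_route(token_ids, num_experts):
    """Deterministic top-1 routing: the token id alone picks the expert.

    Assumption: a plain modulo hash; the trained model may hash differently.
    """
    return [tid % num_experts for tid in token_ids]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(router_logits, k=1):
    """Token-choice routing: each token keeps its k most probable experts."""
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        routes.append((top, probs))
    return routes


def load_balance_loss(routes, num_experts):
    """Assumed Switch-style auxiliary loss: N * sum_i f_i * P_i."""
    n_tokens = len(routes)
    # f_i: fraction of tokens whose first choice is expert i.
    counts = Counter(top[0] for top, _ in routes)
    f = [counts.get(i, 0) / n_tokens for i in range(num_experts)]
    # P_i: mean router probability assigned to expert i.
    p = [sum(probs[i] for _, probs in routes) / n_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


print(hash_route([5, 12, 7, 12], num_experts=4))  # [1, 0, 3, 0]
```

Under this formulation the loss is about 1.0 when tokens spread evenly across experts and grows toward the number of experts as routing collapses onto one. Hash routing has no learned router, so there is no such term to optimize, which is consistent with the auxiliary loss coefficient of 0.0 reported above.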