# t5-base-xsum-lora

## Model Description
T5 with a Mixture of Experts (token-choice routing), fine-tuned on the XSUM dataset for abstractive summarization.
## Architecture
This model uses a sparse Mixture of Experts with a learned token-choice top-k router.

Key features:
- Learned gating network for expert selection
- Top-k routing (each token is routed to its top 2 experts)
- Optional load-balancing auxiliary loss
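The routing scheme above can be sketched in PyTorch. This is a minimal illustration, not the model's actual implementation: the function name, tensor shapes, and Switch-style load-balancing loss are assumptions.

```python
import torch
import torch.nn.functional as F

def token_choice_topk(hidden, gate_weight, k=2):
    """Route each token to its top-k experts via a learned linear gate."""
    # hidden: (num_tokens, d_model); gate_weight: (d_model, num_experts)
    logits = hidden @ gate_weight                 # (num_tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)  # each token picks k experts
    # Renormalize so each token's k expert weights sum to 1
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    # Optional load-balancing loss (Switch Transformer style): penalizes
    # routing distributions that concentrate tokens on a few experts
    num_experts = gate_weight.shape[1]
    density = F.one_hot(topk_idx[:, 0], num_experts).float().mean(dim=0)
    mean_probs = probs.mean(dim=0)
    aux_loss = num_experts * torch.sum(density * mean_probs)
    return topk_probs, topk_idx, aux_loss
```

Each token's expert outputs would then be combined as a weighted sum using `topk_probs`, and `aux_loss` added to the training loss with a small coefficient.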
## Training Data
The model was trained on the XSUM dataset, which contains:
- ~204k training examples
- ~11k validation examples
- ~11k test examples
Each example consists of a BBC news article and a one-sentence summary.
## Usage
```python
from transformers import T5Tokenizer

# Load the tokenizer
tokenizer = T5Tokenizer.from_pretrained("YOUR_USERNAME/t5-base-xsum-lora")

# Note: for MoE models you need to reconstruct the architecture before
# loading the weights; see the model repository for detailed instructions.
```
## Evaluation
Evaluate using standard ROUGE metrics (ROUGE-1/2/L) and SummaC factual-consistency scores.
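ROUGE-N measures n-gram overlap between a candidate summary and the reference. A minimal ROUGE-1 F1 sketch (for real evaluation, use a dedicated package such as `rouge_score` or `evaluate`):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between candidate and reference summaries."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("the cat sat", "the cat ran")` shares two of three unigrams with the reference, giving precision and recall of 2/3 and F1 of 2/3.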
## Training Procedure
The model was trained using:
- AdamW optimizer with weight decay
- Learning rate: 5e-5
- Warmup steps: 500
- Mixed precision (FP16) training
- Gradient accumulation for larger effective batch size
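These settings can be sketched with plain PyTorch. The learning rate and warmup steps come from the list above; the 0.01 weight decay, total step count, accumulation factor, and the stand-in model are assumptions for illustration.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned T5 model

# AdamW with decoupled weight decay (0.01 is an assumed value)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# Linear warmup for 500 steps, then linear decay to zero
warmup_steps, total_steps = 500, 10_000  # total_steps is illustrative

def lr_lambda(step):
    if step < warmup_steps:
        return step / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# FP16 mixed precision (a no-op on CPU) and gradient accumulation
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
accumulation_steps = 4  # assumed accumulation factor
```

In the training loop, the loss would be divided by `accumulation_steps` and `optimizer.step()` called only every `accumulation_steps` batches.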
## Limitations
- Trained only on English news articles
- May not generalize well to other domains
- MoE models require custom loading code
## Citation
If you use this model, please cite the XSUM dataset:
```bibtex
@inproceedings{narayan-etal-2018-dont,
  title     = "Don{'}t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization",
  author    = "Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year      = "2018",
}
```