Switch Transformers — Dialogue Summarization

A Switch Transformer (Mixture-of-Experts T5) model fine-tuned for abstractive text summarization. The model uses sparse expert routing to scale model capacity without a proportional increase in compute per token.

Model Description

Switch Transformers replace dense feed-forward sublayers in a standard T5 encoder-decoder with Mixture-of-Experts (MoE) layers. In each MoE layer, a learned router sends every token to exactly one of num_experts=8 expert feed-forward networks (top-1, or "switch", routing). Only that expert's weights are used for the token, and different experts can specialize in different kinds of input. This repo contains a variant fine-tuned for summarization.
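To make the routing step concrete, here is a minimal NumPy sketch of top-1 routing for a single MoE layer. It is an illustration under simplifying assumptions, not this model's implementation. The router matrix and expert networks are random stand-ins, the toy sizes (d_model=16, 10 tokens) are arbitrary, and it omits the expert capacity limit and auxiliary load-balancing loss that Switch Transformers use during training.

```python
import numpy as np

def switch_route(tokens, router_weights, expert_fns):
    """Top-1 ("switch") routing for one MoE layer (simplified sketch).

    tokens:         (num_tokens, d_model) token representations
    router_weights: (d_model, num_experts) router matrix (random stand-in here)
    expert_fns:     one feed-forward callable per expert
    """
    logits = tokens @ router_weights                      # (num_tokens, num_experts)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over experts
    expert_index = probs.argmax(axis=-1)                  # exactly one expert per token
    gate = probs[np.arange(len(tokens)), expert_index]    # probability of the chosen expert

    out = np.zeros_like(tokens)
    for e, ffn in enumerate(expert_fns):
        mask = expert_index == e
        if mask.any():
            # Only tokens routed to expert e are processed by it. The output is
            # scaled by the router probability; in a trainable version this is
            # what lets the router receive gradients.
            out[mask] = gate[mask][:, None] * ffn(tokens[mask])
    return out, expert_index

# Toy example: 10 tokens, 8 experts, each expert a random ReLU layer.
rng = np.random.default_rng(0)
d_model, num_experts, num_tokens = 16, 8, 10
expert_weights = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
expert_fns = [lambda x, w=w: np.maximum(x @ w, 0.0) for w in expert_weights]
router = rng.normal(size=(d_model, num_experts))
tokens = rng.normal(size=(num_tokens, d_model))

out, idx = switch_route(tokens, router, expert_fns)
print(out.shape)  # (10, 16)
print(idx)        # chosen expert for each token
```

Each token touches one expert's weights no matter how many experts exist, which is why adding experts grows parameters without growing per-token feed-forward compute.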

Parameter	Value
Architecture	SwitchTransformersForConditionalGeneration
Number of experts	8
Task	Conditional text generation / summarization
Format	Safetensors
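The capacity-versus-compute trade-off can be checked with back-of-the-envelope arithmetic for one MoE layer. The dimensions d_model=768 and d_ff=2048 below are hypothetical values chosen for illustration and are not read from this checkpoint's config. The sketch counts only feed-forward weights and the router, and ignores attention, expert capacity limits, and communication cost.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    # Two weight matrices, d_model -> d_ff -> d_model (biases ignored).
    return 2 * d_model * d_ff

def dense_layer(d_model: int, d_ff: int) -> tuple[int, int]:
    params = ffn_params(d_model, d_ff)
    flops = 2 * params  # one multiply and one add per weight, per token
    return params, flops

def moe_layer(d_model: int, d_ff: int, num_experts: int) -> tuple[int, int]:
    router = d_model * num_experts
    params = num_experts * ffn_params(d_model, d_ff) + router
    # With top-1 routing each token runs the router plus a single expert.
    flops = 2 * (ffn_params(d_model, d_ff) + router)
    return params, flops

d_model, d_ff = 768, 2048  # hypothetical sizes for illustration
dense_p, dense_f = dense_layer(d_model, d_ff)
moe_p, moe_f = moe_layer(d_model, d_ff, num_experts=8)
print(f"params: {moe_p:,} vs {dense_p:,} ({moe_p / dense_p:.3f}x)")
print(f"FLOPs/token: {moe_f:,} vs {dense_f:,} ({moe_f / dense_f:.3f}x)")
# params: 25,171,968 vs 3,145,728 (8.002x)
# FLOPs/token: 6,303,744 vs 6,291,456 (1.002x)
```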

How to Use

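from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# The tokenizer and the weights live in subfolders of one Hub repo.
# A repo id cannot contain a third path segment, so pass the folder via `subfolder`.
repo_id = "YashNagraj75/SwitchTransformers-Summarization"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="switch-transformer-tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, subfolder="switch-transformer")

text = "Your long document or conversation here..."
# Prepend the `summarize:` prefix this model expects; inputs longer than the
# tokenizer's maximum length are truncated.
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)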
outputs = model.generate(
    **inputs,
    max_length=200,
    min_length=30,
    num_beams=4,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Generation Config

Parameter	Value
Prefix	`summarize:`
Max length	200
Min length	30
Num beams	4
Length penalty	2.0
No repeat ngram size	3
Early stopping	True
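Two of the generation settings above are easy to misread. In Hugging Face `generate`, `length_penalty` is applied as an exponent to the sequence length, and a beam's summed log-probability is divided by the result. Because that score is negative, values above 0 (such as 2.0 here) favor longer hypotheses. `no_repeat_ngram_size=3` blocks any token that would repeat a trigram already present in the output. The plain-Python sketch below mirrors those two rules on made-up numbers. It is not the library's code, and the hypothesis scores and token ids are hypothetical.

```python
def length_penalized_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    # Summed log-probability (<= 0) divided by length ** length_penalty.
    return sum_logprobs / (length ** length_penalty)

def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Token ids that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # Wherever the same (n-1)-token prefix occurred before, the token
        # that followed it would recreate an existing n-gram.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# Hypothetical beams: a short one, and a longer one with a lower total log-probability.
short_beam, long_beam = (-8.0, 10), (-15.0, 30)
for lp in (0.0, 2.0):
    s_short = length_penalized_score(*short_beam, lp)
    s_long = length_penalized_score(*long_beam, lp)
    print(lp, "longer beam wins" if s_long > s_short else "shorter beam wins")
# 0.0 shorter beam wins
# 2.0 longer beam wins

# With n=3, after the tokens 5 7 9 5 7 the token 9 would repeat the trigram (5, 7, 9).
print(banned_next_tokens([5, 7, 9, 5, 7], n=3))  # {9}
```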

Repository Contents

Path	Description
switch-T5.ipynb	Training and evaluation notebook
switch-transformer/	Model weights (safetensors) + config
switch-transformer-tokenizer/	Tokenizer files

License

MIT
