Switch Transformers: Dialogue Summarization

A Switch Transformer (Mixture-of-Experts T5) model fine-tuned for abstractive text summarization. The model uses sparse expert routing to scale model capacity without a proportional increase in compute per token.

Model Description

Switch Transformers replace dense feed-forward sublayers of a standard T5 model with Mixture-of-Experts (MoE) layers. In each MoE layer, a learned router sends every token to exactly one of num_experts=8 expert feed-forward networks (top-1 routing). Different experts can therefore specialize in different kinds of input, while each token still passes through only one expert's weights. This repository contains a variant fine-tuned for summarization.
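To make the routing step concrete, here is a minimal plain-Python sketch of top-1 ("switch") routing for a single MoE layer. The router weights, the toy experts, and the function names are hypothetical illustrations, not this model's code. Real Switch layers also enforce expert capacity limits, add router jitter noise, and train with auxiliary load-balancing losses, none of which are shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(token: Sequence[float], router: Sequence[Sequence[float]]) -> Tuple[int, float]:
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    probs = softmax(logits)
    # Switch routing keeps only the single most probable expert (top-1).
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]


def switch_ffn(
    tokens: Sequence[Sequence[float]],
    router: Sequence[Sequence[float]],
    experts: Sequence[Expert],
) -> Tuple[List[Vector], List[int]]:
    outputs: List[Vector] = []
    chosen: List[int] = []
    for token in tokens:
        idx, gate = route_top1(token, router)
        # Only the chosen expert runs; its output is scaled by the router probability.
        outputs.append([gate * v for v in experts[idx](token)])
        chosen.append(idx)
    return outputs, chosen


# Hypothetical toy setup: 8 experts, d_model = 8, expert e multiplies its input by e + 1.
NUM_EXPERTS = 8
D_MODEL = 8
router = [[5.0 if j == e else 0.0 for j in range(D_MODEL)] for e in range(NUM_EXPERTS)]
experts = [lambda x, k=e + 1: [k * v for v in x] for e in range(NUM_EXPERTS)]
tokens = [[1.0 if j == i else 0.0 for j in range(D_MODEL)] for i in range(NUM_EXPERTS)]

outputs, chosen = switch_ffn(tokens, router, experts)
print(chosen)  # prints [0, 1, 2, 3, 4, 5, 6, 7]
```

Because each token activates one expert, compute per token stays roughly that of a single feed-forward network even though the layer holds eight of them.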

Parameter            Value
Architecture         SwitchTransformersForConditionalGeneration
Number of experts    8
Task                 Conditional text generation / summarization
Format               Safetensors

How to Use

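from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "YashNagraj75/SwitchTransformers-Summarization"

# Tokenizer and weights live in separate subfolders of the same Hub repo,
# so they are loaded with the `subfolder` argument (a three-part path is not a valid repo id).
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="switch-transformer-tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, subfolder="switch-transformer")

text = "Your long document or conversation here..."
# Prepend the task prefix; inputs longer than the tokenizer's max length are truncated.
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)
# Beam search using the settings listed under Generation Config.
outputs = model.generate(
    **inputs,
    max_length=200,
    min_length=30,
    num_beams=4,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))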
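The no_repeat_ngram_size=3 argument in the call above stops the decoder from producing any 3-gram twice. The plain-Python sketch below shows the idea: given the token ids generated so far, it returns the ids that would complete an already-seen n-gram, which are then masked to -inf before the next token is chosen. It mirrors the behavior of the transformers NoRepeatNGramLogitsProcessor but is not its actual code; the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def banned_next_tokens(generated: Sequence[int], n: int) -> Set[int]:
    """Token ids that would repeat an n-gram already present in `generated`."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if len(generated) + 1 < n:
        return set()  # not enough tokens yet to complete any n-gram
    # Map each (n-1)-token prefix to the set of tokens that followed it.
    followers: Dict[Tuple[int, ...], Set[int]] = defaultdict(set)
    for i in range(len(generated) - n + 1):
        prefix = tuple(generated[i : i + n - 1])
        followers[prefix].add(generated[i + n - 1])
    # The last n-1 tokens form the prefix of the n-gram the next token would complete.
    current_prefix = tuple(generated[len(generated) - n + 1 :])
    return set(followers.get(current_prefix, set()))


def mask_banned(logits: List[float], banned: Set[int]) -> List[float]:
    # A logit of -inf gives that token zero probability after softmax.
    return [-math.inf if i in banned else z for i, z in enumerate(logits)]


generated = [5, 7, 9, 5, 7]
print(banned_next_tokens(generated, 3))  # prints {9}: "5 7 9" already occurred
```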

Generation Config

Parameter               Value
Input prefix            "summarize: "
max_length              200
min_length              30
num_beams               4
length_penalty          2.0
no_repeat_ngram_size    3
early_stopping          True
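The length_penalty setting matters for beam search: in transformers, a finished hypothesis is scored roughly as its summed token log-probabilities divided by length ** length_penalty (the exact length counted has varied between library versions). Since log-probabilities are negative, a value above 1.0 favors longer outputs. The sketch below uses two hypothetical candidate summaries to show the ranking flip between 1.0 and 2.0; the numbers are made up for illustration.

```python
from typing import List, Tuple


def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    # Summed log-probabilities (negative) normalized by length ** length_penalty.
    return sum_logprobs / (length ** length_penalty)


def best_hypothesis(hyps: List[Tuple[str, float, int]], length_penalty: float) -> str:
    # hyps holds (label, sum_logprobs, length); the highest (least negative) score wins.
    return max(hyps, key=lambda h: beam_score(h[1], h[2], length_penalty))[0]


# Hypothetical finished beams: a 10-token and a 20-token candidate summary.
hyps = [("short", -4.0, 10), ("long", -9.0, 20)]
for lp in (1.0, 2.0):
    print(lp, best_hypothesis(hyps, lp))
# prints:
# 1.0 short   (-0.4 beats -0.45)
# 2.0 long    (-0.0225 beats -0.04)
```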

Repository Contents

Path                             Description
switch-T5.ipynb                  Training and evaluation notebook
switch-transformer/              Model weights (safetensors) + config
switch-transformer-tokenizer/    Tokenizer files
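If the repository has been downloaded locally (for example with huggingface_hub.snapshot_download or git clone), the layout above can be sanity-checked with standard-library Python. This sketch assumes the model folder holds a Hugging Face-style config.json with architectures and num_experts keys, as save_pretrained normally writes; the function name describe_checkout is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Subfolders listed in the Repository Contents table.
EXPECTED_DIRS = ("switch-transformer", "switch-transformer-tokenizer")


def describe_checkout(repo_dir: str) -> Dict[str, Any]:
    root = Path(repo_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing subfolders: {missing}")
    model_dir = root / "switch-transformer"
    # Assumed filename: config.json, the default written by save_pretrained.
    config = json.loads((model_dir / "config.json").read_text())
    return {
        "architectures": config.get("architectures", []),
        "num_experts": config.get("num_experts"),
        "has_safetensors": any(model_dir.glob("*.safetensors")),
    }


# Usage (path is a placeholder for your local checkout):
# print(describe_checkout("/path/to/SwitchTransformers-Summarization"))
```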

License

MIT
