Switch Transformers: Dialogue Summarization
A Switch Transformer (Mixture-of-Experts T5) model fine-tuned for abstractive text summarization. The model uses sparse expert routing to scale model capacity without a proportional increase in compute per token.
Model Description
Switch Transformers replace dense feed-forward sublayers in standard T5 with sparse Mixture-of-Experts (MoE) layers. In each MoE layer, a learned router sends every token to exactly one of the num_experts=8 expert feed-forward networks (top-1 routing), so only a fraction of the parameters is active per token and different experts can specialize in different types of input. This repository contains a fine-tuned variant configured for summarization.
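
The top-1 routing described above can be illustrated with a minimal, framework-free sketch. It is not the code used by this model or by the `transformers` library: the toy experts, the random router weights, and the names `switch_route` and `switch_layer` are assumptions made for illustration, and it omits details such as expert capacity limits and the auxiliary load-balancing loss.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(router_logits):
    """Top-1 routing: return (expert_index, gate_probability) for one token."""
    probs = softmax(router_logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]


def switch_layer(token, router_weights, experts):
    """Apply a toy MoE layer to a single token vector.

    router_weights has one row per expert; experts is a list of callables
    standing in for the expert feed-forward networks (hypothetical).
    """
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    expert, gate = switch_route(logits)
    # Only the selected expert is evaluated; its output is scaled by the gate.
    expert_out = experts[expert](token)
    return [gate * y for y in expert_out], expert


NUM_EXPERTS = 8  # matches num_experts=8 in this model card
D_MODEL = 4      # toy hidden size, chosen only for the example

rng = random.Random(0)
router_weights = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]
# Toy experts: expert k multiplies its input by (k + 1).
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = switch_layer(token, router_weights, experts)
print(f"token routed to expert {chosen}; output = {output}")
```

Because only one expert runs per token, adding experts increases the parameter count while the per-token compute stays roughly that of a single feed-forward network plus the router.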
| Parameter | Value |
|---|---|
| Architecture | SwitchTransformersForConditionalGeneration |
| Number of experts | 8 |
| Task | Conditional text generation / summarization |
| Format | Safetensors |
How to Use
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
repo_id = "YashNagraj75/SwitchTransformers-Summarization"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="switch-transformer-tokenizer")  # tokenizer files live in a subfolder
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, subfolder="switch-transformer")  # weights and config live in a subfolder
text = "Your long document or conversation here..."
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)  # the task prefix is required
outputs = model.generate(
    **inputs,
    max_length=200,
    min_length=30,
    num_beams=4,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Generation Config
| Parameter | Value |
|---|---|
| Prefix | summarize: |
| Max length | 200 |
| Min length | 30 |
| Num beams | 4 |
| Length penalty | 2.0 |
| No repeat ngram size | 3 |
| Early stopping | True |
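
To avoid retyping these settings, the table can be collected into one reusable mapping. The sketch below is only a convenience pattern, not part of this repository: `PREFIX`, `GENERATION_KWARGS`, `build_prompt`, and `generation_kwargs` are hypothetical names introduced for illustration, and the values are copied from the table above.

```python
PREFIX = "summarize: "  # task prefix from the table above

# Default generation settings, copied from the table above.
GENERATION_KWARGS = {
    "max_length": 200,
    "min_length": 30,
    "num_beams": 4,
    "length_penalty": 2.0,
    "no_repeat_ngram_size": 3,
    "early_stopping": True,
}


def build_prompt(text: str) -> str:
    """Prepend the task prefix to the input after trimming outer whitespace."""
    return PREFIX + text.strip()


def generation_kwargs(**overrides):
    """Return a copy of the defaults with any per-call overrides applied."""
    return {**GENERATION_KWARGS, **overrides}


# Hypothetical usage with the tokenizer and model loaded as in "How to Use":
#   inputs = tokenizer(build_prompt(text), return_tensors="pt", truncation=True)
#   outputs = model.generate(**inputs, **generation_kwargs(num_beams=2))
print(build_prompt("  Alice: Are we still on for lunch?  "))
print(generation_kwargs(num_beams=2))
```

Returning a fresh dictionary from `generation_kwargs` keeps the defaults unchanged when a single call needs different settings, such as fewer beams for faster decoding.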
Repository Contents
| Path | Description |
|---|---|
| switch-T5.ipynb | Training and evaluation notebook |
| switch-transformer/ | Model weights (safetensors) + config |
| switch-transformer-tokenizer/ | Tokenizer files |
License
MIT