How to use google/flan-t5-base with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
```
New architecture: TemporalMesh Transformer — dynamic kNN graph attention + per-token exit routing, 29.4 PPL at 48% compute
Discussion #42, opened by vigneshwar234
TemporalMesh Transformer (TMT): open-source, 120M parameters, state-of-the-art efficiency.
TMT achieves 29.4 PPL on WikiText-2 (−30.2% vs. vanilla) at 48% relative compute, outperforming Mamba (31.8), RWKV (33.1), and a vanilla transformer (42.1) at ~120M parameters.
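As a quick check of the headline numbers, this short sketch recomputes the relative WikiText-2 improvements from the perplexities quoted in the post. Perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The helper name `relative_reduction` is illustrative and does not come from the TMT repository.

```python
def relative_reduction(baseline_ppl: float, model_ppl: float) -> float:
    """Percent by which model_ppl is lower than baseline_ppl (lower PPL is better)."""
    return 100.0 * (baseline_ppl - model_ppl) / baseline_ppl

# WikiText-2 perplexities as quoted in the post (~120M-parameter models).
baselines = {"Vanilla": 42.1, "RWKV": 33.1, "Mamba": 31.8}
tmt_ppl = 29.4

for name, ppl in baselines.items():
    print(f"TMT vs {name}: {relative_reduction(ppl, tmt_ppl):.1f}% lower PPL")
# TMT vs Vanilla: 30.2% lower PPL
# TMT vs RWKV: 11.2% lower PPL
# TMT vs Mamba: 7.5% lower PPL
```

The 30.2% figure is relative to the vanilla baseline; against the strongest quoted baseline (Mamba) the gap is 7.5%.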
Five innovations in one forward pass:
- Mesh Attention: dynamic kNN graph attention, O(S·k) for sequence length S and k neighbors per token
- Temporal Decay Encoding: learned multiplicative decay applied after the softmax
- Adaptive Depth Routing: per-token exit gate, saving 52% of compute
- Dual-Stream FFN
- EMA Memory Anchors
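The post does not describe how Mesh Attention builds its graph, so the following is only a NumPy sketch of the general idea under stated assumptions. Each query attends to the k keys with the highest scaled dot-product score (a kNN graph recomputed for every input), with a causal mask. An optional fixed `decay_rate` stands in for the learned Temporal Decay Encoding by multiplying the post-softmax weights by exp(−rate·distance). The function name `knn_mesh_attention`, the dot-product neighbor criterion, and the choice not to renormalize after the decay are all assumptions. For clarity, neighbors are found by brute force, which costs O(S²·d); only the aggregation step is O(S·k·d).

```python
import numpy as np

def knn_mesh_attention(queries, keys, values, num_neighbors, causal=True, decay_rate=0.0):
    """Hypothetical kNN graph attention: each query attends to its top-k keys only.

    queries, keys, values: arrays of shape (S, d).
    Returns (output of shape (S, d), neighbor indices of shape (S, k)).
    """
    S, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)              # (S, S) scaled dot products
    if causal:
        # Token i may only attend to positions j <= i.
        future = np.triu(np.ones((S, S), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)

    kk = min(num_neighbors, S)
    # Dynamic kNN graph: indices of the kk highest scores in each row (unordered).
    nbr = np.argpartition(-scores, kk - 1, axis=1)[:, :kk]
    nbr_scores = np.take_along_axis(scores, nbr, axis=1)  # (S, kk)

    # Softmax over the selected neighbors only; masked (-inf) entries get weight 0.
    w = np.exp(nbr_scores - nbr_scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    if decay_rate > 0.0:
        # Assumed stand-in for Temporal Decay Encoding: scale post-softmax weights
        # by exp(-decay_rate * |i - j|) without renormalizing.
        dist = np.abs(np.arange(S)[:, None] - nbr)
        w = w * np.exp(-decay_rate * dist)

    # Aggregate the kk selected value vectors for each query: O(S * kk * d).
    out = np.einsum("sk,skd->sd", w, values[nbr])
    return out, nbr

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
out, nbr = knn_mesh_attention(x, x, x, num_neighbors=3)
print(out.shape, nbr.shape)  # (8, 16) (8, 3)
```

With `num_neighbors` equal to the sequence length and no causal mask, this reduces to ordinary dense softmax attention. Smaller k trades exactness for a sparser aggregation step.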
| Model | WikiText-2 PPL ↓ | LongBench ↑ | C4 PPL ↓ | Relative compute |
|---|---|---|---|---|
| Vanilla | 42.1 | 41.2 | 38.4 | 100% |
| Mamba | 31.8 | 51.3 | 30.1 | 55% |
| TMT | 29.4 | 53.4 | 27.4 | 48% |
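To make the compute column concrete for a per-token exit scheme, here is a minimal NumPy sketch of adaptive depth routing as it can be read from the post. After every layer, a gate scores each still-active token, and tokens whose score exceeds `threshold` stop being updated. Relative compute is counted as the number of token-layer evaluations performed divided by tokens × layers. The names `route_tokens` and `toy_gate`, the fixed threshold, and this compute accounting are assumptions, not taken from the TMT code. The sketch also treats each layer as token-wise, whereas a real attention layer would still need keys and values from tokens that have exited. Under this accounting, and assuming every layer costs the same, 48% relative compute would mean tokens pass through about 48% of the layers on average.

```python
import numpy as np

def route_tokens(h, layers, exit_gate, threshold=0.5):
    """Hypothetical per-token early exit.

    h: (S, d) hidden states. layers: list of callables mapping (n, d) -> (n, d).
    exit_gate: callable mapping (n, d) -> (n,) exit probabilities.
    Returns (final hidden states, relative compute in [0, 1]).
    """
    h = h.copy()
    active = np.ones(h.shape[0], dtype=bool)   # tokens that have not exited yet
    evaluations = 0
    for layer in layers:
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break                                # every token has exited
        h[idx] = layer(h[idx])                   # only active tokens pay for this layer
        evaluations += idx.size
        exits = exit_gate(h[idx]) > threshold
        active[idx[exits]] = False               # exited tokens keep their current state
    return h, evaluations / (h.shape[0] * len(layers))

def toy_gate(x):
    # Exit probability 1.0 once a token's value is at least 3, else 0.0.
    return (x[:, 0] >= 3.0).astype(float)

# Toy example: 4 tokens, 4 layers that each add 1.
h0 = np.array([[0.0], [1.0], [2.0], [3.0]])
layers = [lambda x: x + 1.0] * 4
h_out, rel = route_tokens(h0, layers, toy_gate)
print(h_out[:, 0], rel)  # [3. 3. 3. 4.] 0.4375
```

Here the four tokens use 3, 2, 1, and 1 layers respectively, so 7 of 16 possible token-layer evaluations run, giving a relative compute of 7/16.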
📄 https://zenodo.org/records/20287390 · 💻 https://github.com/vignesh2027/TemporalMesh-Transformer · 🎮 https://huggingface.co/spaces/vigneshwar234/TemporalMesh-Transformer-Demo