# Model Card for T5-base + LoRA (PEFT) for Seq2Seq

## Model Summary
- **Backbone:** `t5-base` (encoder–decoder Transformer)
- **Parameter-efficient fine-tuning:** LoRA adapters via `peft.PeftModelForSeq2SeqLM`
- **Where LoRA is applied:** query, key, and value projections in:
  - encoder self-attention (all 12 blocks)
  - decoder self-attention (all 12 blocks)
  - decoder cross-attention (all 12 blocks)
- **LoRA config:** rank = 16, alpha = 32, dropout = 0.1
- **Tokenizer vocab size:** 32128
- **Hidden size:** 768
- **FFN size:** 3072 (`DenseReluDense`)
- **Relative position buckets:** 32
- **Attention heads:** 12
- **Norm:** `T5LayerNorm`
- **Activation:** ReLU in the FFN
- **LM head:** tied to the input embeddings, `out_features = 32128`
This model keeps the full T5-base backbone frozen and trains only small low-rank adapter weights, which typically approaches full fine-tuning quality at a fraction of the compute and VRAM.
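The sketch below shows how a configuration matching the table above could be built with `peft`. It is illustrative, not the exact training script: the hyperparameters are taken from this card, while the variable names and the choice of `get_peft_model` as the entry point are assumptions.

```python
# Minimal sketch: wrap t5-base with LoRA adapters on the q/k/v attention projections,
# using the rank/alpha/dropout values listed in the summary above.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("t5-base")
base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,   # yields a PeftModelForSeq2SeqLM wrapper
    r=16,                              # LoRA rank
    lora_alpha=32,                     # scaling factor
    lora_dropout=0.1,
    target_modules=["q", "k", "v"],    # T5 attention projections (self- and cross-attention)
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()     # prints trainable vs. total parameter counts
```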
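For inference, the frozen backbone is loaded as usual and the adapter weights are attached on top. The adapter repo id below is a placeholder, not the actual hub location of this model.

```python
# Hedged usage sketch: load t5-base, attach the LoRA adapter, and generate.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
model = PeftModel.from_pretrained(base_model, "your-username/t5-base-lora-adapter")  # placeholder id

inputs = tokenizer(
    "summarize: PEFT keeps the base model frozen and trains small adapter weights.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```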