Model Card for T5-base + LoRA (PEFT) for Seq2Seq

Model Summary

Backbone: t5-base (encoder–decoder Transformer)

Parameter-efficient fine-tuning: LoRA adapters via peft.PeftModelForSeq2SeqLM

Where LoRA is applied: Query, Key, Value projections in:

Encoder self-attention (all 12 blocks)

Decoder self-attention (all 12 blocks)

Decoder cross-attention (all 12 blocks)

LoRA config: rank = 16, alpha = 32, dropout = 0.1

Tokenizer vocab size: 32128

Hidden size: 768

FFN size: 3072 (DenseReluDense)

Relative position buckets: 32

Attention heads: 12

Norm: T5LayerNorm

Activation: ReLU in FFN

LM head: tied to embeddings, out_features = 32128

This model keeps the full T5-base backbone frozen and learns only small low-rank adapter weights, giving quality close to that of full fine-tuning at a fraction of the compute and VRAM.
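
A minimal sketch of the adapter setup this card describes, using the peft API. The LoRA hyperparameters (r=16, alpha=32, dropout=0.1) and the target modules (T5's q/k/v projections) come from the summary above; everything else (variable names, and whatever training loop you add afterwards) is illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Frozen t5-base backbone
base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

# LoRA config matching this card: rank 16, alpha 32, dropout 0.1,
# applied to the query/key/value projections ("q", "k", "v" in T5)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "k", "v"],
)

# Wraps the backbone as a PeftModelForSeq2SeqLM; only the adapter weights train
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

Because target_modules matches the q/k/v projections everywhere they occur, the low-rank A/B matrices are injected into encoder self-attention, decoder self-attention, and decoder cross-attention in all 12 blocks, which is the placement listed above.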

Model size: ~0.2B parameters (F32, safetensors)
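
For inference with trained adapters, a typical peft pattern is sketched below. The adapter repo id and the example prompt are placeholders, since this card does not name the published adapter or the downstream task:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

# "your-username/t5-base-lora-seq2seq" is a hypothetical adapter repo id
model = PeftModel.from_pretrained(base_model, "your-username/t5-base-lora-seq2seq")
model.eval()

# Illustrative prompt; replace with input for the task the adapter was trained on
inputs = tokenizer("summarize: PEFT methods adapt large models cheaply.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```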