Paper: LSG Attention: Extrapolation of pretrained Transformers to long sequences (arXiv:2210.15497)
How to use dlicari/lsg16k-Italian-Legal-BERT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="dlicari/lsg16k-Italian-Legal-BERT", trust_remote_code=True)
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("dlicari/lsg16k-Italian-Legal-BERT", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("dlicari/lsg16k-Italian-Legal-BERT", trust_remote_code=True)
This is a Local-Sparse-Global (LSG) version of ITALIAN-LEGAL-BERT, obtained by replacing the full attention in the encoder with LSG attention using the LSG converter script (https://github.com/ccdv-ai/convert_checkpoint_to_lsg). The model uses a maximum sequence length of 16,384, 7 global tokens, a local block size of 128, a sparse block size of 128, a sparsity factor of 2, and the 'norm' sparse selection pattern (which selects the tokens with the highest norm).
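To make the numbers above concrete, here is a minimal sketch that records the stated settings in a plain dictionary and derives two quantities from them: how many local blocks a maximum-length input is split into, and how many pairwise scores full attention would need at that length. The dictionary keys and the helper names (`LSG_SETTINGS`, `num_local_blocks`, `full_attention_scores`) are illustrative assumptions and are not read from the model's actual configuration.

```python
# Settings as stated in the description above.
# Key names are illustrative, not taken from the model's config file.
LSG_SETTINGS = {
    "max_sequence_length": 16_384,
    "num_global_tokens": 7,
    "block_size": 128,          # local block size
    "sparse_block_size": 128,
    "sparsity_factor": 2,
    "sparsity_type": "norm",    # select the highest-norm tokens
}


def num_local_blocks(settings: dict) -> int:
    """Number of local blocks a maximum-length input is split into."""
    return settings["max_sequence_length"] // settings["block_size"]


def full_attention_scores(seq_len: int) -> int:
    """Pairwise scores per head that full (quadratic) attention computes."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(num_local_blocks(LSG_SETTINGS))                               # 128
    print(full_attention_scores(LSG_SETTINGS["max_sequence_length"]))   # 268435456
```

The second figure shows why full attention is impractical at 16,384 tokens and why a block-based scheme is used instead.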