IlyaGusev/gazeta
How to use d0rj/ru-mbart-large-summ with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline
pipe = pipeline("summarization", model="d0rj/ru-mbart-large-summ")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("d0rj/ru-mbart-large-summ")
model = AutoModelForSeq2SeqLM.from_pretrained("d0rj/ru-mbart-large-summ")

Model forked from ru-bart-large, which is a smaller version of facebook/mbart-large-50 with only Russian and English embeddings.
All 'train' subsets were concatenated and shuffled with seed 1000 - 7.
Train subset = 155678 rows.
Evaluation set: 10% of the concatenated 'validation' subsets = 1458 rows.
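The split procedure described above can be sketched roughly as follows. This is only an illustration on toy data using the standard library: plain Python lists stand in for the dataset subsets, `build_splits` is a hypothetical helper name, and drawing the 10% evaluation slice as a seeded random sample is an assumption, since the card does not say how that slice was chosen.

```python
import random

# Seed written the way the card states it; the expression evaluates to 993.
SEED = 1000 - 7


def build_splits(train_subsets, validation_subsets, eval_fraction=0.1, seed=SEED):
    """Concatenate and shuffle the train subsets; sample a fraction of validation.

    Hypothetical helper: lists of rows stand in for real dataset subsets.
    """
    rng = random.Random(seed)

    # Concatenate every 'train' subset, then shuffle with the fixed seed.
    train = [row for subset in train_subsets for row in subset]
    rng.shuffle(train)

    # Concatenate every 'validation' subset and keep eval_fraction of it.
    # Assumption: the kept rows are a random sample.
    validation = [row for subset in validation_subsets for row in subset]
    n_eval = int(len(validation) * eval_fraction)
    eval_rows = rng.sample(validation, n_eval)

    return train, eval_rows


# Toy data: two 'train' subsets and one 'validation' subset of 20 rows.
train, eval_rows = build_splits(
    [["a1", "a2"], ["b1", "b2", "b3"]],
    [[f"v{i}" for i in range(20)]],
)
print(len(train), len(eval_rows))
```

Fixing the seed makes the shuffle and the evaluation sample reproducible across runs, which is presumably why the card reports it.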
See WandB logs.
See report at REPORT WIP.
from transformers import pipeline
pipe = pipeline('summarization', model='d0rj/ru-mbart-large-summ')
pipe(text)  # text: the document to summarize (str)
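If the "summarization" pipeline is not available in your installed version, a summary can also be produced by calling `generate` on the seq2seq model directly. The following is a minimal sketch, not the author's reference code: `load_summarizer` and `summarize` are hypothetical helper names, and the values for `max_input_tokens`, `max_new_tokens` and `num_beams` are illustrative assumptions rather than settings from this card.

```python
def load_summarizer(model_id="d0rj/ru-mbart-large-summ"):
    """Load the tokenizer and seq2seq model (downloads weights on first use)."""
    # Imported lazily so that defining these helpers does not need transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def summarize(text, tokenizer, model,
              max_input_tokens=1024, max_new_tokens=128, num_beams=4):
    """Return a summary string for `text`.

    The numeric defaults are assumptions chosen for illustration.
    """
    # Tokenize and truncate inputs that exceed the assumed length limit.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam-search generation; other settings come from the model's own config.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    # Decode the first (and only) generated sequence back to text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Usage (requires transformers and torch, and downloads the model):
# tokenizer, model = load_summarizer()
# print(summarize(text, tokenizer, model))
```

Keeping loading separate from `summarize` lets the model be loaded once and reused across many inputs.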
import torch
from transformers import AutoTokenizer, MBartModel
tokenizer = AutoTokenizer.from_pretrained('d0rj/ru-mbart-large-summ')
model = MBartModel.from_pretrained('d0rj/ru-mbart-large-summ')
inputs = tokenizer('Всё в порядке, мимо двигал Утром прозвенел будильник', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state