How to use karimraouf/mbart-arabic-summarizer with Transformers:

```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("summarization", model="karimraouf/mbart-arabic-summarizer")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("karimraouf/mbart-arabic-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("karimraouf/mbart-arabic-summarizer")
```

This model is a fine-tuned version of facebook/mbart-large-50 on the asas-ai/Arabic-article-summarization dataset.
Base model: facebook/mbart-large-50
Precision: fp16
The model was fine-tuned for only 3 epochs, and the training data is not of very good quality. Depending on your use case, it is better to take this checkpoint and fine-tune it further on higher-quality data.
Evaluation metric: ROUGE (1/2/L/Lsum). The model was trained and evaluated on the asas-ai/Arabic-article-summarization dataset, which I split into training and testing splits.
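To make the metric concrete, here is a minimal pure-Python sketch of ROUGE-1 and ROUGE-L F1 scores on whitespace tokens. It is an illustration only, not the evaluation code used for this model. The function names `rouge_1_f1` and `rouge_l_f1` are hypothetical, and library implementations may tokenize differently, so their numbers can differ from this sketch.

```python
from collections import Counter


def _f1(overlap: int, pred_len: int, ref_len: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 (ROUGE-1) on whitespace tokens."""
    pred, ref = prediction.split(), reference.split()
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Longest-common-subsequence F1 (ROUGE-L) on whitespace tokens."""
    pred, ref = prediction.split(), reference.split()
    # Classic dynamic-programming table for the LCS length.
    table = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred, start=1):
        for j, r in enumerate(ref, start=1):
            if p == r:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return _f1(table[len(pred)][len(ref)], len(pred), len(ref))
```

Both functions take a generated summary and a reference summary as plain strings and return a score between 0 and 1.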
The following hyperparameters were used during training:
After 3 Epochs of Training:
First Approach
```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Load model and tokenizer from Hugging Face Hub
model_name = "karimraouf/mbart-arabic-summarizer"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Input text (replace with your own)
input_text = "أي نص تريده"  # Arabic for "any text you want"

# Tokenize the input
inputs = tokenizer(
    input_text,
    return_tensors="pt",
    max_length=1024,
    truncation=True,
    padding=True,
)

# Generate the summary
summary_ids = model.generate(
    **inputs,
    max_length=200,
    num_beams=3,
    early_stopping=True,
)

# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
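Because `truncation=True` with `max_length=1024` discards everything past the first 1024 tokens, long articles lose their tail. One possible workaround, sketched here under assumptions, is to split the text into word-based chunks and summarize each chunk separately. The helper `chunk_words` and its default of 400 words are hypothetical and are not part of this model or of Transformers. Word counts only approximate token counts, so the limit should be chosen conservatively.

```python
from typing import List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk can then go through the same tokenize, generate, and decode steps, with the partial summaries joined at the end.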
Second Approach
```python
from transformers import pipeline

# Load the summarization pipeline with the pre-trained Arabic MBart model
summarizer = pipeline(
    "summarization",
    model="karimraouf/mbart-arabic-summarizer",
    tokenizer="karimraouf/mbart-arabic-summarizer",
)

# Replace this with your own Arabic text
input_text = "أي نص تريده"

# Generate the summary
summary = summarizer(
    input_text,
    max_length=200,
    min_length=30,
    do_sample=False,
)

# Print the result
print(summary[0]['summary_text'])
```
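Since the `"summarization"` pipeline type is reported as unsupported in transformers v5, it can help to check the installed major version before choosing between the two approaches. The sketch below is only an illustration: `supports_summarization_pipeline` is a hypothetical helper, and it looks only at the version string rather than inspecting the library itself.

```python
def supports_summarization_pipeline(version: str) -> bool:
    """Return True when the version string has a major version below 5."""
    major = int(version.split(".")[0])
    return major < 5


if __name__ == "__main__":
    try:
        import transformers
    except ImportError:
        print("transformers is not installed")
    else:
        if supports_summarization_pipeline(transformers.__version__):
            print("pipeline('summarization', ...) should be available")
        else:
            print("load the model directly with MBartForConditionalGeneration")
```

If the check fails, the First Approach (loading the model and tokenizer directly) is the one to use.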