Aggregate performance data based on the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scoring system.
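ROUGE-1: Measures the overlap of unigrams (single words) between the generated summary and the reference text. High scores indicate good content coverage.
ROUGE-2: Measures the overlap of bigrams (pairs of consecutive words) between the generated summary and the reference text. Because matching a word pair requires matching local word order, it is a useful proxy for fluency and phrasing quality.
ROUGE-L: Based on the Longest Common Subsequence (LCS) of the generated summary and the reference text. Because the LCS rewards words that appear in the same order without requiring them to be adjacent, it captures sentence structure and sequential flow that fixed-length n-gram overlap misses.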
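To make the three definitions concrete, here is a minimal sketch of how unigram overlap, bigram overlap, and LCS-based scores could be computed. It is an illustration under simplifying assumptions, not the scoring code behind the figures reported here: the function names (ngrams, prf, rouge_n, rouge_l, lcs_length) are hypothetical, tokenization is lowercase whitespace splitting, and there is no stemming or multi-sentence handling, so the numbers will generally differ from those of standard ROUGE toolkits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, candidate_total, reference_total):
    """Turn an overlap count into (precision, recall, F1)."""
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rouge_n(candidate, reference, n):
    """N-gram overlap score; n=1 for unigrams, n=2 for bigrams."""
    # Assumption: naive lowercase whitespace tokenization, no stemming.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """LCS-based score: in-order matches count even when not adjacent."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    for name, score in [
        ("ROUGE-1", rouge_n(generated, reference, 1)),
        ("ROUGE-2", rouge_n(generated, reference, 2)),
        ("ROUGE-L", rouge_l(generated, reference)),
    ]:
        p, r, f = score
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy pair a single substituted word leaves five of six unigrams matching but only three of five bigrams, which illustrates why the bigram score is the more sensitive of the two to phrasing differences.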
"BART and PEGASUS typically outperform TextRank in ROUGE-2 and ROUGE-L as they generate fluent, abstractive prose rather than just extracting source fragments."