mokamoto committed
Commit 8a8f34e · 1 Parent(s): 66f69ed

dataset descriptions

causal_analysis_table.html CHANGED
@@ -15,8 +15,8 @@
  <thead>
  <tr>
  <th rowspan="2">Model</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="For text sections identified as causal, this task extracts the Cause and Effect spans, handling both unicausal and multicausal cases in financial texts.">Causal Detection (CD)</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="Causal Classification" data-tooltip="Determines if a given financial text section contains a causal relation, labeled as 1 if causal and 0 otherwise.">Causal Classification (CC)</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">FinCausal (CD)</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">FinCausal (CC)</th>
  </tr>
  <tr>
  <th class="has-text-centered">Accuracy</th>
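The `tooltip-trigger` class with `data-title`/`data-tooltip` attributes edited above is a common pattern for attribute-driven CSS tooltips. The page's stylesheet is not part of this commit, so the following is only a minimal sketch of how such markup is typically consumed — every CSS rule here is an assumption, not the repo's actual styling:

```html
<style>
  /* Assumed styling — the real stylesheet is not shown in this diff. */
  .tooltip-trigger { position: relative; cursor: help; }
  .tooltip-trigger:hover::after {
    content: attr(data-tooltip);  /* tooltip body pulled from the attribute */
    position: absolute;
    top: 100%;
    left: 0;
    width: 20rem;
    padding: 0.5rem;
    background: #363636;
    color: #fff;
    z-index: 10;
  }
  /* tooltip-right presumably anchors the popup to the right edge so it
     stays inside the viewport for last-column headers. */
  .tooltip-trigger.tooltip-right:hover::after { left: auto; right: 0; }
</style>
<th class="tooltip-trigger" data-title="FinCausal (CD)"
    data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news.">
  FinCausal (CD)
</th>
```

Note that `data-title` is not rendered by this sketch; a script or additional rule elsewhere on the page may surface it as the tooltip heading.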
information_retrieval_table.html CHANGED
@@ -15,11 +15,11 @@
  <thead>
  <tr>
  <th rowspan="2">Model</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A manually annotated dataset of 47,851 financial news articles with named entity annotations for person (PER), location (LOC), and organization (ORG) entities. Used for benchmarking financial named entity recognition performance.">FiNER</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A dataset for financial relation extraction focusing on relations between companies and financial metrics. Contains entity-relationship annotations from financial news, earnings reports, and regulatory filings.">FinRed</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="ReFiND" data-tooltip="A dataset for information retrieval in the financial domain with queries and relevant document passages. Contains 6,500 queries and 280,000 financial document passages annotated for relevance.">ReFiND</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset of financial news articles labeled for entity extraction and document classification. Contains 1,500 articles with entity annotations and multi-label categorization for financial topics.">FNXL</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset for financial entity recognition with 6,200 documents containing annotations for company names, financial metrics, dates, and numerical values from earnings reports and financial news.">FinEntity</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER-ORD</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FinRED</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">REFinD</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-title="FNXL" data-tooltip="FNXL contains 79,088 sentences with 142,922 annotated numerals extracted from SEC 10-K reports and categorized under 2,794 distinct numerical labels. The information extraction task requires identifying, categorizing and understanding the financial significance of numerical entities in regulatory filings.">FNXL</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FinEntity</th>
  </tr>
  <tr>
  <th class="has-text-centered">Precision</th>
qa_table.html CHANGED
@@ -18,9 +18,9 @@
  <th colspan="3" class="has-text-centered">Datasets (Accuracy)</th>
  </tr>
  <tr>
- <th class="has-text-centered tooltip-trigger" data-tooltip="Large-scale dataset for numerical reasoning over financial data, consisting of 8,281 question-answer pairs from financial reports. Focuses on questions requiring interpretation of financial data and multi-step reasoning. Licensed under CC BY-NC 4.0.">FinQA</th>
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Multi-turn question answering dataset with 3,892 conversations and 14,115 questions exploring chains of numerical reasoning in financial dialogues. Released under MIT License.">ConvFinQA</th>
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Large-scale QA dataset for hybrid data sources (tables and text) from financial reports, emphasizing numerical reasoning operations. Licensed under CC BY 4.0.">TATQA</th>
+ <th class="has-text-centered tooltip-trigger" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA</th>
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">ConvFinQA</th>
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TATQA</th>
  </tr>
  </thead>
  <tbody>
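A recurring change across these diffs is adding a `data-title` attribute next to each `data-tooltip`. A stdlib-only sketch of how one might audit the edited tables for `tooltip-trigger` headers that still lack one of the two attributes (the class and attribute names come from the diffs above; the sample snippet and the `TooltipAudit` helper are illustrative, not taken from the repo):

```python
from html.parser import HTMLParser

class TooltipAudit(HTMLParser):
    """Collect tooltip-trigger elements missing data-title or data-tooltip."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (tag, reason) pairs for incomplete triggers

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "tooltip-trigger" in a.get("class", "").split():
            if "data-tooltip" not in a:
                self.missing.append((tag, "no data-tooltip"))
            if "data-title" not in a:
                self.missing.append((tag, "no data-title"))

# Illustrative input: one complete header, one still missing data-title.
snippet = (
    '<th class="has-text-centered tooltip-trigger" '
    'data-title="FinQA" data-tooltip="FinQA contains 8,281 pairs.">FinQA</th>'
    '<th class="tooltip-trigger" data-tooltip="No title here.">X</th>'
)
audit = TooltipAudit()
audit.feed(snippet)
print(audit.missing)  # the second <th> lacks data-title
```

Running it over each changed file before committing would catch headers where only one of the paired attributes was updated.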
results.html CHANGED
@@ -97,26 +97,26 @@
97
  </tr>
98
  <tr>
99
  <th>Dataset</th>
100
- <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiNER-Open Research Dataset" data-tooltip="FiNER-Open Research Dataset: A manually annotated dataset for financial named entity recognition, containing 47,851 financial news articles with annotations for person, location, and organization entities.">FiNER<span class="metric-label">F1</span></th>
101
- <th class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED: A specialized relation extraction dataset for the financial domain, created from financial news and earnings call transcripts, with financial relations mapped using distance supervision based on Wikidata triplets.">FR<span class="metric-label">F1</span></th>
102
- <th class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity: Dataset for financial entity recognition and entity linking from diverse financial documents. Features 5,554 entity annotations with company, person, and product types linked to a knowledge base.">FE<span class="metric-label">Acc</span></th>
103
- <th class="has-text-centered tooltip-trigger" data-title="ReFINED" data-tooltip="ReFINED: Information retrieval and extraction benchmark spanning financial domains, derived from datasets including FiQA, CoFiF, ConvFinQA, and SemEval tasks with 10,700+ pairs of queries and relevant documents.">RF<span class="metric-label">MAP</span></th>
104
- <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis. Given a financial text, such as microblog posts or news headlines, systems predict sentiment scores on a continuous scale from -1 (negative) to 1 (positive). Evaluation metrics include MSE, MAE, and R-squared.">FQ1<span class="metric-label">MSE</span></th>
105
- <th class="has-text-centered tooltip-trigger column-border-left" data-tooltip="Financial Phrase Bank contains 4,840 sentences from English-language financial news articles, categorized as positive, negative, or neutral. Each sentence reflects the sentiment an investor might perceive regarding its influence on stock prices. Annotated by 16 finance experts using majority voting.">FPB<span class="metric-label">F1</span></th>
106
- <th class="has-text-centered tooltip-trigger" data-tooltip="Manually-annotated dataset focusing on subjectivity in Earnings Call Transcripts QA sessions. Includes 49,446 annotations across 2,747 QA pairs labeled on six subjectivity features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant.">SQA<span class="metric-label">F1</span></th>
107
- <th class="has-text-centered tooltip-trigger" data-tooltip="Expert-annotated dataset of financial corpora with sentiment labels from ratings agencies and news publishers. Contains 10K+ positive/negative texts and 1K neutral samples.">FNXL<span class="metric-label">Acc</span></th>
108
- <th class="has-text-centered tooltip-trigger column-border-left" data-tooltip="For text sections identified as causal, this task extracts the Cause and Effect spans, handling both unicausal and multicausal cases in financial texts.">CD<span class="metric-label">F1</span></th>
109
- <th class="has-text-centered tooltip-trigger" data-tooltip="Determines if a given financial text section contains a causal relation, labeled as 1 if causal and 0 otherwise.">CC<span class="metric-label">F1</span></th>
110
- <th class="has-text-centered tooltip-trigger column-border-left" data-title="Banking77" data-tooltip="A fine-grained dataset designed for intent detection within the banking domain, comprising 13,083 customer service queries annotated with 77 unique intents.">Bank77<span class="metric-label">Acc</span></th>
111
- <th class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="A dataset designed to evaluate machine learning models using tabular data and profile text inputs for financial risk prediction, covering default, fraud, and churn with 333,000 labeled instances.">FBench<span class="metric-label">F1</span></th>
112
- <th class="has-text-centered tooltip-trigger" data-tooltip="A dataset of Federal Open Market Committee speeches, meeting minutes, and press conference transcripts (1996-2022) for hawkish-dovish classification of monetary policy stance.">FOMC<span class="metric-label">F1</span></th>
113
- <th class="has-text-centered tooltip-trigger" data-tooltip="An expert-annotated dataset for detecting fine-grained investor claims within financial narratives, focusing on numerals in analyst reports and earnings call transcripts.">NumC<span class="metric-label">Acc</span></th>
114
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset of 11,412 human-annotated financial news headlines focused on commodities (particularly gold), spanning 2000-2019, with binary indicators for price mentions and movements.">Headlines<span class="metric-label">F1</span></th>
115
- <th class="has-text-centered tooltip-trigger column-border-left" data-tooltip="Large-scale dataset for numerical reasoning over financial data, consisting of 8,281 question-answer pairs from financial reports. Focuses on questions requiring interpretation of financial data and multi-step reasoning. Licensed under CC BY-NC 4.0.">FinQA<span class="metric-label">Acc</span></th>
116
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Multi-turn question answering dataset with 3,892 conversations and 14,115 questions exploring chains of numerical reasoning in financial dialogues. Released under MIT License.">ConvFQA<span class="metric-label">EM</span></th>
117
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Large-scale QA dataset for hybrid data sources (tables and text) from financial reports, emphasizing numerical reasoning operations. Licensed under CC BY 4.0.">TaTQA<span class="metric-label">EM</span></th>
118
- <th class="has-text-centered tooltip-trigger tooltip-right column-border-left" data-title="ECTSum" data-tooltip="A dataset focused on extractive summarization of earnings call transcripts, containing 2,425 transcripts from investor calls. Each instance includes a transcript, company name, and the ground truth summary. The data spans diverse sectors from S&P 500 companies.">EarCall<span class="metric-label">ROUGE</span></th>
119
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset comprising 4,750 financial disclosures from 10-K or 10-Q filings of public companies. Each entry contains a company name, filing date, source domain, and disclosure text with its expert-crafted abstract.">FinD<span class="metric-label">ROUGE</span></th>
120
  </tr>
121
  </thead>
122
  <tbody>
@@ -347,8 +347,8 @@
347
  <thead>
348
  <tr>
349
  <th rowspan="2">Model</th>
350
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="For text sections identified as causal, this task extracts the Cause and Effect spans, handling both unicausal and multicausal cases in financial texts.">Causal Detection (CD)</th>
351
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Determines if a given financial text section contains a causal relation, labeled as 1 if causal and 0 otherwise.">Causal Classification (CC)</th>
352
  </tr>
353
  <tr>
354
  <th class="has-text-centered">Accuracy</th>
@@ -635,10 +635,10 @@
635
  <thead>
636
  <tr>
637
  <th rowspan="2">Model</th>
638
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A manually annotated dataset of 47,851 financial news articles with named entity annotations for person (PER), location (LOC), and organization (ORG) entities. Used for benchmarking financial named entity recognition performance.">FiNER</th>
639
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A dataset for financial relation extraction focusing on relations between companies and financial metrics. Contains entity-relationship annotations from financial news, earnings reports, and regulatory filings.">FinRed</th>
640
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Financial entity recognition and linking dataset with company, person, and product annotations across diverse financial documents. Features 5,554 entity annotations linking to a knowledge base.">FinEntity</th>
641
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Information retrieval benchmark for financial domains with 10,700+ query-document pairs from datasets including FiQA, CoFiF, and ConvFinQA. Evaluates retrieval model effectiveness on financial text.">ReFINED</th>
642
  </tr>
643
  <tr>
644
  <th class="has-text-centered">Accuracy</th>
@@ -1120,9 +1120,9 @@
1120
  <th colspan="3" class="has-text-centered">Datasets (Accuracy)</th>
1121
  </tr>
1122
  <tr>
1123
- <th class="has-text-centered tooltip-trigger" data-tooltip="Large-scale dataset for numerical reasoning over financial data, consisting of 8,281 question-answer pairs from financial reports. Focuses on questions requiring interpretation of financial data and multi-step reasoning. Licensed under CC BY-NC 4.0.">FinQA</th>
1124
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Multi-turn question answering dataset with 3,892 conversations and 14,115 questions exploring chains of numerical reasoning in financial dialogues. Released under MIT License.">ConvFinQA</th>
1125
- <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="Large-scale QA dataset for hybrid data sources (tables and text) from financial reports, emphasizing numerical reasoning operations. Licensed under CC BY 4.0.">TATQA</th>
1126
  </tr>
1127
  </thead>
1128
  <tbody>
@@ -1284,9 +1284,9 @@
1284
  <thead>
1285
  <tr>
1286
  <th rowspan="2">Model</th>
1287
- <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis. Given a financial text, such as microblog posts or news headlines, systems predict sentiment scores on a continuous scale from -1 (negative) to 1 (positive). Evaluation metrics include MSE, MAE, and R-squared.">FiQA Task 1</th>
1288
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Financial Phrase Bank contains 4,840 sentences from English-language financial news articles, categorized as positive, negative, or neutral. Each sentence reflects the sentiment an investor might perceive regarding its influence on stock prices. Annotated by 16 finance experts using majority voting.">Financial Phrase Bank (FPB)</th>
1289
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Manually-annotated dataset focusing on subjectivity in Earnings Call Transcripts QA sessions. Includes 49,446 annotations across 2,747 QA pairs labeled on six subjectivity features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant.">SubjECTive-QA</th>
1290
  </tr>
1291
  <tr>
1292
  <th class="has-text-centered">MSE</th>
@@ -1645,11 +1645,11 @@
1645
  <thead>
1646
  <tr>
1647
  <th rowspan="2">Model</th>
1648
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A fine-grained dataset designed for intent detection within the banking domain, comprising 13,083 customer service queries annotated with 77 unique intents.">Banking77</th>
1649
- <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="A dataset designed to evaluate machine learning models using tabular data and profile text inputs for financial risk prediction, covering default, fraud, and churn with 333,000 labeled instances.">FinBench</th>
1650
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A dataset of Federal Open Market Committee speeches, meeting minutes, and press conference transcripts (1996-2022) for hawkish-dovish classification of monetary policy stance.">FOMC</th>
1651
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="An expert-annotated dataset for detecting fine-grained investor claims within financial narratives, focusing on numerals in analyst reports and earnings call transcripts.">NumClaim</th>
1652
- <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset of 11,412 human-annotated financial news headlines focused on commodities (particularly gold), spanning 2000-2019, with binary indicators for price mentions and movements.">Headlines</th>
1653
  </tr>
1654
  <tr>
1655
  <th class="has-text-centered">Accuracy</th>
@@ -2152,8 +2152,8 @@
2152
  <thead>
2153
  <tr>
2154
  <th rowspan="2">Model</th>
2155
- <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="Designed for bullet-point summarization of long earnings call transcripts (ECTs) in the financial domain. 2,425 document-summary pairs from publicly traded companies' earnings calls (2019-2022), with concise bullet points extracted from Reuters articles focusing on key financial metrics.">ECTSum</th>
2156
- <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="Financial news summarization dataset with 2,000 financial news articles, each paired with its headline as the ground-truth summary. Manually selected and cleaned to ensure high-quality annotations, providing a benchmark for evaluating LLMs on financial text summarization.">EDTSum</th>
2157
  </tr>
2158
  <tr>
2159
  <th class="has-text-centered">BERTScore Precision</th>
 
97
  </tr>
98
  <tr>
99
  <th>Dataset</th>
100
+ <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER<span class="metric-label">F1</span></th>
101
+ <th class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FR<span class="metric-label">F1</span></th>
102
+ <th class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FE<span class="metric-label">F1</span></th>
103
+ <th class="has-text-centered tooltip-trigger" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">RD<span class="metric-label">F1</span></th>
104
+ <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA<span class="metric-label">MSE</span></th>
105
+ <th class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">FPB<span class="metric-label">F1</span></th>
106
+ <th class="has-text-centered tooltip-trigger" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SQA<span class="metric-label">F1</span></th>
107
+ <th class="has-text-centered tooltip-trigger" data-title="FNXL" data-tooltip="FNXL contains 79,088 sentences with 142,922 annotated numerals extracted from SEC 10-K reports and categorized under 2,794 distinct numerical labels. The information extraction task requires identifying, categorizing and understanding the financial significance of numerical entities in regulatory filings.">FNXL<span class="metric-label">F1</span></th>
108
+ <th class="has-text-centered tooltip-trigger column-border-left" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">CD<span class="metric-label">F1</span></th>
109
+ <th class="has-text-centered tooltip-trigger" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">CC<span class="metric-label">F1</span></th>
110
+ <th class="has-text-centered tooltip-trigger column-border-left" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">B77<span class="metric-label">F1</span></th>
111
+ <th class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FB<span class="metric-label">F1</span></th>
112
+ <th class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC<span class="metric-label">F1</span></th>
113
+ <th class="has-text-centered tooltip-trigger" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NC<span class="metric-label">F1</span></th>
114
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">HL<span class="metric-label">Acc</span></th>
115
+ <th class="has-text-centered tooltip-trigger column-border-left" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA<span class="metric-label">Acc</span></th>
116
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">CFQA<span class="metric-label">Acc</span></th>
117
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TQA<span class="metric-label">Acc</span></th>
118
+ <th class="has-text-centered tooltip-trigger tooltip-right column-border-left" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum<span class="metric-label">BERT</span></th>
119
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum<span class="metric-label">BERT</span></th>
120
  </tr>
121
  </thead>
122
  <tbody>
 
347
  <thead>
348
  <tr>
349
  <th rowspan="2">Model</th>
350
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">Causal Detection (CD)</th>
351
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">Causal Classification (CC)</th>
352
  </tr>
353
  <tr>
354
  <th class="has-text-centered">Accuracy</th>
 
  <thead>
  <tr>
  <th rowspan="2">Model</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER-ORD</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FinRED</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FinEntity</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">REFinD</th>
  </tr>
  <tr>
  <th class="has-text-centered">Accuracy</th>
 
  <th colspan="3" class="has-text-centered">Datasets (Accuracy)</th>
  </tr>
  <tr>
+ <th class="has-text-centered tooltip-trigger" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA</th>
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">ConvFinQA</th>
+ <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TATQA</th>
  </tr>
  </thead>
  <tbody>
 
  <thead>
  <tr>
  <th rowspan="2">Model</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA Task 1</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">Financial Phrase Bank (FPB)</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SubjECTive-QA</th>
  </tr>
  <tr>
  <th class="has-text-centered">MSE</th>
 
  <thead>
  <tr>
  <th rowspan="2">Model</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">Banking77</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FinBench</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NumClaim</th>
+ <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">Headlines</th>
  </tr>
  <tr>
  <th class="has-text-centered">Accuracy</th>
 
  <thead>
  <tr>
  <th rowspan="2">Model</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum</th>
  </tr>
  <tr>
  <th class="has-text-centered">BERTScore Precision</th>
sentiment_analysis_table.html CHANGED
@@ -15,9 +15,9 @@
  <thead>
  <tr>
  <th rowspan="2">Model</th>
- <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis. Given a financial text, such as microblog posts or news headlines, systems predict sentiment scores on a continuous scale from -1 (negative) to 1 (positive). Evaluation metrics include MSE, MAE, and R-squared.">FiQA Task 1</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Financial Phrase Bank contains 4,840 sentences from English-language financial news articles, categorized as positive, negative, or neutral. Each sentence reflects the sentiment an investor might perceive regarding its influence on stock prices. Annotated by 16 finance experts using majority voting.">Financial Phrase Bank (FPB)</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="SubjECTive-QA" data-tooltip="Manually-annotated dataset focusing on subjectivity in Earnings Call Transcripts QA sessions. Includes 49,446 annotations across 2,747 QA pairs labeled on six subjectivity features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant.">SubjECTive-QA</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA Task 1</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">Financial Phrase Bank (FPB)</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SubjECTive-QA</th>
  </tr>
  <tr>
  <th class="has-text-centered">MSE</th>
static/css/results.css CHANGED
@@ -152,6 +152,11 @@ body {
  /* min-width: 110px; */
  }

+ /* Ensure all data cells are center-aligned */
+ .results-table td:not(:first-child) {
+ text-align: center !important;
+ }
+
  /* Adjust column widths for task groups */
  .results-table th[colspan="4"],
  .results-table th[colspan="3"],
@@ -238,4 +243,4 @@ body {
  .results-table td:first-child {
  word-break: break-word;
  hyphens: auto;
- }
+ }
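
The table headers above all carry `data-title` and `data-tooltip` attributes, which a stylesheet can surface as hover tooltips via CSS `attr()`. A minimal sketch of that pattern, reusing the tables' `.tooltip-trigger` class; the layout values and colors are illustrative assumptions, not taken from results.css:

```css
/* Attribute-driven tooltip: show data-tooltip text on hover. */
.tooltip-trigger {
  position: relative;
  cursor: help;
}

.tooltip-trigger:hover::after {
  content: attr(data-tooltip);  /* dataset description from the th attribute */
  position: absolute;
  top: 100%;
  left: 50%;
  transform: translateX(-50%);
  z-index: 10;
  width: 280px;
  padding: 0.5rem;
  background: #363636;
  color: #fff;
  font-size: 0.75rem;
  font-weight: normal;
  white-space: normal;
  border-radius: 4px;
}

/* A .tooltip-right variant would swap the horizontal alignment,
   e.g. right: 0 in place of the centered transform, so tooltips on
   the last columns do not overflow the table. */
```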
text_classification_table.html CHANGED
@@ -15,11 +15,11 @@
  <thead>
  <tr>
  <th rowspan="2">Model</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A fine-grained dataset designed for intent detection within the banking domain, comprising 13,083 customer service queries annotated with 77 unique intents.">Banking77</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A dataset designed to evaluate machine learning models using tabular data and profile text inputs for financial risk prediction, covering default, fraud, and churn with 333,000 labeled instances.">FinBench</th>
- <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="A dataset of Federal Open Market Committee speeches, meeting minutes, and press conference transcripts (1996-2022) for hawkish-dovish classification of monetary policy stance.">FOMC</th>
- <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="NumClaim" data-tooltip="An expert-annotated dataset for detecting fine-grained investor claims within financial narratives, focusing on numerals in analyst reports and earnings call transcripts.">NumClaim</th>
- <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="A dataset of 11,412 human-annotated financial news headlines focused on commodities (particularly gold), spanning 2000-2019, with binary indicators for price mentions and movements.">Headlines</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">Banking77</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FinBench</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC</th>
+ <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NumClaim</th>
+ <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">Headlines</th>
  </tr>
  <tr>
  <th class="has-text-centered">Accuracy</th>
text_summarization_table.html CHANGED
@@ -15,8 +15,8 @@
  <thead>
  <tr>
  <th rowspan="2">Model</th>
- <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="Designed for bullet-point summarization of long earnings call transcripts (ECTs) in the financial domain. 2,425 document-summary pairs from publicly traded companies' earnings calls (2019-2022), with concise bullet points extracted from Reuters articles focusing on key financial metrics.">ECTSum</th>
- <th colspan="3" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="EDTSum" data-tooltip="Financial news summarization dataset with 2,000 financial news articles, each paired with its headline as the ground-truth summary. Manually selected and cleaned to ensure high-quality annotations, providing a benchmark for evaluating LLMs on financial text summarization.">EDTSum</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum</th>
+ <th colspan="3" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum</th>
  </tr>
  <tr>
  <th class="has-text-centered">BERTScore Precision</th>