dataset descriptions
- causal_analysis_table.html +2 -2
- information_retrieval_table.html +5 -5
- qa_table.html +3 -3
- results.html +39 -39
- sentiment_analysis_table.html +3 -3
- static/css/results.css +6 -1
- text_classification_table.html +5 -5
- text_summarization_table.html +2 -2
causal_analysis_table.html (CHANGED)

@@ -15,8 +15,8 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">FinCausal (CD)</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">FinCausal (CC)</th>
  </tr>
  <tr>
    <th class="has-text-centered">Accuracy</th>
information_retrieval_table.html (CHANGED)

@@ -15,11 +15,11 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="
-    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER-ORD</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FinRED</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">REFinD</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-title="FNXL" data-tooltip="FNXL contains 79,088 sentences with 142,922 annotated numerals extracted from SEC 10-K reports and categorized under 2,794 distinct numerical labels. The information extraction task requires identifying, categorizing and understanding the financial significance of numerical entities in regulatory filings.">FNXL</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FinEntity</th>
  </tr>
  <tr>
    <th class="has-text-centered">Precision</th>
qa_table.html (CHANGED)

@@ -18,9 +18,9 @@
   <th colspan="3" class="has-text-centered">Datasets (Accuracy)</th>
 </tr>
 <tr>
-    <th class="has-text-centered tooltip-trigger" data-
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+    <th class="has-text-centered tooltip-trigger" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA</th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">ConvFinQA</th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TATQA</th>
 </tr>
 </thead>
 <tbody>
results.html (CHANGED)

@@ -97,26 +97,26 @@
 </tr>
 <tr>
   <th>Dataset</th>
-    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiNER-
-    <th class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED
-    <th class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity
-    <th class="has-text-centered tooltip-trigger" data-title="
-    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis
-    <th class="has-text-centered tooltip-trigger
-    <th class="has-text-centered tooltip-trigger" data-tooltip="
-    <th class="has-text-centered tooltip-trigger" data-tooltip="
-    <th class="has-text-centered tooltip-trigger column-border-left" data-tooltip="
-    <th class="has-text-centered tooltip-trigger" data-tooltip="
-    <th class="has-text-centered tooltip-trigger column-border-left" data-title="Banking77" data-tooltip="
-    <th class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="
-    <th class="has-text-centered tooltip-trigger" data-tooltip="
-    <th class="has-text-centered tooltip-trigger" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th class="has-text-centered tooltip-trigger column-border-left" data-
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right column-border-left" data-title="ECTSum" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FR<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FE<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">RD<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA<span class="metric-label">MSE</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">FPB<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SQA<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FNXL" data-tooltip="FNXL contains 79,088 sentences with 142,922 annotated numerals extracted from SEC 10-K reports and categorized under 2,794 distinct numerical labels. The information extraction task requires identifying, categorizing and understanding the financial significance of numerical entities in regulatory filings.">FNXL<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">CD<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">CC<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger column-border-left" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">B77<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FB<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NC<span class="metric-label">F1</span></th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">HL<span class="metric-label">Acc</span></th>
+    <th class="has-text-centered tooltip-trigger column-border-left" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA<span class="metric-label">Acc</span></th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">CFQA<span class="metric-label">Acc</span></th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TQA<span class="metric-label">Acc</span></th>
+    <th class="has-text-centered tooltip-trigger tooltip-right column-border-left" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum<span class="metric-label">BERT</span></th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum<span class="metric-label">BERT</span></th>
 </tr>
 </thead>
 <tbody>

@@ -347,8 +347,8 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CD)" data-tooltip="FinCausal Causal Discovery (CD) contains 29,444 text sections from financial news, with 2,136 annotated as expressing causal relationships. The task involves extracting precise cause and effect spans from financial texts that contain causal relationships.">Causal Detection (CD)</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinCausal (CC)" data-tooltip="FinCausal Causality Classification (CC) consists of 29,444 text sections from financial news with binary annotations indicating causal relationships. The classification task requires determining whether a given financial text section contains a causal relationship (1) or not (0).">Causal Classification (CC)</th>
  </tr>
  <tr>
    <th class="has-text-centered">Accuracy</th>

@@ -635,10 +635,10 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FiNER-ORD" data-tooltip="FiNER-ORD is a manually annotated named entity recognition dataset comprising financial news articles with detailed entity annotations. The task requires identifying and correctly classifying person, location, and organization entities in financial contexts.">FiNER-ORD</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinRED" data-tooltip="FinRED is a specialized relation extraction dataset created from financial news and earnings call transcripts using distance supervision based on Wikidata triplets. The task involves identifying and extracting financial relationships between entities to understand connections in financial contexts.">FinRED</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinEntity" data-tooltip="FinEntity consists of 979 financial news paragraphs containing 2,131 manually-annotated financial entities with sentiment classifications. The task involves identifying companies and asset classes in financial texts while determining the associated sentiment expressed toward each entity.">FinEntity</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="REFinD" data-tooltip="REFinD is a comprehensive relation extraction dataset containing approximately 29,000 annotated instances with 22 distinct relation types across 8 entity pair categories from various financial documents. The task requires identifying specific relationships between financial entities in complex documents like SEC filings.">REFinD</th>
  </tr>
  <tr>
    <th class="has-text-centered">Accuracy</th>

@@ -1120,9 +1120,9 @@
 <th colspan="3" class="has-text-centered">Datasets (Accuracy)</th>
 </tr>
 <tr>
-    <th class="has-text-centered tooltip-trigger" data-
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
-    <th class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+    <th class="has-text-centered tooltip-trigger" data-title="FinQA" data-tooltip="FinQA contains 8,281 question-answer pairs derived from financial reports that require numerical reasoning over tabular financial data. The question-answering task features multi-step reasoning challenges with full annotation of reasoning programs to solve complex financial queries.">FinQA</th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="ConvFinQA" data-tooltip="ConvFinQA is a multi-turn question answering dataset with 3,892 conversations containing 14,115 questions that explore chains of numerical reasoning in financial contexts. The conversational task requires maintaining context while performing sequential numerical operations to answer increasingly complex financial questions.">ConvFinQA</th>
+    <th class="has-text-centered tooltip-trigger tooltip-right" data-title="TATQA" data-tooltip="TATQA is a large-scale question answering dataset for hybrid data sources that combines tables and text from financial reports. The task emphasizes numerical reasoning operations across multiple formats, requiring models to integrate information from structured and unstructured sources to answer financial questions.">TATQA</th>
 </tr>
 </thead>
 <tbody>

@@ -1284,9 +1284,9 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Financial Phrase Bank contains 4,840 sentences from
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
+    <th colspan="3" class="has-text-centered tooltip-trigger" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA Task 1</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">Financial Phrase Bank (FPB)</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SubjECTive-QA</th>
  </tr>
  <tr>
    <th class="has-text-centered">MSE</th>

@@ -1645,11 +1645,11 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-    <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">Banking77</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FinBench</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NumClaim</th>
+    <th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">Headlines</th>
  </tr>
  <tr>
    <th class="has-text-centered">Accuracy</th>

@@ -2152,8 +2152,8 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="3" class="has-text-centered tooltip-trigger" data-
-    <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="
+    <th colspan="3" class="has-text-centered tooltip-trigger" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum</th>
+    <th colspan="3" class="has-text-centered tooltip-trigger" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum</th>
  </tr>
  <tr>
    <th class="has-text-centered">BERTScore Precision</th>
sentiment_analysis_table.html (CHANGED)

@@ -15,9 +15,9 @@
 <thead>
   <tr>
     <th rowspan="2">Model</th>
-    <th colspan="3" class="has-text-centered tooltip-trigger" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis
-    <th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="Financial Phrase Bank contains 4,840 sentences from
-    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="SubjECTive-QA" data-tooltip="
+    <th colspan="3" class="has-text-centered tooltip-trigger" data-title="FiQA Task 1" data-tooltip="FiQA Task 1 focuses on aspect-based financial sentiment analysis in microblog posts and news headlines using a continuous scale from -1 (negative) to 1 (positive). The regression task requires models to accurately predict the sentiment score that reflects investor perception of financial texts.">FiQA Task 1</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger" data-title="Financial Phrase Bank" data-tooltip="Financial Phrase Bank (FPB) contains 4,840 sentences from financial news articles categorized as positive, negative, or neutral by 16 finance experts using majority voting. The sentiment classification task requires understanding how these statements might influence investor perception of stock prices.">Financial Phrase Bank (FPB)</th>
+    <th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="SubjECTive-QA" data-tooltip="SubjECTive-QA contains 49,446 annotations across 2,747 question-answer pairs extracted from 120 earnings call transcripts. The multi-label classification task involves analyzing six subjective features in financial discourse: assertiveness, cautiousness, optimism, specificity, clarity, and relevance.">SubjECTive-QA</th>
  </tr>
  <tr>
    <th class="has-text-centered">MSE</th>
static/css/results.css
CHANGED
@@ -152,6 +152,11 @@ body {
     /* min-width: 110px; */
 }
 
+/* Ensure all data cells are center-aligned */
+.results-table td:not(:first-child) {
+    text-align: center !important;
+}
+
 /* Adjust column widths for task groups */
 .results-table th[colspan="4"],
 .results-table th[colspan="3"],
@@ -238,4 +243,4 @@ body {
 .results-table td:first-child {
     word-break: break-word;
     hyphens: auto;
-}
+}
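The `data-title`/`data-tooltip` attributes added throughout these tables are presumably consumed by tooltip rules elsewhere in `static/css/results.css`, which this diff does not show in full. As a rough sketch only — the selectors and values below are assumptions, not the repo's actual rules — a pure-CSS hover tooltip over these attributes could look like:

```css
/* Illustrative sketch, not the actual rules from static/css/results.css:
   a hover tooltip that reads its text straight from the data attributes. */
.tooltip-trigger {
    position: relative;
    cursor: help;
}
.tooltip-trigger:hover::after {
    /* attr() pulls the annotation text added in this commit */
    content: attr(data-tooltip);
    position: absolute;
    top: 100%;
    left: 0;
    z-index: 10;
    width: 320px;
    padding: 0.5em;
    background: #363636;
    color: #fff;
    font-weight: normal;
    white-space: normal;
    text-align: left;
}
```

Keeping the short label in `data-title` and the long description in `data-tooltip`, as the diff does, lets the stylesheet or any script choose which of the two to surface.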
text_classification_table.html
CHANGED
@@ -15,11 +15,11 @@
 <thead>
 <tr>
 <th rowspan="2">Model</th>
-<th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-<th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-<th colspan="4" class="has-text-centered tooltip-trigger" data-tooltip="
-<th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="NumClaim" data-tooltip="
-<th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-tooltip="
+<th colspan="4" class="has-text-centered tooltip-trigger" data-title="Banking77" data-tooltip="Banking77 is a fine-grained dataset comprising 13,083 customer service queries annotated with 77 unique intents from the banking domain. The task involves accurately classifying each customer query into the correct intent category to improve automated banking support systems.">Banking77</th>
+<th colspan="4" class="has-text-centered tooltip-trigger" data-title="FinBench" data-tooltip="FinBench is a comprehensive evaluation dataset containing 333,000 labeled instances that combines tabular data and profile text for financial risk prediction. The task requires models to predict financial outcomes across three key risk categories: default, fraud, and customer churn.">FinBench</th>
+<th colspan="4" class="has-text-centered tooltip-trigger" data-title="FOMC" data-tooltip="FOMC is a dataset containing Federal Open Market Committee speeches, meeting minutes, and press conference transcripts spanning from 1996 to 2022. The classification task involves determining whether the monetary policy stance expressed in each document is hawkish (tighter monetary policy) or dovish (looser monetary policy).">FOMC</th>
+<th colspan="4" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="NumClaim" data-tooltip="NumClaim is an expert-annotated dataset for detecting and analyzing fine-grained investor claims within financial narratives that contain numerical information. The task requires identifying and categorizing claims containing numerals in analyst reports and earnings call transcripts for investment decision making.">NumClaim</th>
+<th colspan="1" class="has-text-centered tooltip-trigger tooltip-right" data-title="Headlines" data-tooltip="Headlines is a dataset containing 11,412 human-annotated financial news headlines focused on commodities, particularly gold, spanning from 2000 to 2019. The classification task involves identifying binary indicators for price mentions and directional price movements in these concise financial texts.">Headlines</th>
 </tr>
 <tr>
 <th class="has-text-centered">Accuracy</th>
text_summarization_table.html
CHANGED
@@ -15,8 +15,8 @@
 <thead>
 <tr>
 <th rowspan="2">Model</th>
-<th colspan="3" class="has-text-centered tooltip-trigger" data-
-<th colspan="3" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="EDTSum" data-tooltip="
+<th colspan="3" class="has-text-centered tooltip-trigger" data-title="ECTSum" data-tooltip="ECTSum contains 2,425 document-summary pairs featuring earnings call transcripts paired with concise bullet-point summaries extracted from Reuters articles. The summarization task requires extracting and condensing key financial information from lengthy corporate communications into brief, informative points.">ECTSum</th>
+<th colspan="3" class="has-text-centered tooltip-trigger tooltip-right" style="position: relative;" data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial news articles paired with their headlines as ground-truth summaries for evaluating text summarization. The task challenges models to condense complex financial news articles into concise, informative headlines that capture the essential information.">EDTSum</th>
 </tr>
 <tr>
 <th class="has-text-centered">BERTScore Precision</th>
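The recurring change across all of these table files is the addition of a `data-title` attribute alongside each `data-tooltip`. As a hypothetical sanity check (not part of this repo), one could verify with only the standard library that every tooltip header now carries both attributes:

```python
# Illustrative audit: flag tooltip-trigger <th> cells that lack either
# a data-title or a data-tooltip attribute, as added in this commit.
from html.parser import HTMLParser

class TooltipAudit(HTMLParser):
    """Collect tooltip-trigger <th> cells missing data-title or data-tooltip."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "th" and "tooltip-trigger" in a.get("class", ""):
            # Both attributes must be present and non-empty.
            if not (a.get("data-title") and a.get("data-tooltip")):
                self.missing.append(a)

header = ('<th colspan="3" class="has-text-centered tooltip-trigger" '
          'data-title="EDTSum" data-tooltip="EDTSum consists of 2,000 financial '
          'news articles paired with their headlines.">EDTSum</th>')
audit = TooltipAudit()
audit.feed(header)
print(len(audit.missing))  # 0: both attributes present
```

Run over each edited table file, an empty `missing` list confirms the commit's pattern held everywhere.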