Spaces: Brand24/mms_benchmark

wscode committed on Jun 14, 2023

Commit b802e46

1 Parent(s): 25586d7

add description to dataset statistics page

Files changed (2)

pages/1_Language_Statistics.py CHANGED

@@ -14,7 +14,7 @@ st.set_page_config(page_title="Language Statistics", page_icon="📈")
 st.markdown("# Language Statistics")
 st.write("""\
 The table below shows the per-language statistics of the MMS corpus.
-You can use the **'Add filters'** button to filter the table by any of the columns.
+You can use the **'Add filters'** checkbox to filter the table by any of the columns.
 Column descriptions:
 - **Language**: Language name,

pages/{2_Dataset_Statistics.py → 2_Dataset_Statistics_&_citation_export.py} RENAMED

@@ -7,6 +7,8 @@ from filter_dataframe import filter_dataframe
 def get_language_stats_df():
     return pd.read_parquet("data/datasets_stats.parquet")
+TITLE = "Dataset statistics & citation export"
 _MMS_CITATION = """\
 @misc{augustyniak2023massively,
       title={Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark},
@@ -30,9 +32,26 @@ def export_citations(df: pd.DataFrame):
     return f"% MMS corpus citation\n{_MMS_CITATION}\n{CITATION_SEPARATOR}{dataset_citations_joined}"
-st.set_page_config(page_title="Dataset statistics", page_icon="📈")
-st.markdown("# Dataset statistics")
+st.set_page_config(page_title=TITLE, page_icon="📈")
+st.markdown(f"# {TITLE}")
+st.write("""\
+The table below shows the per-language statistics of the MMS corpus.
+You can use the **'Add filters'** checkbox to filter the table by any of the columns.
+You can also use the 'Export citations' button to export the citations for the datasets in the filtered table.
+Column descriptions:
+- **original_dataset**: Original dataset name as used in the MMS corpus,
+- **language**: 2-letter language code,
+- **domain**: Domain of the dataset,
+- **mean_chars**: The average number of characters in a single example,
+- **mean_words**: The average number of words in a single example,
+- **examples_sum**: The total number of examples in the dataset,
+- **NEG**: Number of examples with negative sentiment,
+- **NEU**: Number of examples with neutral sentiment,
+- **POS**: Number of examples with positive sentiment,
+- **paper**: Link to the paper in which the dataset was originally published,
+- **citation**: Citation for the dataset,""")
 df = get_language_stats_df()
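
Note on the column descriptions added in this commit: the sketch below shows one way the per-dataset columns (`mean_chars`, `mean_words`, `examples_sum`, `NEG`, `NEU`, `POS`) could be derived from raw examples. It is not taken from the repository. The function name `dataset_stats`, the label strings `"negative"`, `"neutral"`, `"positive"`, and whitespace-based word splitting are all assumptions made for illustration; the actual `data/datasets_stats.parquet` file may have been computed differently.

```python
from collections import Counter
from statistics import mean


def dataset_stats(examples):
    """Compute summary columns for one dataset (hypothetical helper).

    `examples` is a list of (text, label) pairs. The label strings used
    here are an assumption, not the MMS corpus schema.
    """
    if not examples:
        raise ValueError("examples must not be empty")

    texts = [text for text, _ in examples]
    label_counts = Counter(label for _, label in examples)

    return {
        # Average number of characters in a single example.
        "mean_chars": mean(len(text) for text in texts),
        # Average number of whitespace-separated words (assumed tokenization).
        "mean_words": mean(len(text.split()) for text in texts),
        # Total number of examples in the dataset.
        "examples_sum": len(examples),
        # Per-sentiment example counts; missing labels count as 0.
        "NEG": label_counts["negative"],
        "NEU": label_counts["neutral"],
        "POS": label_counts["positive"],
    }


sample = [
    ("good movie", "positive"),
    ("bad", "negative"),
    ("it was ok", "neutral"),
    ("great", "positive"),
]
stats = dataset_stats(sample)
print(stats)
```

With this definition, `NEG + NEU + POS` equals `examples_sum` whenever every example carries one of the three assumed labels, which is a quick consistency check against the table shown on the page.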