Space: roc-hci/Turing-Bench-Leaderboard

roc-hci committed 14 days ago

Commit 3d3b8ba (verified) · 1 parent: 0b74f4a

Update about.py

Files changed (1)

about.py

@@ -2,20 +2,22 @@ TITLE = "# Turing Bench"
 INTRODUCTION_TEXT = """
 ### Welcome to the Turing Bench Leaderboard.
-This is a benchmark evaluating a model's ability to
-recognize text that is produced by humans, compared to text that is produced by AI.
 """
 DESCRIPTION_TEXT = """
-        ## Dataset: Turing Test Judge Benchmark
-        From a paired dialogue (Human-Human vs Human-AI), the task is to predict which dialogue is **human-human**
-        ### What this dataset is
-        A collection of **paired dialogue examples**. Each example contains two full dialogue transcripts (**A** and **B**):
         - One transcript is **human-human** (two humans talking).
         - The other transcript is **human-AI** (a human talking to an AI).
-        ### Task
         **Binary classification (A/B):** Given `dialogueA` and `dialogueB`, predict which one is the **human-human** dialogue.
         - **Input:** `dialogueA` (string), `dialogueB` (string)
@@ -25,7 +27,8 @@ DESCRIPTION_TEXT = """
         Each row (one example) includes:
         - `id` (int): pair ID number
         - `dialogueA` (string): transcript A
-        - `dialogueB` (string): transcript B"""
 CITATION_BUTTON_TEXT = """
 If you use this benchmark, please cite:

 INTRODUCTION_TEXT = """
 ### Welcome to the Turing Bench Leaderboard.
+This is a benchmark evaluating a model's ability to recognize humans from AI in conversations.
+"""
+LEADERBOARD_DETAIL_TEXT = """
+Submissions are ordered by accuracy (by default). Refresh to fetch the latest evaluated runs.
 """
 DESCRIPTION_TEXT = """
+        The Turing Benchmark evaluates a model's ability to differentiate human-human conversations from human-AI conversations.
+        ### We provide the dataset
+        A collection of **paired dialogues**. Each example contains two full dialogue transcripts (**A** and **B**):
         - One transcript is **human-human** (two humans talking).
         - The other transcript is **human-AI** (a human talking to an AI).
+        ### Your model's task
         **Binary classification (A/B):** Given `dialogueA` and `dialogueB`, predict which one is the **human-human** dialogue.
         - **Input:** `dialogueA` (string), `dialogueB` (string)
         Each row (one example) includes:
         - `id` (int): pair ID number
         - `dialogueA` (string): transcript A
+        - `dialogueB` (string): transcript B
+        """
 CITATION_BUTTON_TEXT = """
 If you use this benchmark, please cite: