Update about.py
about.py
CHANGED
@@ -2,20 +2,22 @@ TITLE = "# Turing Bench"
 
 INTRODUCTION_TEXT = """
 ### Welcome to the Turing Bench Leaderboard.
+This is a benchmark evaluating a model's ability to recognize humans from AI in conversations.
+"""
 
-
-
+LEADERBOARD_DETAIL_TEXT = """
+Submissions are ordered by accuracy (by default). Refresh to fetch the latest evaluated runs.
 """
+
 DESCRIPTION_TEXT = """
-
-
-
-
-A collection of **paired dialogue examples**. Each example contains two full dialogue transcripts (**A** and **B**):
+The Turing Benchmark evaluates a model's ability to differentiate human-human conversations from human-AI conversations.
+
+### We provide the dataset
+A collection of **paired dialogues**. Each example contains two full dialogue transcripts (**A** and **B**):
 - One transcript is **human-human** (two humans talking).
 - The other transcript is **human-AI** (a human talking to an AI).
 
-###
+### Your model's task
 **Binary classification (A/B):** Given `dialogueA` and `dialogueB`, predict which one is the **human-human** dialogue.
 
 - **Input:** `dialogueA` (string), `dialogueB` (string)
@@ -25,7 +27,8 @@ DESCRIPTION_TEXT = """
 Each row (one example) includes:
 - `id` (int): pair ID number
 - `dialogueA` (string): transcript A
-- `dialogueB` (string): transcript B
+- `dialogueB` (string): transcript B
+"""
 
 CITATION_BUTTON_TEXT = """
 If you use this benchmark, please cite:
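Note on the task text: `DESCRIPTION_TEXT` describes an A/B binary classification over rows with `id`, `dialogueA`, and `dialogueB`, and `LEADERBOARD_DETAIL_TEXT` says submissions are ordered by accuracy. A minimal sketch of how such a score could be computed is below. It is not the Space's evaluation code. The gold-label field name `label`, the `"A"`/`"B"` answer encoding, and the helpers `accuracy` and `always_a` are assumptions made for illustration, since the diff does not show the output format or the scorer.

```python
from typing import Callable, Iterable, Mapping

# `id`, `dialogueA`, and `dialogueB` come from the diff above.
# The "label" field and its "A"/"B" encoding are ASSUMED for this sketch.
Row = Mapping[str, object]


def accuracy(rows: Iterable[Row], predict: Callable[[str, str], str]) -> float:
    """Return the fraction of pairs where `predict` names the human-human transcript."""
    total = 0
    correct = 0
    for row in rows:
        guess = predict(str(row["dialogueA"]), str(row["dialogueB"]))
        if guess not in ("A", "B"):
            raise ValueError(
                f"prediction for id={row['id']} must be 'A' or 'B', got {guess!r}"
            )
        if guess == row["label"]:
            correct += 1
        total += 1
    if total == 0:
        raise ValueError("no examples to score")
    return correct / total


def always_a(dialogue_a: str, dialogue_b: str) -> str:
    """Hypothetical baseline that ignores both transcripts and always answers 'A'."""
    return "A"


# Placeholder rows; the transcripts are dummy strings, not benchmark data.
rows = [
    {"id": 0, "dialogueA": "transcript A0", "dialogueB": "transcript B0", "label": "A"},
    {"id": 1, "dialogueA": "transcript A1", "dialogueB": "transcript B1", "label": "B"},
]

print(accuracy(rows, always_a))  # 0.5
```

If the human-human transcript is equally likely to be A or B, a constant guesser like `always_a` scores about 0.5, which gives a reference point for reading leaderboard accuracies.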