Mahadevan PV committed on
Commit 45fed78 · 1 Parent(s): 9e51403

added training sets

Files changed (1)
  1. README.md +0 -89
README.md DELETED
@@ -1,89 +0,0 @@
- ---
- annotations_creators:
- - crowdsourced
- language:
- - en
- language_creators:
- - crowdsourced
- - machine-generated
- license: cc-by-4.0
- multilinguality:
- - monolingual
- pretty_name: ProsocialDialog
- size_categories:
- - 10K<n<100K
- - 100K<n<1M
- source_datasets:
- - original
- - extended|social_bias_frames
- tags:
- - dialogue
- - dialogue safety
- - social norm
- - rules-of-thumb
- task_categories:
- - conversational
- - text-classification
- task_ids:
- - dialogue-generation
- - multi-class-classification
- ---
-
- # Dataset Card for ProsocialDialog Dataset
-
- ## Dataset Description
- - **Repository:** [Dataset and Model](https://github.com/skywalker023/prosocial-dialog)
- - **Paper:** [ProsocialDialog: A Prosocial Backbone for Conversational Agents](https://aclanthology.org/2022.emnlp-main.267/)
- - **Point of Contact:** [Hyunwoo Kim](mailto:hyunwook@allenai.org)
-
- ## Dataset Summary
- ProsocialDialog is the first large-scale multi-turn English dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
-
-
- ## Supported Tasks
- * Dialogue response generation
- * Dialogue safety prediction
- * Rules-of-thumb generation
-
- ## Languages
- English
-
- ## Dataset Structure
-
- ### Data Attributes
- attribute | type | description
- --- | --- | ---
- `context` | str | the potentially unsafe utterance
- `response` | str | the guiding utterance grounded on rules-of-thumb (`rots`)
- `rots` | list of str\|null | the relevant rules-of-thumb for `text` *not* labeled as \_\_casual\_\_
- `safety_label` | str | the final verdict of the context according to `safety_annotations`: {\_\_casual\_\_, \_\_possibly\_needs\_caution\_\_, \_\_probably\_needs\_caution\_\_, \_\_needs\_caution\_\_, \_\_needs\_intervention\_\_}
- `safety_annotations` | list of str | raw annotations from three workers: {casual, needs caution, needs intervention}
- `safety_annotation_reasons` | list of str | the reasons behind the safety annotations in free-form text from each worker
- `source` | str | the source of the seed text that was used to craft the first utterance of the dialogue: {socialchemistry, sbic, ethics_amt, ethics_reddit}
- `etc` | str\|null | other information
- `dialogue_id` | int | the dialogue index
- `response_id` | int | the response index
- `episode_done` | bool | an indicator of whether it is the end of the dialogue
-
-
- ## Dataset Creation
-
- To create ProsocialDialog, we set up a human-AI collaborative data creation framework, where GPT-3 generates the potentially unsafe utterances, and crowdworkers provide prosocial responses to them. This approach allows us to circumvent two substantial challenges: (1) there are no available large-scale corpora of multiturn prosocial conversations between humans, and (2) asking humans to write unethical, toxic, or problematic utterances could result in psychological harms (Roberts, 2017; Steiger et al., 2021).
-
- ### Further Details, Social Impacts, and Limitations
- Please refer to our [paper](https://arxiv.org/abs/2205.12688).
-
-
- ## Additional Information
-
- ### Citation
-
- Please cite our work if you found the resources in this repository useful:
- ```
- @inproceedings{kim2022prosocialdialog,
- title={ProsocialDialog: A Prosocial Backbone for Conversational Agents},
- author={Hyunwoo Kim and Youngjae Yu and Liwei Jiang and Ximing Lu and Daniel Khashabi and Gunhee Kim and Yejin Choi and Maarten Sap},
- booktitle={EMNLP},
- year=2022
- }
- ```
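
Note on the removed card's Data Attributes table: it describes flat records, one row per response, tied together by `dialogue_id` and `response_id`. The sketch below shows one way such rows might be regrouped into ordered dialogues. It assumes rows arrive as plain Python dicts keyed by the attribute names in the table and that `response_id` orders turns within a dialogue; the `group_dialogues` helper and the sample rows are hypothetical and are not taken from the dataset or its repository.

```python
from collections import defaultdict

# Safety labels exactly as listed in the removed card's attribute table.
SAFETY_LABELS = (
    "__casual__",
    "__possibly_needs_caution__",
    "__probably_needs_caution__",
    "__needs_caution__",
    "__needs_intervention__",
)


def group_dialogues(rows):
    """Group flat rows by dialogue_id and order each group by response_id.

    Hypothetical helper: assumes response_id gives the turn order.
    """
    dialogues = defaultdict(list)
    for row in rows:
        if row["safety_label"] not in SAFETY_LABELS:
            raise ValueError(f"unknown safety_label: {row['safety_label']!r}")
        dialogues[row["dialogue_id"]].append(row)
    for turns in dialogues.values():
        turns.sort(key=lambda r: r["response_id"])
    return dict(dialogues)


# Made-up rows for illustration only; not real dataset content.
rows = [
    {"dialogue_id": 0, "response_id": 1, "context": "second turn",
     "response": "reply 2", "rots": None,
     "safety_label": "__casual__", "episode_done": True},
    {"dialogue_id": 0, "response_id": 0, "context": "first turn",
     "response": "reply 1", "rots": ["example rule-of-thumb"],
     "safety_label": "__needs_caution__", "episode_done": False},
]

dialogues = group_dialogues(rows)
```

Validating `safety_label` against the five values from the table is a design choice in this sketch, meant to surface malformed rows early rather than silently grouping them.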