FasterDFlash
/

Hanrui

Model card Files Files and versions

Hanrui / datasets /mtbench /README.md

Lekr0's picture

Add files using upload-large-folder tool

212a146 verified 29 days ago

|

history blame contribute delete

2 kB

	---
	dataset_info:
	features:
	- name: question_id
	dtype: int64
	- name: model_a
	dtype: string
	- name: model_b
	dtype: string
	- name: winner
	dtype: string
	- name: judge
	dtype: string
	- name: conversation_a
	list:
	- name: content
	dtype: string
	- name: role
	dtype: string
	- name: conversation_b
	list:
	- name: content
	dtype: string
	- name: role
	dtype: string
	- name: turn
	dtype: int64
	splits:
	- name: human
	num_bytes: 15003469
	num_examples: 3355
	- name: gpt4_pair
	num_bytes: 10679650
	num_examples: 2400
	download_size: 1388888
	dataset_size: 25683119
	license: cc-by-4.0
	task_categories:
	- conversational
	- question-answering
	language:
	- en
	size_categories:
	- 1K<n<10K
	---

	## Content
	This dataset contains 3.3K expert-level pairwise human preferences for model responses generated by 6 models in response to 80 MT-bench questions.
	The 6 models are GPT-4, GPT-3.5, Claud-v1, Vicuna-13B, Alpaca-13B, and LLaMA-13B. The annotators are mostly graduate students with expertise in the topic areas of each of the questions. The details of data collection can be found in our [paper](https://arxiv.org/abs/2306.05685).

	## Agreement Calculation
	This Colab [notebook](https://colab.research.google.com/drive/1ctgygDRJhVGUJTQy8-bRZCl1WNcT8De6?usp=sharing) shows how to compute the agreement between humans and GPT-4 judge with the dataset. Our results show that humans and GPT-4 judge achieve over 80\% agreement, the same level of agreement between humans.

	## Citation
	```
	@misc{zheng2023judging,
	title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena},
	author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric. P Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
	year={2023},
	eprint={2306.05685},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```