Sibam

PreferenceLab OpenEnv environment for RLHF preference simulation

	# Data Directory

	This directory holds the preference datasets used by PreferenceLab.

	On first run, if these files are absent, the environment falls back to
	built-in synthetic examples (defined in `server/environment.py`).
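The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `server/environment.py`: the names `load_or_fallback` and `SYNTHETIC_PAIRWISE` are hypothetical, and the synthetic record is made up for the example.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the built-in synthetic examples.
SYNTHETIC_PAIRWISE = [
    {
        "prompt": "What is 2 + 2?",
        "response_a": "4",
        "response_b": "5",
        "gold_label": "A",
        "source": "synthetic",
    }
]


def load_or_fallback(path, fallback):
    """Return the parsed JSON at `path`, or `fallback` if the file is absent."""
    path = Path(path)
    if not path.exists():
        return fallback
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# With no file on disk, the synthetic examples are returned unchanged.
records = load_or_fallback("data/does_not_exist.json", SYNTHETIC_PAIRWISE)
```

Checking for the file at load time keeps the environment usable before any dataset has been downloaded.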

	## File Format

	### pairwise_data.json
	```json
	[
	  {
	    "prompt": "...",
	    "response_a": "...",
	    "response_b": "...",
	    "gold_label": "A",
	    "source": "hh-rlhf"
	  }
	]
	```
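
A quick sanity check for records in this shape might look like the sketch below. The helper `validate_pairwise` is hypothetical, and it assumes `gold_label` is always `"A"` or `"B"`; only `"A"` appears in the example above.

```python
PAIRWISE_FIELDS = ("prompt", "response_a", "response_b", "gold_label", "source")


def validate_pairwise(record):
    """Return a list of problems found in one pairwise record (empty if none)."""
    problems = [
        f"missing field: {name}" for name in PAIRWISE_FIELDS if name not in record
    ]
    # Assumption: the label names the preferred response, "A" or "B".
    if "gold_label" in record and record["gold_label"] not in ("A", "B"):
        problems.append(f"unexpected gold_label: {record['gold_label']!r}")
    return problems


example = {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "gold_label": "A",
    "source": "hh-rlhf",
}
problems = validate_pairwise(example)
```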

	### likert_data.json
	```json
	[
	  {
	    "prompt": "...",
	    "response": "...",
	    "rubric": "...",
	    "gold_scores": {
	      "helpfulness": 4,
	      "honesty": 5,
	      "harmlessness": 5,
	      "instruction_following": 4
	    },
	    "source": "ultrafeedback"
	  }
	]
	```
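
The `gold_scores` object can be checked with a small helper like the one sketched here. `validate_gold_scores` is a hypothetical name, and the 1 to 5 integer scale is an assumption inferred from the sample values, so adjust `low` and `high` if the real scale differs.

```python
LIKERT_DIMENSIONS = (
    "helpfulness",
    "honesty",
    "harmlessness",
    "instruction_following",
)


def validate_gold_scores(gold_scores, low=1, high=5):
    """Return a list of problems in a gold_scores mapping (empty if none)."""
    problems = []
    for name in LIKERT_DIMENSIONS:
        if name not in gold_scores:
            problems.append(f"missing score: {name}")
            continue
        value = gold_scores[name]
        # Assumption: scores are integers on a low..high scale (bools rejected).
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(f"{name} out of range: {value!r}")
    return problems


sample_scores = {
    "helpfulness": 4,
    "honesty": 5,
    "harmlessness": 5,
    "instruction_following": 4,
}
score_problems = validate_gold_scores(sample_scores)
```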

	### consistency_data.json
	```json
	[
	  {
	    "prompt": "...",
	    "response_a": "...",
	    "response_b": "...",
	    "response_c": "...",
	    "response_d": "...",
	    "gold_ranking": ["C", "A", "B", "D"],
	    "source": "stanford-shp"
	  }
	]
	```
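
To show how `gold_ranking` relates to the four response fields, the sketch below resolves a ranking into response texts. `ranked_responses` is a hypothetical helper, and it assumes the ranking lists labels from best to worst, which this file does not state explicitly.

```python
def ranked_responses(record):
    """Return the response texts ordered as listed in gold_ranking.

    Assumption: gold_ranking is a permutation of A-D, ordered best first.
    """
    labels = record["gold_ranking"]
    if sorted(labels) != ["A", "B", "C", "D"]:
        raise ValueError(f"gold_ranking is not a permutation of A-D: {labels!r}")
    # Label "C" maps to the field "response_c", and so on.
    return [record[f"response_{label.lower()}"] for label in labels]


example = {
    "prompt": "...",
    "response_a": "text a",
    "response_b": "text b",
    "response_c": "text c",
    "response_d": "text d",
    "gold_ranking": ["C", "A", "B", "D"],
    "source": "stanford-shp",
}
ordered = ranked_responses(example)
```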

	## Loading Real Datasets

	Run `python scripts/prepare_datasets.py` to download and convert
	HH-RLHF, UltraFeedback, and Stanford SHP into these formats.
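
As a rough illustration of what such a conversion involves, the sketch below maps one HH-RLHF example onto the pairwise format. It is not the code in `scripts/prepare_datasets.py`. It assumes each raw example carries `chosen` and `rejected` transcripts that end in a final `Assistant:` turn, and the function name `hh_rlhf_to_pairwise` is hypothetical. The preferred response is placed in slot A or B at random so that the label is not tied to position.

```python
import random


def hh_rlhf_to_pairwise(example, rng):
    """Convert one HH-RLHF example into a pairwise_data.json record."""
    marker = "\n\nAssistant:"
    # Everything before the final assistant turn is treated as the prompt.
    prompt, _, chosen = example["chosen"].rpartition(marker)
    _, _, rejected = example["rejected"].rpartition(marker)
    chosen_is_a = rng.random() < 0.5
    return {
        "prompt": prompt.strip(),
        "response_a": (chosen if chosen_is_a else rejected).strip(),
        "response_b": (rejected if chosen_is_a else chosen).strip(),
        "gold_label": "A" if chosen_is_a else "B",
        "source": "hh-rlhf",
    }


raw = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello!",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}
row = hh_rlhf_to_pairwise(raw, random.Random(0))
```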