# Data Directory

This directory holds the preference datasets used by PreferenceLab. On first run, if these files are absent, the environment falls back to built-in synthetic examples (defined in `server/environment.py`).

## File Format

### pairwise_data.json

```json
[
  {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "gold_label": "A",
    "source": "hh-rlhf"
  }
]
```

### likert_data.json

```json
[
  {
    "prompt": "...",
    "response": "...",
    "rubric": "...",
    "gold_scores": {
      "helpfulness": 4,
      "honesty": 5,
      "harmlessness": 5,
      "instruction_following": 4
    },
    "source": "ultrafeedback"
  }
]
```

### consistency_data.json

```json
[
  {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "response_c": "...",
    "response_d": "...",
    "gold_ranking": ["C", "A", "B", "D"],
    "source": "stanford-shp"
  }
]
```

## Loading Real Datasets

Run `python scripts/prepare_datasets.py` to download and convert HH-RLHF, UltraFeedback, and Stanford SHP into these formats.
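
To check a file against the formats documented here, a small validator along the following lines may help. This is a minimal sketch and not the loader in `server/environment.py`: the names `REQUIRED_KEYS`, `validate_records`, and `load_dataset` are hypothetical, and the rule that `gold_ranking` must be a permutation of `A` through `D` is an assumption inferred from the example record.

```python
import json
from pathlib import Path

# Required keys per file, copied from the example records in this README.
REQUIRED_KEYS = {
    "pairwise_data.json": {
        "prompt", "response_a", "response_b", "gold_label", "source",
    },
    "likert_data.json": {
        "prompt", "response", "rubric", "gold_scores", "source",
    },
    "consistency_data.json": {
        "prompt", "response_a", "response_b", "response_c", "response_d",
        "gold_ranking", "source",
    },
}


def validate_records(filename, records):
    """Return a list of problems; an empty list means the records look well-formed."""
    required = REQUIRED_KEYS[filename]
    if not isinstance(records, list):
        return [f"{filename}: top-level value must be a JSON array"]

    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"{filename}[{i}]: record must be a JSON object")
            continue
        missing = required - rec.keys()
        if missing:
            problems.append(f"{filename}[{i}]: missing keys {sorted(missing)}")
            continue
        if filename == "consistency_data.json":
            ranking = rec["gold_ranking"]
            # Assumption: a ranking orders each of the four responses exactly once.
            if not isinstance(ranking, list) or set(ranking) != {"A", "B", "C", "D"} or len(ranking) != 4:
                problems.append(f"{filename}[{i}]: gold_ranking must order A, B, C, D")
    return problems


def load_dataset(data_dir, filename):
    """Load one dataset file, or return None if it is absent.

    Returning None lets the caller decide how to fall back, for example
    to synthetic examples.
    """
    path = Path(data_dir) / filename
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        records = json.load(f)
    problems = validate_records(filename, records)
    if problems:
        raise ValueError("\n".join(problems))
    return records
```

Calling `load_dataset("data", "pairwise_data.json")` returns the parsed records, returns `None` when the file does not exist, and raises `ValueError` listing every malformed record otherwise. The directory name `data` is a placeholder for wherever this directory lives in your checkout.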