# Data Directory

This directory holds the preference datasets used by PreferenceLab.

On first run, if these files are absent, the environment falls back to
built-in synthetic examples (defined in `server/environment.py`).
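
The fallback described above can be pictured roughly as follows. This is a minimal sketch, not the code in `server/environment.py`: the function name `load_or_fallback` and the `SYNTHETIC_PAIRWISE` record are hypothetical and used only for illustration.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the built-in synthetic examples. The real ones
# are defined in server/environment.py and may look different.
SYNTHETIC_PAIRWISE = [
    {
        "prompt": "What is 2 + 2?",
        "response_a": "4",
        "response_b": "5",
        "gold_label": "A",
        "source": "synthetic",
    }
]


def load_or_fallback(path, fallback):
    """Return the records parsed from `path`, or `fallback` if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return fallback
    with p.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_or_fallback("data/pairwise_data.json", SYNTHETIC_PAIRWISE)` would return the synthetic list until the real file exists (the `data/` path is an assumption about where this directory sits).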

## File Format

### pairwise_data.json
```json
[
  {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "gold_label": "A",
    "source": "hh-rlhf"
  }
]
```
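
A quick sanity check for records in this shape might look like the sketch below. The helper `validate_pairwise` is hypothetical, and the allowed labels are an assumption: the example above shows only `"A"`, and `"B"` is inferred from the `response_a` / `response_b` pair.

```python
REQUIRED_PAIRWISE_KEYS = {"prompt", "response_a", "response_b", "gold_label", "source"}
ALLOWED_LABELS = {"A", "B"}  # assumption: "B" is inferred, only "A" appears in the example


def validate_pairwise(record):
    """Return a list of problems found in one pairwise record (empty if none)."""
    problems = []
    missing = REQUIRED_PAIRWISE_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Only check the label's value when the key is present at all.
    if "gold_label" in record and record["gold_label"] not in ALLOWED_LABELS:
        problems.append(f"unexpected gold_label: {record['gold_label']!r}")
    return problems
```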

### likert_data.json
```json
[
  {
    "prompt": "...",
    "response": "...",
    "rubric": "...",
    "gold_scores": {
      "helpfulness": 4,
      "honesty": 5,
      "harmlessness": 5,
      "instruction_following": 4
    },
    "source": "ultrafeedback"
  }
]
```
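
The four score dimensions can be checked with a small helper such as the one sketched here. `check_likert` is a hypothetical name, and the 1 to 5 range is an assumption: the example scores of 4 and 5 are consistent with it, but this README does not state the scale, so the bounds are left as parameters.

```python
SCORE_DIMENSIONS = ("helpfulness", "honesty", "harmlessness", "instruction_following")


def check_likert(record, min_score=1, max_score=5):
    """Return a list of problems in a record's gold_scores (empty if none).

    The default 1-5 range is an assumption, not something the format guarantees.
    """
    problems = []
    scores = record.get("gold_scores", {})
    for dim in SCORE_DIMENSIONS:
        if dim not in scores:
            problems.append(f"missing score: {dim}")
            continue
        value = scores[dim]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"non-integer score for {dim}: {value!r}")
        elif not min_score <= value <= max_score:
            problems.append(f"score out of range for {dim}: {value}")
    return problems
```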

### consistency_data.json
```json
[
  {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "response_c": "...",
    "response_d": "...",
    "gold_ranking": ["C", "A", "B", "D"],
    "source": "stanford-shp"
  }
]
```
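
Since `gold_ranking` refers to the four responses by letter, a loader may want to confirm that it is a permutation of those letters and map it back to the response texts. The helpers below are a hypothetical sketch. This README does not say whether the ranking runs best-to-worst, so the code only reorders the responses and makes no claim about direction.

```python
RESPONSE_LABELS = ("A", "B", "C", "D")


def is_valid_ranking(ranking):
    """True if `ranking` contains each of A, B, C, D exactly once."""
    return len(ranking) == len(RESPONSE_LABELS) and set(ranking) == set(RESPONSE_LABELS)


def ranked_responses(record):
    """Return the response texts in the order given by gold_ranking."""
    return [record[f"response_{label.lower()}"] for label in record["gold_ranking"]]
```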

## Loading Real Datasets

Run `python scripts/prepare_datasets.py` to download HH-RLHF, UltraFeedback,
and Stanford SHP and convert them into the formats described above.