
AskBeforeAnswer Dataset

This dataset contains the training and validation splits for the AskBeforeAnswer clarification-seeking model.

GitHub Release: v0.0.4

Subsets (Configurations)

This repository contains two subsets, one per training stage. Each must be loaded separately by passing its configuration name:

1. sft (Supervised Fine-Tuning)

Contains the structured JSON responses for initial alignment.

  • Features: instruction, input, output (JSON dict containing action, reasoning, facets, response)

```python
from datasets import load_dataset

sft_dataset = load_dataset("chrisjcc/ask-before-answer-data", "sft")
```
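
Once loaded, each record's output field can be checked against the four keys listed above. The following is a minimal sketch, not part of the dataset's tooling: the helper `parse_sft_output`, the constant `EXPECTED_OUTPUT_KEYS`, and every value in `sample_record` are hypothetical, and the sketch assumes output may arrive either as a dict or as a JSON-encoded string.

```python
import json
from typing import Any, Mapping

# Keys the README lists for the `output` field of the sft subset.
EXPECTED_OUTPUT_KEYS = ("action", "reasoning", "facets", "response")


def parse_sft_output(record: Mapping[str, Any]) -> dict:
    """Return the record's `output` as a dict with all expected keys present.

    Hypothetical helper: decodes `output` first if it is stored as a JSON
    string (an assumption), then raises KeyError if any listed key is missing.
    """
    output = record["output"]
    if isinstance(output, str):
        output = json.loads(output)
    missing = [key for key in EXPECTED_OUTPUT_KEYS if key not in output]
    if missing:
        raise KeyError(f"output is missing keys: {missing}")
    return dict(output)


# Made-up record for illustration only; real field values may look different.
sample_record = {
    "instruction": "Decide whether to answer or ask a clarifying question.",
    "input": "Book me a flight.",
    "output": json.dumps(
        {
            "action": "ask",
            "reasoning": "Destination and date are not specified.",
            "facets": ["destination", "date"],
            "response": "Where are you flying to, and on which date?",
        }
    ),
}

parsed = parse_sft_output(sample_record)
print(parsed["action"], parsed["facets"])
```

The same helper could be applied to a record taken from sft_dataset, provided the split names and value types match the assumptions above.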

2. dpo (Direct Preference Optimization)

Contains the preference pairs used to penalize hallucinations.

  • Features: prompt, chosen, rejected

```python
from datasets import load_dataset

dpo_dataset = load_dataset("chrisjcc/ask-before-answer-data", "dpo")
```
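
Before passing preference pairs to a trainer, it can help to confirm that each record carries the three features listed above and that the two completions actually differ. This is a minimal sketch under stated assumptions: `check_preference_pair` and `PREFERENCE_FIELDS` are hypothetical names, and the contents of `sample_pair` are invented for illustration.

```python
from typing import Any, Mapping

# Features the README lists for the dpo subset.
PREFERENCE_FIELDS = ("prompt", "chosen", "rejected")


def check_preference_pair(record: Mapping[str, Any]) -> dict:
    """Return only the preference fields of a record after basic checks.

    Hypothetical helper: raises KeyError if a field is missing and
    ValueError if `chosen` and `rejected` are identical, since such a
    pair carries no preference signal.
    """
    missing = [field for field in PREFERENCE_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected are identical")
    return {field: record[field] for field in PREFERENCE_FIELDS}


# Made-up pair for illustration only; real records may look different.
sample_pair = {
    "prompt": "Book me a flight.",
    "chosen": "Where are you flying to, and on which date?",
    "rejected": "Your flight to Paris is booked for Monday.",
}

pair = check_preference_pair(sample_pair)
print(sorted(pair))
```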

Total size: 2.26 MB
Files: 6
Last updated: Jun 21