
WildFB Dataset

WildFB (Wild Feedback) is a high-quality dataset of 186k instances filtered and refined from WildChat-4.8M. Each instance is labeled with a 4-level ordinal satisfaction score extracted from in-the-wild human-LLM interactions.

Dataset Details

WildFB addresses the challenge of training reward models without expensive human-annotated preference pairs. Instead, it extracts implicit reward signals from user follow-up queries in real-world conversations.

Label Distribution

The dataset uses a 4-point ordinal scale based on user satisfaction:

| Label | Level | Description |
|-------|-------|-------------|
| 1 | CLEARLY NEGATIVE | User expresses rejection, strong dissatisfaction, or abandonment |
| 2 | CORRECTION | User provides error corrections or points out mistakes |
| 3 | POSITIVE ENGAGEMENT | User continues the conversation with positive engagement |
| 4 | CLEAR SATISFACTION | User expresses thanks, praise, or clear satisfaction |
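The scale above can be represented as a simple mapping. The constant and function names below are illustrative, not part of the dataset; treating levels 3-4 as "positive" is one possible binarization, not an official one:

```python
# Illustrative mapping of WildFB's 4-level ordinal satisfaction scale.
# Level names follow the table above; everything else is a sketch.
SATISFACTION_LEVELS = {
    1: "CLEARLY NEGATIVE",
    2: "CORRECTION",
    3: "POSITIVE ENGAGEMENT",
    4: "CLEAR SATISFACTION",
}

def is_positive(label: int) -> bool:
    """Treat levels 3 and 4 as positive signals (one possible binarization)."""
    if label not in SATISFACTION_LEVELS:
        raise ValueError(f"label must be in 1..4, got {label}")
    return label >= 3
```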

Dataset Statistics

  • Total Instances: 186,000+
  • Train Split: ~181,000
  • Test Split: 5,000
  • Source: WildChat-4.8M (filtered and refined)
  • Languages: Primarily English, with multilingual support

Data Generation Pipeline

WildFB is constructed through an automated 8-step pipeline:

  1. Preprocessing - Convert WildChat parquet files to JSONL format
  2. Prompt Generation - Generate preference classification prompts
  3. Response Generation - Generate classification responses using LLM API
  4. Filtering & Parsing - Extract and validate user feedback labels
  5. Conversation Merging - Reconstruct full conversation contexts
  6. Hindsight Mining - Recover hidden positive signals from neutral-looking contexts
  7. Refusal Validation - Filter noise where users penalize correct safety refusals
  8. Train/Test Split - Create 5000-sample test set

Key Features

  • Implicit Feedback Mining - Recovers positive signals from contexts that appear neutral but indicate satisfaction
  • Refusal Validation - Removes noise where users unjustifiably penalize correct safety refusals by the model
  • Topic-Aware Filtering - Ensures diverse coverage across different conversation topics
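As a rough illustration of refusal validation, the filter below drops negatively labeled instances whose final assistant turn looks like a safety refusal. The marker phrases and threshold are assumptions; the real filtering rules are documented in the WildReward repository:

```python
# Illustrative refusal-validation filter: a user penalizing a correct
# safety refusal is noise for reward modeling, so such instances are
# dropped. Marker list and threshold are assumptions, not pipeline rules.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "i'm unable to")

def keep_instance(assistant_reply: str, label: int) -> bool:
    """Return False for negatively labeled instances that follow a refusal."""
    is_refusal = any(m in assistant_reply.lower() for m in REFUSAL_MARKERS)
    return not (is_refusal and label <= 2)
```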

Use Cases

WildFB is primarily designed for:

  1. Reward Model Training - Train ordinal regression models via a CORAL-like approach
  2. Quality Assessment - Benchmark for conversation quality evaluation
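In a CORAL-style setup, a label y in {1..4} is decomposed into K - 1 = 3 binary threshold tasks ([y > 1, y > 2, y > 3]) trained with shared-threshold logits. The sketch below shows the target encoding and loss using only the standard library (a real trainer would use a deep learning framework); it is a minimal illustration, not the project's training code:

```python
import math

# CORAL-style ordinal loss sketch for 4-level labels. With K = 4 levels
# there are K - 1 = 3 binary threshold logits; label y in {1..4} yields
# binary targets [y > 1, y > 2, y > 3].

def coral_targets(label: int, num_levels: int = 4) -> list[float]:
    """Encode an ordinal label as K-1 binary threshold targets."""
    return [1.0 if label > k else 0.0 for k in range(1, num_levels)]

def coral_loss(logits: list[float], label: int) -> float:
    """Sum of binary cross-entropies over the K-1 threshold tasks."""
    loss = 0.0
    for logit, t in zip(logits, coral_targets(label)):
        p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss
```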

Dataset Structure

```json
{
  "id": "uuid",
  "history": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
    ...
  ],
  "text": "Full conversation text...",
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "user_feedback": "thank you!",
  "label": 4
}
```
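A quick schema check against the structure above can look like the following. The field names come from this card; the specific checks are illustrative:

```python
# Sketch of a schema check for one WildFB instance, based on the
# structure shown above. Field names match the card; checks are examples.

def validate_instance(inst: dict) -> bool:
    """Return True if the instance has the expected fields and shapes."""
    required = {"id", "history", "text", "messages", "user_feedback", "label"}
    if not required <= inst.keys():
        return False
    if inst["label"] not in (1, 2, 3, 4):
        return False
    # Every turn must carry a valid role and some content.
    return all(
        m.get("role") in ("user", "assistant") and "content" in m
        for m in inst["messages"]
    )
```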

Usage Example

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("THU-KEG/WildFB")

# Access training data
train_data = dataset["train"]

# Example instance
instance = train_data[0]
print(f"Label: {instance['label']} (1-4)")
print(f"User Feedback: {instance['user_feedback']}")
print(f"Messages: {instance['messages']}")
```
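After loading, a common first step is checking the label balance. The snippet below uses a toy label list so it runs without downloading anything; with the real dataset you would iterate over `dataset["train"]["label"]` instead:

```python
from collections import Counter

# Label-distribution check. A toy list stands in for the loaded split so
# this runs offline; substitute dataset["train"]["label"] for real data.
toy_labels = [4, 3, 3, 2, 4, 1, 3]

distribution = Counter(toy_labels)
for label in sorted(distribution):
    share = distribution[label] / len(toy_labels)
    print(f"label {label}: {distribution[label]} ({share:.1%})")
```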

Source Data

WildFB is adapted from the WildChat-4.8M dataset, which contains millions of real-world human-LLM conversations collected from the WildChat platform.

Data Collection & Processing

For detailed information on the data collection pipeline and filtering methodology, please refer to:

📚 WildReward GitHub Repository

The repository contains:

  • Complete pipeline implementation (collect_rm_data/)
  • Detailed documentation for each processing step
  • Quality control and filtering strategies

License

This dataset is released under the MIT License. The original WildChat dataset may have its own license terms that users should comply with.

Citation


Acknowledgments

  • WildChat dataset for providing the raw conversation data
  • The WildReward project for the data processing pipeline

Note: This is a filtered and processed version of WildChat-4.8M. Please refer to the WildReward GitHub repository for complete pipeline details and methodology.
