---
license: mit
task_categories:
- text-classification
- reinforcement-learning
language:
- en
- zh
multilinguality:
- multilingual
size_categories:
- 100K<n<1M
source_datasets:
- WildChat-4.8M
pretty_name: WildFB
dataset_info:
  features:
  - name: id
    dtype: string
  - name: history
    dtype: string
  - name: text
    dtype: string
  - name: messages
    dtype: list
  - name: user_feedback
    dtype: string
  - name: label
    dtype: int
  splits:
    train:
      num_examples: ~181000
    test:
      num_examples: 5000
---
# WildFB Dataset
**WildFB** (Wild Feedback) is a high-quality dataset of **186k instances** filtered and refined from [WildChat-4.8M](https://huggingface.co/datasets/allenai/WildChat-4.8M). Each instance is labeled with a **4-level ordinal satisfaction score** extracted from in-the-wild human-LLM interactions.
## Dataset Details
WildFB addresses the challenge of training reward models without expensive human-annotated preference pairs. Instead, it extracts **implicit reward signals** from user follow-up queries in real-world conversations.
### Label Distribution
The dataset uses a 4-point ordinal scale based on user satisfaction:
| Label | Level | Description |
|-------|-------|-------------|
| 1 | CLEARLY NEGATIVE | User expresses rejection, strong dissatisfaction, or abandonment |
| 2 | CORRECTION | User provides error corrections or points out mistakes |
| 3 | POSITIVE ENGAGEMENT | User continues conversation with positive engagement |
| 4 | CLEAR SATISFACTION | User expresses thanks, praise, or clear satisfaction |
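
Because the labels are ordinal, it can help to give them names that still compare as integers. The following is a minimal sketch; `SatisfactionLevel` and `describe` are hypothetical names used for illustration and are not part of the dataset or the WildReward code.

```python
from enum import IntEnum


class SatisfactionLevel(IntEnum):
    """Names for the four WildFB labels (hypothetical helper, not shipped with the dataset)."""

    CLEARLY_NEGATIVE = 1
    CORRECTION = 2
    POSITIVE_ENGAGEMENT = 3
    CLEAR_SATISFACTION = 4


def describe(label: int) -> str:
    """Map an integer label to the level name used in the table above."""
    return SatisfactionLevel(label).name.replace("_", " ")


print(describe(4))
```

Using `IntEnum` keeps the ordering intact, so comparisons such as `SatisfactionLevel.CORRECTION < SatisfactionLevel.POSITIVE_ENGAGEMENT` behave as expected.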
### Dataset Statistics
- **Total Instances:** 186,000+
- **Train Split:** ~181,000
- **Test Split:** 5,000
- **Source:** WildChat-4.8M (filtered and refined)
- **Languages:** Primarily English, with multilingual coverage (the card metadata lists `en` and `zh`)
## Data Generation Pipeline
WildFB is constructed through an **automated 8-step pipeline**:
1. **Preprocessing** - Convert WildChat parquet files to JSONL format
2. **Prompt Generation** - Generate preference classification prompts
3. **Response Generation** - Generate classification responses using an LLM API
4. **Filtering & Parsing** - Extract and validate user feedback labels
5. **Conversation Merging** - Reconstruct full conversation contexts
6. **Hindsight Mining** - Recover hidden positive signals from neutral-looking contexts
7. **Refusal Validation** - Filter noise where users penalize correct safety refusals
8. **Train/Test Split** - Hold out a 5,000-sample test set
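
As an illustration of the final step, a fixed-size hold-out split could look roughly like the sketch below. This card does not say whether the actual split is uniformly random, stratified, or topic-aware, so the shuffling strategy, the `hold_out_split` name, and the `seed` parameter are all assumptions.

```python
import random


def hold_out_split(records, test_size=5000, seed=0):
    """Shuffle a copy of `records` and hold out `test_size` items as the test set.

    Assumption: a plain uniform shuffle; the real pipeline may split differently.
    """
    if test_size > len(records):
        raise ValueError("test_size exceeds the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # The first `test_size` shuffled items become the test set, the rest the train set.
    return shuffled[test_size:], shuffled[:test_size]


train, test = hold_out_split(list(range(100)), test_size=10, seed=1)
print(len(train), len(test))
```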
### Key Features
- **Implicit Feedback Mining** - Recovers positive signals from contexts that appear neutral but indicate satisfaction
- **Refusal Validation** - Removes noise where users unjustifiably penalize correct safety refusals by the model
- **Topic-Aware Filtering** - Ensures diverse coverage across different conversation topics
## Use Cases
WildFB is primarily designed for:
1. **Reward Model Training** - Train ordinal regression models via a CORAL-like approach
2. **Quality Assessment** - Benchmark for conversation quality evaluation
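
For readers unfamiliar with CORAL-style ordinal regression: a label on a K-level scale is usually recast as K-1 binary targets of the form "is the label greater than k?". The sketch below shows that encoding for the 4-level WildFB scale, together with the usual decoding rule. It is a generic illustration of the technique, not the WildReward training code, and the function names are hypothetical.

```python
NUM_LEVELS = 4  # WildFB labels range from 1 to 4


def to_coral_targets(label, num_levels=NUM_LEVELS):
    """Encode an ordinal label as num_levels - 1 binary targets [label > 1, label > 2, ...]."""
    if not 1 <= label <= num_levels:
        raise ValueError(f"label must be in 1..{num_levels}, got {label}")
    return [1 if label > k else 0 for k in range(1, num_levels)]


def from_coral_probs(probs, threshold=0.5):
    """Decode per-threshold probabilities back to a label: 1 + number of thresholds passed."""
    return 1 + sum(1 for p in probs if p > threshold)


print(to_coral_targets(3))
print(from_coral_probs([0.9, 0.7, 0.2]))
```

A model trained this way typically shares one scoring head across the binary tasks and learns one bias per threshold, which keeps the predicted probabilities rank-consistent.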
## Dataset Structure
```json
{
  "id": "uuid",
  "history": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
    ...
  ],
  "text": "Full conversation text...",
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "user_feedback": "thank you!",
  "label": 4
}
```
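
A quick sanity check against the structure above might look like the following sketch. The field names come from this card; the `check_instance` helper and the specific checks are assumptions made for illustration. Note that the card metadata declares `history` as a string while the example shows a list, so the sketch deliberately does not check that field's type.

```python
REQUIRED_FIELDS = ("id", "history", "text", "messages", "user_feedback", "label")


def check_instance(instance):
    """Return a list of problems found in one record (empty if none were found)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in instance]
    label = instance.get("label")
    if label not in (1, 2, 3, 4):
        problems.append(f"label out of range: {label!r}")
    messages = instance.get("messages")
    if isinstance(messages, list):
        # Each message is expected to be a dict with "role" and "content" keys.
        for i, message in enumerate(messages):
            if not isinstance(message, dict) or not {"role", "content"} <= message.keys():
                problems.append(f"messages[{i}] lacks role/content")
    return problems
```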
## Usage Example
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("THU-KEG/WildFB")
# Access training data
train_data = dataset["train"]
# Example instance
instance = train_data[0]
print(f"Label: {instance['label']} (1-4)")
print(f"User Feedback: {instance['user_feedback']}")
print(f"Messages: {instance['messages']}")
```
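
To inspect how the four labels are distributed, a small helper along these lines may be useful. It is a sketch that works on any iterable of records with a `label` field; `label_distribution` is a hypothetical name, and the sample records below are made up purely to show the call.

```python
from collections import Counter


def label_distribution(records):
    """Return the fraction of records per label, keyed by label in ascending order."""
    counts = Counter(record["label"] for record in records)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Made-up records for illustration only, not real dataset statistics.
sample = [{"label": 4}, {"label": 4}, {"label": 1}, {"label": 3}]
print(label_distribution(sample))
```

Since iterating over a `datasets` split yields dictionaries, the same function could in principle be called as `label_distribution(train_data)`, although a full pass over the training split will take some time.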
## Source Data
WildFB is adapted from the [WildChat-4.8M](https://huggingface.co/datasets/allenai/WildChat-4.8M) dataset, which contains millions of real-world human-LLM conversations collected from the WildChat platform.
## Data Collection & Processing
For detailed information on the data collection pipeline and filtering methodology, please refer to:
📚 **[WildReward GitHub Repository](https://github.com/THU-KEG/WildReward)**
The repository contains:
- Complete pipeline implementation (`collect_rm_data/`)
- Detailed documentation for each processing step
- Quality control and filtering strategies
## License
This dataset is released under the **MIT License**. The original WildChat dataset may have its own license terms that users should comply with.
## Citation
```bibtex
```
## Acknowledgments
- WildChat dataset for providing the raw conversation data
- The WildReward project for the data processing pipeline
---
**Note:** This is a filtered and processed version of WildChat-4.8M. Please refer to the [WildReward GitHub repository](https://github.com/THU-KEG/WildReward) for complete pipeline details and methodology.