Text Classification
Transformers
Safetensors
qwen3
reward-model
rlhf
dpo
alignment
wildchat
text-embeddings-inference
Instructions to use THU-KEG/WildReward-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use THU-KEG/WildReward-8B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="THU-KEG/WildReward-8B")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("THU-KEG/WildReward-8B")
    model = AutoModelForSequenceClassification.from_pretrained("THU-KEG/WildReward-8B")

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -109,6 +109,15 @@ WildReward achieves competitive results on standard reward model benchmarks whil
 ## Citation
 
 ```bibtex
+@misc{peng2026wildrewardlearningrewardmodels,
+      title={WildReward: Learning Reward Models from In-the-Wild Human Interactions},
+      author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Zijun Yao and Lei Hou and Juanzi Li},
+      year={2026},
+      eprint={2602.08829},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2602.08829},
+}
 ```
 
 ## License