Add pipeline tag and GitHub link to model card

#2, opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
-license: apache-2.0
-library_name: transformers
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
 datasets:
 - virtuoussy/Multi-subject-RLVR
 - sarosavo/Master-RM
@@ -18,8 +18,9 @@ language:
 - vie
 - tha
 - ara
-base_model:
-- Qwen/Qwen2.5-7B-Instruct
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-classification
 ---
 
 # Robust Reward Model for LLM-as-a-Judge
@@ -28,6 +29,7 @@ This repository contains a robust, general-domain generative reward model presen
 
 - **Paper**: [One Token to Fool LLM-as-a-Judge](https://huggingface.co/papers/2507.08794)
 - **Training Data**: [https://huggingface.co/datasets/sarosavo/Master-RM](https://huggingface.co/datasets/sarosavo/Master-RM)
+- **Code/GitHub Repository**: [https://github.com/Yulai-Zhao/Robust-Reward-Model](https://github.com/Yulai-Zhao/Robust-Reward-Model)
 - **Training algorithm**: Standard supervised fine-tuning, see Appendix A.2 for more details.
 
 ## Model Description
@@ -118,7 +120,6 @@ bash reward_server/RLVR_train.sh {METHOD} {PRETRAIN_PATH} {DATA_PATH} {REWARD_AP
 # REWARD_API: remote reward server url, e.g., http://127.0.0.1:8000/get_reward
 ```
 
-
 ## Citation
 
 If you use this model, please cite:
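
One way to sanity-check the metadata this change produces is to parse the front matter and confirm that the reordered and newly added keys are present. The snippet below is a minimal sketch, not part of the PR: `parse_front_matter` is a hypothetical helper that handles only the small YAML subset seen in the diff (top-level scalars and flat lists), and the sample card leaves out the `language` list because the diff shows only part of it.

```python
def parse_front_matter(card_text):
    """Parse a tiny YAML subset: top-level scalars and flat '- item' lists.

    Hypothetical helper for illustration; a real YAML parser is the safer
    choice for anything beyond this simple layout.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("model card does not start with a front matter block")

    metadata = {}
    current_key = None  # key whose list is currently being filled
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            return metadata
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_key is None:
                raise ValueError("list item without a preceding key")
            metadata[current_key].append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value  # scalar entry
                current_key = None
            else:
                metadata[key] = []  # list entry, items follow
                current_key = key
    raise ValueError("front matter block is not closed")


# Abbreviated copy of the front matter after the change (language list omitted).
CARD = """---
base_model:
- Qwen/Qwen2.5-7B-Instruct
datasets:
- virtuoussy/Multi-subject-RLVR
- sarosavo/Master-RM
library_name: transformers
license: apache-2.0
pipeline_tag: text-classification
---

# Robust Reward Model for LLM-as-a-Judge
"""

metadata = parse_front_matter(CARD)
print(metadata["pipeline_tag"])  # prints: text-classification
```

The parser is deliberately strict: it raises on an unclosed block or a stray list item rather than guessing, so a malformed card would be caught before it is pushed.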