# OpenRubrics/RubricRM-8B-Rubric-v2

This is an 8B RubricARM-Rubric model, finetuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
See our [paper](https://arxiv.org/abs/2602.01511) for more details.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRubrics/RubricARM-8B-Rubric"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```
To evaluate the model, build the input message using the following format.
```python
RUBRIC_PROMPT_TEMPLATE = (
    "Your task is to extract a set of rubric-style instructions from a user's request.\n"
    "These rubrics will be used as evaluation criteria to check if a response fully meets the request.\n"
    "Every rubric item must be a universal principle. If any rubric still contains topic-specific references (e.g., names, places, myths, numbers, historical facts), it is automatically invalid.\n"
    "\n"
    "- **Two Distinct Categories:**\n"
    "  - [Hard Rule]: Derived strictly from explicit requirements stated in the <request> (format, length, structure, forbidden/required elements, etc.).\n"
    "  - [Principle]: Derived by abstracting any concrete cues into domain-agnostic quality criteria (e.g., clarity, correctness, sound reasoning, pedagogy).\n"
    "\n"
    "- **Comprehensiveness:**\n"
    "  The rubric must cover all critical aspects implied by the request and examples, including explicit requirements and implicit quality standards.\n"
    "\n"
    "- **Conciseness & Uniqueness:**\n"
    "  Each rubric must capture a distinct evaluation criterion. Overlapping or redundant criteria must be merged into a single rubric. Wording must be precise and free of repetition.\n"
    "\n"
    "- **Format Requirements:**\n"
    "  - Use a numbered list.\n"
    "  - Each item starts with \"The response\" phrased in third person.\n"
    "  - Append [Hard Rule] or [Principle] at the end of each item.\n"
    "  - Do not include reasoning, explanations, or examples in the final output—only the rubrics.\n"
    "\n"
    "Here is the request:\n"
    "{prompt}\n"
    "\n"
    "Please generate the rubrics for the above request."
)

# `instruction` is the user request you want to generate rubrics for.
user_text = RUBRIC_PROMPT_TEMPLATE.format(
    prompt=instruction,
)
messages_list = [
    {"role": "user", "content": user_text},
]
message = tok.apply_chat_template(
    messages_list,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# Remaining step: use either HF or vLLM for evaluation.
# ...
# ...
```
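Per the format requirements in the prompt, the model should emit a numbered list where each item ends with a `[Hard Rule]` or `[Principle]` tag. A minimal post-processing sketch (the `sample_output` string and the `parse_rubrics` helper are our own illustrations, not part of this repository):

```python
import re

# Hypothetical output in the format the prompt requests: a numbered list
# where each item ends with a [Hard Rule] or [Principle] tag.
sample_output = (
    "1. The response must be written in English. [Hard Rule]\n"
    "2. The response provides clear and sound reasoning. [Principle]\n"
)

def parse_rubrics(text):
    """Split a numbered rubric list into (criterion, category) pairs."""
    pattern = re.compile(
        r"^\s*\d+\.\s*(.+?)\s*\[(Hard Rule|Principle)\]\s*$",
        re.MULTILINE,
    )
    return pattern.findall(text)

rubrics = parse_rubrics(sample_output)
# rubrics == [("The response must be written in English.", "Hard Rule"),
#             ("The response provides clear and sound reasoning.", "Principle")]
```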
If you find our work helpful, please consider citing our paper:
```bibtex
@misc{xu2026alternating,
  title={Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training},
  author={Ran Xu and Tianci Liu and Zihan Dong and Tony You and Ilgee Hong and Carl Yang and Linjun Zhang and Tao Zhao and Haoyu Wang},
  year={2026},
  eprint={2602.01511},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.01511},
}
```