# OpenRubrics/RubricRM-8B-Rubric-v2

This is an 8B RubricARM-Rubric model, fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). See our [paper](https://arxiv.org/abs/2602.01511) for more details.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRubrics/RubricARM-8B-Rubric"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```

To evaluate the model, please use the following format to build the message.

```python
RUBRIC_PROMPT_TEMPLATE = (
    "Your task is to extract a set of rubric-style instructions from a user's request.\n"
    "These rubrics will be used as evaluation criteria to check if a response fully meets the request.\n"
    "Every rubric item must be a universal principle. If any rubric still contains topic-specific references (e.g., names, places, myths, numbers, historical facts), it is automatically invalid.\n"
    "\n"
    "- **Two Distinct Categories:**\n"
    "  - [Hard Rule]: Derived strictly from explicit requirements stated in the request (format, length, structure, forbidden/required elements, etc.).\n"
    "  - [Principle]: Derived by abstracting any concrete cues into domain-agnostic quality criteria (e.g., clarity, correctness, sound reasoning, pedagogy).\n"
    "\n"
    "- **Comprehensiveness:**\n"
    "  The rubric must cover all critical aspects implied by the request and examples, including explicit requirements and implicit quality standards.\n"
    "\n"
    "- **Conciseness & Uniqueness:**\n"
    "  Each rubric must capture a distinct evaluation criterion. Overlapping or redundant criteria must be merged into a single rubric. Wording must be precise and free of repetition.\n"
    "\n"
    "- **Format Requirements:**\n"
    "  - Use a numbered list.\n"
    "  - Each item starts with \"The response\" phrased in third person.\n"
    "  - Append [Hard Rule] or [Principle] at the end of each item.\n"
    "  - Do not include reasoning, explanations, or examples in the final output—only the rubrics.\n"
    "\n"
    "Here is the request:\n"
    "{prompt}\n"
    "\n"
    "Please generate the rubrics for the above request."
)

user_text = RUBRIC_PROMPT_TEMPLATE.format(
    prompt=instruction,
)

messages_list = [
    {"role": "user", "content": user_text},
]

message = tok.apply_chat_template(
    messages_list,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

# Remaining step: use either HF or vLLM for evaluation.
# ...
# ...
```

If you find our work helpful, please consider citing our paper:

```
@misc{xu2026alternating,
      title={Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training},
      author={Ran Xu and Tianci Liu and Zihan Dong and Tony You and Ilgee Hong and Carl Yang and Linjun Zhang and Tao Zhao and Haoyu Wang},
      year={2026},
      eprint={2602.01511},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.01511},
}
```
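The evaluation step is left open in the usage snippet above. Below is a minimal sketch of how it could look with Hugging Face `generate`: the helper names `generate_rubrics` and `parse_rubrics` are our own illustrative assumptions, not part of the released model or repo, and the sampling settings are placeholders you should tune.

```python
import re


def generate_rubrics(model, tok, message, max_new_tokens=1024):
    """Sketch of the HF decoding step (assumed, not the official recipe):
    greedy-decode the chat-templated message and return only the newly
    generated text, dropping the prompt tokens."""
    inputs = tok(message, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


def parse_rubrics(text):
    """Split the model's numbered output ('1. ...', '2. ...') into a list of
    rubric strings, one per item."""
    return re.findall(r"^\s*\d+\.\s*(.+?)\s*$", text, flags=re.MULTILINE)


# Example on a hand-written output in the expected format:
sample = (
    "1. The response uses a numbered list. [Hard Rule]\n"
    "2. The response reasons soundly and clearly. [Principle]\n"
)
print(parse_rubrics(sample))
```

`generate_rubrics(model, tok, message)` would plug directly into the `model`, `tok`, and `message` objects built in the usage section; with vLLM, the same `message` string can be passed to `LLM.generate` instead.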