GenRe2 GRM SFT Checkpoint

This repository contains the supervised fine-tuned (SFT) generative reward model (GRM) used as a local checkpoint candidate for the GenRe2 experiments.

Source

Local source path used for this release:

/home/trx/rlm-code/ICLR26/output/mistralai/Mistral-7B-Instruct-v0.2-sft_newdataset_fuxian_final

Expected uploaded files:

  • config.json
  • generation_config.json
  • model-00001-of-00003.safetensors
  • model-00002-of-00003.safetensors
  • model-00003-of-00003.safetensors
  • model.safetensors.index.json
  • tokenizer.json
  • tokenizer.model
  • tokenizer_config.json
  • special_tokens_map.json
  • chat_template.jinja
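
A minimal sketch for checking a local copy of the checkpoint against the file list above. The function names `missing_files` and `missing_shards` are hypothetical helpers, not part of the code release. The shard check assumes the index file follows the usual sharded-safetensors layout, with a `weight_map` object mapping tensor names to shard file names.

```python
import json
from pathlib import Path

# File names copied from the "Expected uploaded files" list.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def missing_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def missing_shards(checkpoint_dir):
    """Return shard files named in the index's weight_map but absent on disk.

    Assumes model.safetensors.index.json contains a "weight_map" object
    whose values are shard file names.
    """
    root = Path(checkpoint_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (root / name).is_file()]
```

Both helpers return an empty list when the directory is complete, so a release script could refuse to upload whenever either list is non-empty.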

Result Status

The closest local result row found during release preparation is:

Local result                         FB Bench  FLASK   MT Bench  Vicuna  Avg     Paper row
Mistral-7B-Instruct-v0.2-sft_epoch2  0.7822    0.3726  0.2908    0.3429  0.4471  CE avg 0.4494

The local average (0.4471) is within 0.0023 of the paper's CE average (0.4494), but this row is not a complete five-seed reproduction artifact.
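
The Avg value appears to be the unweighted mean of the four benchmark columns. That is an inference from the numbers in the row, not something stated in the release notes. A short sketch of the arithmetic, using only the values in the result row:

```python
# Scores copied from the local result row.
scores = {
    "FB Bench": 0.7822,
    "FLASK": 0.3726,
    "MT Bench": 0.2908,
    "Vicuna": 0.3429,
}
reported_avg = 0.4471   # "Avg" column of the local row
paper_ce_avg = 0.4494   # "Paper row" column (CE avg)

# Assumption: Avg is the plain mean of the four benchmarks.
mean = sum(scores.values()) / len(scores)

# Gap between the paper's CE average and the local average.
gap = paper_ce_avg - reported_avg

print(f"mean of four benchmarks: {mean:.4f}")
print(f"gap to paper CE avg:     {gap:.4f}")
```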

Notes

  • Released under Apache-2.0. The base model metadata for mistralai/Mistral-7B-Instruct-v0.2 also lists Apache-2.0.
  • This checkpoint is large: about 14 GB locally (7B parameters in BF16, split across three safetensors shards).
  • The code release and checkpoint audit notes are in /home/trx/rlm-code/genre2_main_experiments.