---
license: apache-2.0
language:
- en
- zh
- de
- ru
---

Reward model for Plan2Align (arXiv: https://arxiv.org/abs/2502.20795), used for translation tasks on the zh->en, zh->de, and zh->ru language pairs.

## Using the Reward Model

```python
import torch
from safetensors.torch import load_file
from trl import AutoModelForCausalLMWithValueHead

torch_dtype = torch.bfloat16  # assumption: choose the dtype that suits your hardware

RM = AutoModelForCausalLMWithValueHead.from_pretrained('ray24724919/plan2align_rm', torch_dtype=torch_dtype)
RM.eval()
RM.gradient_checkpointing_enable()  # optional, only if needed

# Load the value head weights and strip the "v_head." prefix from their keys
value_head_weights = load_file("path-to-valuehead-safetensors")
new_state_dict = {key[len("v_head."):] if key.startswith("v_head.") else key: value for key, value in value_head_weights.items()}
RM.v_head.load_state_dict(new_state_dict)
```
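
The key renaming in the snippet above exists because `load_state_dict` on a submodule expects keys relative to that submodule, while the saved value head keys carry a `v_head.` prefix. A minimal sketch of that step on a toy dictionary follows; the helper name `strip_v_head_prefix` and the key names are assumptions for illustration and may not match the actual checkpoint.

```python
def strip_v_head_prefix(state_dict):
    """Return a copy of state_dict whose keys no longer start with 'v_head.'."""
    prefix = "v_head."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Hypothetical key names, for illustration only; real checkpoints may differ.
toy = {"v_head.summary.weight": None, "v_head.summary.bias": None}
stripped = strip_v_head_prefix(toy)
print(list(stripped))  # ['summary.weight', 'summary.bias']
```

Only a leading `v_head.` is removed, so a key that merely contains the substring elsewhere is left unchanged.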

## System Prompt for Translation Reward Modeling

```python
# language: name of the target language; source: Chinese input text;
# translation: candidate translation to be scored
messages = [{"role": "system", "content": "You are a helpful translator and only output the result."},
            {"role": "user", "content": f"### Translate this from Chinese to {language}, Chinese:\n{source}\n### {language}:"},
            {"role": "assistant", "content": translation}]
```
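
To make the template above reusable across the three language pairs, it can be wrapped in a small helper. This is a sketch, not part of the released code: the function name `build_messages`, the sample sentence, and the value `"English"` for `language` are assumptions for illustration.

```python
def build_messages(source: str, translation: str, language: str) -> list:
    """Assemble the chat messages for one (source, translation) pair."""
    return [
        {"role": "system", "content": "You are a helpful translator and only output the result."},
        {"role": "user", "content": f"### Translate this from Chinese to {language}, Chinese:\n{source}\n### {language}:"},
        {"role": "assistant", "content": translation},
    ]


# Hypothetical inputs, for illustration only.
messages = build_messages(source="你好，世界。", translation="Hello, world.", language="English")
print(messages[1]["content"])
```

The candidate translation goes in the assistant turn, so the reward model sees it in the same position a generated translation would occupy.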