---
license: apache-2.0
language:
- en
- zh
- de
- ru
---
Reward model for Plan2Align ([arXiv:2502.20795](https://arxiv.org/abs/2502.20795)), used to score translations for the zh→en, zh→de, and zh→ru language pairs.

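## Using the reward model

Load the model with TRL's value-head wrapper, then load the value-head weights from their separate safetensors file: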

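```python
import torch
from safetensors.torch import load_file
from trl import AutoModelForCausalLMWithValueHead

torch_dtype = torch.bfloat16  # choose a dtype your hardware supports

RM = AutoModelForCausalLMWithValueHead.from_pretrained('ray24724919/plan2align_rm', torch_dtype=torch_dtype)
RM.eval()
RM.gradient_checkpointing_enable()  # optional: reduces memory use

# Strip the "v_head." prefix so the keys match the attached value head's state dict
value_head_weights = load_file("path-to-valuehead-safetensors")
new_state_dict = {key.replace("v_head.", "") if key.startswith("v_head.") else key: value for key, value in value_head_weights.items()}
RM.v_head.load_state_dict(new_state_dict)
```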

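## Prompt format for translation reward modeling

The reward model scores a chat made of a fixed system prompt, a user turn containing the Chinese source, and the candidate translation as the assistant turn: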

```python
# language: target language name; source: Chinese source text; translation: candidate translation to score
messages = [{"role": "system", "content": "You are a helpful translator and only output the result."},
            {"role": "user", "content": f"### Translate this from Chinese to {language}, Chinese:\n{source}\n### {language}:"},
            {"role": "assistant", "content": translation}]
```
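A minimal sketch of scoring one translation end to end, under assumptions this card does not state: the language-code mapping and the helper names `build_rm_messages` and `score_translation` are hypothetical, the tokenizer is assumed to load from the same repo and ship a suitable chat template, and the reward is assumed to be the value-head output at the final token. It is an illustration, not the authors' evaluation code.

```python
# Hypothetical code -> name mapping; the card only lists zh->en, zh->de, zh->ru.
TARGET_LANGUAGES = {"en": "English", "de": "German", "ru": "Russian"}

SYSTEM_PROMPT = "You are a helpful translator and only output the result."


def build_rm_messages(source: str, translation: str, target: str) -> list[dict[str, str]]:
    """Build the three-turn chat in the prompt format above for one (source, translation) pair."""
    language = TARGET_LANGUAGES[target]
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"### Translate this from Chinese to {language}, Chinese:\n{source}\n### {language}:"},
        {"role": "assistant", "content": translation},
    ]


def score_translation(rm, tokenizer, messages) -> float:
    """Return a scalar reward for the assistant turn (sketch; see the stated assumptions)."""
    import torch  # imported lazily so build_rm_messages works without torch installed

    # Assumption: the tokenizer's chat template matches the one used to train the reward model.
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(rm.pretrained_model.device)
    with torch.no_grad():
        # AutoModelForCausalLMWithValueHead returns (lm_logits, loss, value); value has shape (batch, seq_len)
        _, _, values = rm(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    # Assumption: the reward is the value-head output at the last token of the sequence.
    return values[0, -1].item()
```

With `RM` loaded as above and `tokenizer = AutoTokenizer.from_pretrained('ray24724919/plan2align_rm')` (assumed to work for this repo), `score_translation(RM, tokenizer, build_rm_messages("你好，世界", "Hello, world", "en"))` returns a float; a higher value presumably indicates a better translation.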