Reward Backend Guide
This guide explains how to:
- select a reward backend at runtime,
- understand backend capability differences,
- integrate a custom reward backend.
1) Runtime backend selection
Set the backend with either an environment variable or a Hydra override.
Environment variable (recommended in scripts)
```bash
export EVOLVE_REWARD_BACKEND=vlac
# or
export EVOLVE_REWARD_BACKEND=robodopamine
```
Training scripts forward this value as the Hydra override:
```
+actor_rollout_ref.rollout.reward_backend=<backend>
```
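As an illustration, a launcher could turn the environment variable into that override string roughly as follows. This is a sketch only: the helper name `reward_backend_override` is hypothetical, and falling back to `vlac` when the variable is unset is an assumption, not documented behavior of the training scripts.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "EVOLVE_REWARD_BACKEND"
OVERRIDE_KEY = "+actor_rollout_ref.rollout.reward_backend"


def reward_backend_override(env: Optional[Mapping[str, str]] = None, default: str = "vlac") -> str:
    """Turn EVOLVE_REWARD_BACKEND into a Hydra command-line override.

    Hypothetical helper; the real training scripts may do this differently.
    Using `default` when the variable is unset is an assumption.
    """
    env = os.environ if env is None else env
    backend = env.get(ENV_VAR, default)
    return f"{OVERRIDE_KEY}={backend}"


if __name__ == "__main__":
    # The resulting string would be appended to the training command line.
    print(reward_backend_override())
```

Passing an explicit mapping (instead of reading `os.environ`) keeps the helper easy to test without mutating the process environment.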
Direct override
```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=vlac
```
2) Capability model
Backends are integrated through a capability contract:
- required: `progress`
- optional: `pairwise`
- optional: `done`
Current capability matrix:

| Backend | progress | pairwise | done |
|---|---|---|---|
| vlac | yes | yes | optional |
| robodopamine | yes | no | no |
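The contract and matrix can be expressed as plain data. In the sketch below, `BackendCaps` is a hypothetical stand-in for `RewardBackendCapabilities` (whose real fields are not shown in this guide), and modeling vlac's "optional" `done` as `True` (available but not required) is an interpretation, not a confirmed setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCaps:
    """Hypothetical stand-in for RewardBackendCapabilities; field names are assumed."""
    progress: bool
    pairwise: bool = False
    done: bool = False


def check_contract(name: str, caps: BackendCaps) -> None:
    """Reject a backend that omits the one required capability."""
    if not caps.progress:
        raise ValueError(f"backend {name!r} must support 'progress'")


# Encodes the matrix above. vlac's "optional" done is modeled as True
# (available but not required); this reading is an assumption.
CAPABILITY_MATRIX = {
    "vlac": BackendCaps(progress=True, pairwise=True, done=True),
    "robodopamine": BackendCaps(progress=True, pairwise=False, done=False),
}

for backend_name, backend_caps in CAPABILITY_MATRIX.items():
    check_contract(backend_name, backend_caps)
```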
`robodopamine` requires the external Robo-Dopamine code (`GRMInference`). Either set:
```bash
export ROBODOPAMINE_PATH=/path/to/Robo-Dopamine
```
or install Robo-Dopamine as an importable package in the active environment.
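A quick pre-flight check can confirm that either option works before launching a long run. This is a sketch: `ensure_importable` is a hypothetical helper, and the module name you pass to it is an assumption, since this guide does not state which Robo-Dopamine module provides `GRMInference`.

```python
import importlib.util
import os
import sys


def ensure_importable(module_name: str, path_env: str = "ROBODOPAMINE_PATH") -> bool:
    """Return True if `module_name` can be found by the import system.

    If `path_env` is set, its value is prepended to sys.path first, so a
    source checkout works as well as an installed package. The module name
    to check is an assumption: pass whichever module provides GRMInference.
    """
    extra = os.environ.get(path_env)
    if extra and extra not in sys.path:
        sys.path.insert(0, extra)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False
```

Note that the helper mutates `sys.path` as a side effect, which mirrors what setting `ROBODOPAMINE_PATH` is meant to achieve.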
Fallback policy:
- if `pairwise` is unsupported, the pairwise reward branch is disabled;
- termination is still derived from the progress threshold.
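The two fallback rules can be sketched as a single function. The name `apply_fallback` and the `done_threshold` default of 95.0 (on the 0-100 progress scale) are hypothetical; the actual threshold and code path in the rollout may differ.

```python
from typing import List, Optional, Tuple


def apply_fallback(
    progress: List[float],
    critic: List[float],
    supports_pairwise: bool,
    done_threshold: float = 95.0,  # hypothetical threshold on the 0-100 scale
) -> Tuple[Optional[List[float]], Optional[int]]:
    """Return (pairwise reward or None, index of the terminating step or None)."""
    # Pairwise branch: disabled (None) when the backend does not declare it.
    pairwise_reward = list(critic) if supports_pairwise else None
    # Termination: first step whose progress reaches the threshold,
    # computed the same way whether or not pairwise is supported.
    done_step = next((i for i, p in enumerate(progress) if p >= done_threshold), None)
    return pairwise_reward, done_step
```

For example, `apply_fallback([10.0, 50.0, 96.0], [], supports_pairwise=False)` disables the pairwise branch and reports termination at step 2.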
3) Custom backend integration
Step 1: Implement adapter
Create a backend class under `verl/utils/reward_backends/` that defines:
```python
capabilities = RewardBackendCapabilities(...)

def compute_trajectory_values(...):
    ...

def pairwise_critic(...):
    ...
```
`compute_trajectory_values` must return:
- `value_list`: progress values, on the 0-100 scale expected by the current rollout path;
- `critic_list`: the pairwise/incremental critic list (may be empty if pairwise is unsupported).
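A toy adapter that satisfies this return contract might look like the sketch below. Everything here is hypothetical: `Caps` stands in for `RewardBackendCapabilities`, `LinearProgressBackend` is an illustrative name, and the `(frames, task)` argument list is assumed because the real signature is not given in this guide.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Caps:  # hypothetical stand-in for RewardBackendCapabilities
    progress: bool = True
    pairwise: bool = False
    done: bool = False


class LinearProgressBackend:
    """Toy backend: progress rises linearly over the trajectory.

    Only progress is declared, so critic_list is always empty.
    The (frames, task) arguments are an assumed signature.
    """

    capabilities = Caps(progress=True, pairwise=False, done=False)

    def compute_trajectory_values(self, frames: Sequence, task: str) -> Tuple[List[float], List[float]]:
        n = len(frames)
        if n == 0:
            return [], []
        # Progress on the 0-100 scale expected by the rollout path.
        value_list = [100.0 * (i + 1) / n for i in range(n)]
        critic_list: List[float] = []  # pairwise unsupported
        return value_list, critic_list

    def pairwise_critic(self, *args, **kwargs):
        raise NotImplementedError("pairwise is not declared in capabilities")
```

A real backend would replace the linear ramp with model inference, but keeping the `(value_list, critic_list)` shape and the empty critic list for an undeclared `pairwise` capability is the part the rollout relies on.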
Step 2: Register backend in factory
Edit `verl/utils/reward_backend_factory.py`:
- add a capability entry to `_CAP_MAP`;
- add a construction branch in `build_reward_backend_from_config(...)`.
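The two registration steps follow a common name-keyed factory pattern. The sketch below only illustrates that pattern: `CAP_MAP`, `build_backend`, `MyBackend` and `"my_backend"` are hypothetical names, and the real `_CAP_MAP` and `build_reward_backend_from_config(...)` may be structured differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Caps:  # hypothetical stand-in for RewardBackendCapabilities
    progress: bool = True
    pairwise: bool = False
    done: bool = False


# Plays the role of _CAP_MAP; existing entries (vlac, robodopamine) omitted.
CAP_MAP: Dict[str, Caps] = {
    "my_backend": Caps(progress=True, pairwise=False, done=False),  # new entry
}


class MyBackend:  # placeholder for your adapter class
    def __init__(self, caps: Caps, **kwargs: Any):
        self.capabilities = caps
        self.kwargs = kwargs


def build_backend(name: str, cfg: Mapping[str, Any]):
    """Look up capabilities by name, then dispatch to a construction branch."""
    if name not in CAP_MAP:
        raise ValueError(f"unknown reward backend {name!r}; known: {sorted(CAP_MAP)}")
    caps = CAP_MAP[name]
    if name == "my_backend":  # new construction branch
        return MyBackend(caps, **cfg)
    raise NotImplementedError(f"construction for {name!r} is omitted from this sketch")
```

Keeping the capability entry and the construction branch keyed by the same name means an unknown backend fails early with a clear error instead of at rollout time.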
Step 3: Configure and run a smoke check
```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=<your_backend>
```
Verify:
- rollout initializes,
- progress reward is non-empty,
- pairwise branch behavior matches declared capabilities.
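The last two checks can also be run offline on the output of a single `compute_trajectory_values` call. `smoke_check` below is a hypothetical helper; it flags an empty progress list, values outside 0-100, and critic values returned by a backend that does not declare `pairwise`. It deliberately does not treat an empty critic list as an error for pairwise-capable backends, since short trajectories may legitimately produce one.

```python
from typing import List, Sequence


def smoke_check(value_list: Sequence[float], critic_list: Sequence[float], supports_pairwise: bool) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if len(value_list) == 0:
        problems.append("progress reward is empty")
    if any(not 0.0 <= v <= 100.0 for v in value_list):
        problems.append("progress outside the 0-100 range")
    if not supports_pairwise and len(critic_list) > 0:
        problems.append("critic_list is non-empty but pairwise is not declared")
    return problems
```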
4) Notes
- `vlac` remains the reference backend for paper-faithful behavior.
- Custom backend integrations should preserve the algorithm invariants listed in `ALGORITHM_INVARIANTS.md`.