Reward Backend Guide

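This guide explains how to select a reward backend at runtime, check which reward signals each backend supports, and add a new backend.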

1) Runtime backend selection

Select the reward backend with an environment variable or a Hydra override:

```bash
export EVOLVE_REWARD_BACKEND=vlac
# or
export EVOLVE_REWARD_BACKEND=robodopamine
```

Training scripts forward this value as the Hydra override:

```text
+actor_rollout_ref.rollout.reward_backend=<backend>
```

You can also pass the override directly:

```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=vlac
```
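A minimal sketch of the forwarding step, assuming a launcher that reads `EVOLVE_REWARD_BACKEND` and appends the Hydra override to the command line. The helper name `build_train_command` and the fallback to `vlac` when the variable is unset are illustrative assumptions, not the documented behavior of the training scripts.

```python
import os
import shlex
from typing import List, Mapping, Optional

TRAIN_SCRIPT = "scripts/train_libero_10-sft_full-ttt.py"


def build_train_command(
    env: Optional[Mapping[str, str]] = None, default: str = "vlac"
) -> List[str]:
    """Hypothetical launcher helper: map EVOLVE_REWARD_BACKEND to the Hydra override.

    Assumption: an unset or blank variable falls back to `default`; the real
    training scripts may handle this case differently.
    """
    env = os.environ if env is None else env
    backend = env.get("EVOLVE_REWARD_BACKEND", "").strip() or default
    return [
        "python",
        TRAIN_SCRIPT,
        f"+actor_rollout_ref.rollout.reward_backend={backend}",
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line.
    print(shlex.join(build_train_command()))
```

Returning an argument list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.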

Backends are integrated through a capability contract: each backend declares which reward signals it provides.

Current capability matrix:

| Backend | progress | pairwise | done |
|---|---|---|---|
| `vlac` | yes | yes | optional |
| `robodopamine` | yes | no | no |
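One way to use the matrix is as data that a trainer checks before asking a backend for a signal. The sketch below encodes the table with a hypothetical `Capabilities` dataclass and `require` helper; the real `RewardBackendCapabilities` fields are not shown in this guide and may differ. Treating "optional" as available (but configurable) is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    NO = "no"
    YES = "yes"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class Capabilities:
    """Illustrative stand-in for RewardBackendCapabilities.

    Field names mirror the matrix columns; the real class may differ.
    """
    progress: Support
    pairwise: Support
    done: Support


# The capability matrix, encoded as data.
MATRIX = {
    "vlac": Capabilities(Support.YES, Support.YES, Support.OPTIONAL),
    "robodopamine": Capabilities(Support.YES, Support.NO, Support.NO),
}


def require(backend: str, signal: str) -> None:
    """Raise if `backend` cannot provide `signal` at all."""
    level = getattr(MATRIX[backend], signal)
    if level is Support.NO:
        raise RuntimeError(f"{backend!r} does not support {signal!r}")
```

Failing fast at startup this way surfaces a mismatch (for example, requesting pairwise critiques from `robodopamine`) before any rollouts are spent.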

The `robodopamine` backend requires the external Robo-Dopamine code (`GRMInference`). Either set:

```bash
export ROBODOPAMINE_PATH=/path/to/Robo-Dopamine
```

or install Robo-Dopamine as an importable package in the active environment.
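A quick pre-flight check can catch a missing dependency before training starts. This sketch assumes that setting `ROBODOPAMINE_PATH` amounts to making a source checkout importable via `sys.path`; the function name `robodopamine_importable` is hypothetical, and `module_name` is a placeholder because this guide names only the `GRMInference` class, not the module that defines it.

```python
import importlib.util
import os
import sys
from typing import Mapping, Optional


def robodopamine_importable(
    module_name: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True if `module_name` can be found on the import path.

    Assumption: ROBODOPAMINE_PATH points at a source checkout, so it is
    prepended to sys.path before the lookup. `module_name` is a placeholder
    for whichever module defines GRMInference.
    """
    env = os.environ if env is None else env
    path = env.get("ROBODOPAMINE_PATH")
    if path and path not in sys.path:
        sys.path.insert(0, path)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False
```

`find_spec` locates a module without executing it, so the check stays cheap even if importing the real package would load model weights.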

Fallback policy:

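To add a new backend, create a backend class under verl/utils/reward_backends/ with:

```python
class MyRewardBackend:  # hypothetical class name
    # Declares which signals (progress, pairwise, done) this backend provides.
    capabilities = RewardBackendCapabilities(...)

    def compute_trajectory_values(...):
        ...

    def pairwise_critic(...):
        ...
```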
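To make the shape of the skeleton concrete, here is a self-contained toy. It replaces `RewardBackendCapabilities` with a local stand-in (`Caps`) and invents the argument and return types (one float per frame, and a signed progress delta for the pairwise critic). None of these signatures come from verl; check the actual contract for `compute_trajectory_values` before relying on this shape.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Caps:
    """Local stand-in for RewardBackendCapabilities; real fields may differ."""
    progress: bool
    pairwise: bool
    done: bool


class ToyProgressBackend:
    """Hypothetical backend that scores frames by a precomputed scalar.

    Each "frame" is a float (for example, distance travelled toward a goal),
    so the example runs without a model. A real backend would run a reward
    model over observations instead.
    """

    capabilities = Caps(progress=True, pairwise=True, done=False)

    def __init__(self, start: float, goal: float):
        if goal == start:
            raise ValueError("goal must differ from start")
        self.start = start
        self.goal = goal

    def _progress(self, x: float) -> float:
        # Linear progress from start to goal, clipped to [0, 1].
        p = (x - self.start) / (self.goal - self.start)
        return min(1.0, max(0.0, p))

    def compute_trajectory_values(self, frames: Sequence[float]) -> List[float]:
        # One progress value per frame, in trajectory order.
        return [self._progress(x) for x in frames]

    def pairwise_critic(self, frame_a: float, frame_b: float) -> float:
        # Signed progress change from frame_a to frame_b (positive = closer to goal).
        return self._progress(frame_b) - self._progress(frame_a)
```

Deriving the pairwise score from the same progress function keeps the two signals consistent with each other, which is one simple way to avoid a backend contradicting itself.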

compute_trajectory_values must return:

Edit verl/utils/reward_backend_factory.py so that the factory can construct your backend, then select it at training time:

```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=<your_backend>
```
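The factory edit itself is not shown here. A common pattern is a name-to-class registry keyed by the string passed to `reward_backend`; the sketch below uses hypothetical names (`register_backend`, `create_reward_backend`, `MyBackend`) and is not the contents of reward_backend_factory.py, which may use an if/elif chain or another scheme.

```python
from typing import Callable, Dict

# Hypothetical registry: maps the string passed via
# +actor_rollout_ref.rollout.reward_backend=<name> to a constructor.
_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_backend(name: str):
    """Class decorator that adds a backend to the registry under `name`."""
    def decorator(cls):
        if name in _REGISTRY:
            raise KeyError(f"backend {name!r} already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def create_reward_backend(name: str, **kwargs):
    """Build the backend registered under `name`, forwarding constructor kwargs."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown reward backend {name!r}; known: {known}") from None
    return cls(**kwargs)


@register_backend("my_backend")
class MyBackend:
    def __init__(self, scale: float = 1.0):
        self.scale = scale
```

Listing the known names in the error message makes a typo in the Hydra override easy to diagnose.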

Verify:

`vlac` remains the reference backend for reproducing the paper's behavior.
Custom backend integrations should preserve the algorithm invariants listed in ALGORITHM_INVARIANTS.md.