Reward Backend Guide

This guide explains how to:

  • select a reward backend at runtime,
  • understand the capability differences between backends,
  • integrate a custom reward backend.

1) Runtime backend selection

Set the backend through an environment variable or a Hydra override.

Environment variable (recommended in scripts)

export EVOLVE_REWARD_BACKEND=vlac
# or
export EVOLVE_REWARD_BACKEND=robodopamine

Training scripts forward this value as the Hydra override:

+actor_rollout_ref.rollout.reward_backend=<backend>

Direct override

python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=vlac
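The mapping from the environment variable to the Hydra override can be pictured with a short Python sketch. This is not the training scripts' actual code: the helper name `reward_backend_override`, the `vlac` default, and the empty-value check are assumptions made for illustration.

```python
import os


def reward_backend_override(env=None, default="vlac"):
    """Build the Hydra override string from EVOLVE_REWARD_BACKEND.

    The default value and the empty-value check are assumptions of this sketch.
    """
    env = os.environ if env is None else env
    backend = env.get("EVOLVE_REWARD_BACKEND", default).strip()
    if not backend:
        raise ValueError("EVOLVE_REWARD_BACKEND is set but empty")
    return f"+actor_rollout_ref.rollout.reward_backend={backend}"


# Passing a plain dict instead of os.environ keeps the example side-effect free.
print(reward_backend_override({"EVOLVE_REWARD_BACKEND": "robodopamine"}))
```

The leading `+` in the override appends a key that is not already present in the Hydra config, which is why both the scripts and the direct command use it.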

2) Capability model

Each backend is integrated through a capability contract:

  • required: progress
  • optional: pairwise
  • optional: done

Current matrix:

| Backend | progress | pairwise | done |
| --- | --- | --- | --- |
| vlac | yes | yes | optional |
| robodopamine | yes | no | no |

The robodopamine backend requires the external Robo-Dopamine code (GRMInference). Either set:

export ROBODOPAMINE_PATH=/path/to/Robo-Dopamine

or install Robo-Dopamine as an importable package in the active environment.
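One way such a lookup could work is sketched below: if `ROBODOPAMINE_PATH` is set, it is added to `sys.path` before the import is attempted, otherwise the import relies on an installed package. The helper `ensure_importable` is hypothetical, and the module name is left as a parameter because this guide does not state which module provides `GRMInference`.

```python
import importlib
import os
import sys


def ensure_importable(module_name, path_env="ROBODOPAMINE_PATH"):
    """Import module_name, adding the directory in path_env to sys.path if set."""
    root = os.environ.get(path_env)
    if root and root not in sys.path:
        sys.path.insert(0, root)
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Surface both options from this guide in the error message.
        raise ImportError(
            f"cannot import {module_name!r}; set {path_env} or install the package"
        ) from exc
```

Failing early with a message that names both options makes a missing dependency easier to diagnose than an import error raised deep inside a rollout worker.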

Fallback policy:

  • if a backend does not support pairwise, the pairwise reward branch is disabled.
  • termination is still derived from the progress threshold.
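The contract and the fallback policy can be illustrated with a minimal sketch. The `Capabilities` dataclass is a hypothetical stand-in for `RewardBackendCapabilities` (its real fields are not shown in this guide), and the threshold of 95.0 is an assumed value chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for RewardBackendCapabilities."""
    progress: bool = True   # required by the contract
    pairwise: bool = False  # optional
    done: bool = False      # optional


def active_branches(caps):
    """Return which reward branches a rollout would enable for caps."""
    if not caps.progress:
        raise ValueError("every backend must support progress")
    # Pairwise is simply switched off when the backend does not declare it.
    return {"progress": True, "pairwise": caps.pairwise}


def is_terminated(progress_value, threshold=95.0):
    """Derive termination from progress alone (threshold is an assumed value)."""
    return progress_value >= threshold


vlac = Capabilities(pairwise=True)  # done is listed as optional; left at its default
robodopamine = Capabilities(pairwise=False)
print(active_branches(robodopamine))
```

Because termination depends only on progress in this sketch, a backend without a done signal can still end an episode.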

3) Custom backend integration

Step 1: Implement adapter

Create a backend class under verl/utils/reward_backends/ with:

capabilities = RewardBackendCapabilities(...)

def compute_trajectory_values(...):
    ...

def pairwise_critic(...):
    ...

compute_trajectory_values must return:

  • value_list: progress values (0-100 scale expected by current rollout path),
  • critic_list: pairwise/incremental critic list (may be empty if unsupported).
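As a rough illustration of the return contract, the toy adapter below produces progress values on the 0-100 scale and an empty critic list. The argument list, the tuple return type, and the dictionary used for `capabilities` are assumptions; the real adapter declares a `RewardBackendCapabilities` and follows the signatures used by the existing backends.

```python
from typing import List, Sequence, Tuple


class LinearProgressBackend:
    """Toy backend: progress grows linearly with the frame index."""

    # Hypothetical declaration; the real one is a RewardBackendCapabilities.
    capabilities = {"progress": True, "pairwise": False, "done": False}

    def compute_trajectory_values(
        self, frames: Sequence[object]
    ) -> Tuple[List[float], List[float]]:
        n = len(frames)
        if n == 0:
            return [], []
        # Progress on the 0-100 scale that the rollout path expects.
        value_list = [100.0 * (i + 1) / n for i in range(n)]
        critic_list: List[float] = []  # empty: pairwise is not supported
        return value_list, critic_list

    def pairwise_critic(self, *args, **kwargs):
        raise NotImplementedError("this toy backend does not support pairwise")


values, critics = LinearProgressBackend().compute_trajectory_values(
    ["f0", "f1", "f2", "f3"]
)
print(values)   # [25.0, 50.0, 75.0, 100.0]
print(critics)  # []
```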

Step 2: Register backend in factory

Edit verl/utils/reward_backend_factory.py:

  • add capability entry to _CAP_MAP,
  • add construction branch in build_reward_backend_from_config(...).
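The two registration edits might look roughly like the following. This is a self-contained toy, not the contents of `reward_backend_factory.py`: the shape of `_CAP_MAP`, the config key, the return value, and the `MyBackend` class are all assumptions, so mirror the existing `vlac` and `robodopamine` entries when editing the real file.

```python
# Toy stand-in for verl/utils/reward_backend_factory.py; the real layout may differ.
_CAP_MAP = {
    "vlac": {"progress": True, "pairwise": True},
    "robodopamine": {"progress": True, "pairwise": False},
    "my_backend": {"progress": True, "pairwise": False},  # new capability entry
}


class MyBackend:
    """Hypothetical custom backend implementing the Step 1 contract."""

    def compute_trajectory_values(self, frames):
        return [100.0 for _ in frames], []


def build_reward_backend_from_config(config):
    """Return (backend, capabilities) for config["reward_backend"]."""
    name = config.get("reward_backend", "vlac")  # default is an assumption
    if name not in _CAP_MAP:
        raise ValueError(f"no capability entry for backend {name!r}")
    if name == "my_backend":  # new construction branch
        return MyBackend(), _CAP_MAP[name]
    raise NotImplementedError(f"construction of {name!r} is omitted from this sketch")


backend, caps = build_reward_backend_from_config({"reward_backend": "my_backend"})
print(type(backend).__name__, caps)
```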

Step 3: Configure and run smoke check

python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=<your_backend>

Verify:

  • rollout initializes,
  • progress reward is non-empty,
  • pairwise branch behavior matches declared capabilities.
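Part of this checklist can be automated before launching a full run. The helper below is a hypothetical sketch that inspects the two lists returned by `compute_trajectory_values`; the name `check_backend_output` and the exact checks are assumptions, and it does not replace confirming that the rollout itself initializes.

```python
def check_backend_output(value_list, critic_list, supports_pairwise):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    if not value_list:
        problems.append("progress reward is empty")
    elif any(not 0.0 <= v <= 100.0 for v in value_list):
        problems.append("progress value outside the 0-100 scale")
    # An empty critic list is allowed only when pairwise is not declared.
    if supports_pairwise and not critic_list:
        problems.append("pairwise is declared but critic_list is empty")
    return problems


print(check_backend_output([10.0, 55.0, 100.0], [], supports_pairwise=False))  # []
print(check_backend_output([], [], supports_pairwise=True))
```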

4) Notes

  • vlac remains the reference backend for paper-faithful behavior.
  • custom backend integrations should preserve the algorithm invariants listed in ALGORITHM_INVARIANTS.md.