# Reward Backend Guide

This guide explains how to:

- select a reward backend at runtime,
- understand how backend capabilities differ,
- integrate a custom reward backend.

## 1) Runtime backend selection

Set the backend with an environment variable or a Hydra command-line override.

### Environment variable (recommended in scripts)

```bash
export EVOLVE_REWARD_BACKEND=vlac
# or
export EVOLVE_REWARD_BACKEND=robodopamine
```

The training scripts forward this value as the Hydra override:

```text
+actor_rollout_ref.rollout.reward_backend=<backend>
```
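The forwarding step can be sketched as follows, assuming a Python launcher. The helper name `reward_backend_override` is hypothetical, and returning `None` for an unset variable (so the config default applies) is an assumption about how a script might behave, not a description of the actual training scripts.

```python
import os
from typing import Mapping, Optional


def reward_backend_override(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: turn EVOLVE_REWARD_BACKEND into a Hydra override.

    Returns None when the variable is unset or empty, so the caller can keep
    whatever default the Hydra config already defines.
    """
    env = os.environ if env is None else env
    backend = env.get("EVOLVE_REWARD_BACKEND", "").strip()
    if not backend:
        return None
    # The leading "+" asks Hydra to add the key to the composed config.
    return f"+actor_rollout_ref.rollout.reward_backend={backend}"


if __name__ == "__main__":
    print(reward_backend_override({"EVOLVE_REWARD_BACKEND": "vlac"}))
```

The returned string can then be appended to the training command's argument list.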

### Direct override

```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=vlac
```

## 2) Capability model

Backends integrate through a capability contract:

- required: `progress`
- optional: `pairwise`
- optional: `done`

Current capability matrix:

| Backend | progress | pairwise | done |
|---|---|---|---|
| `vlac` | yes | yes | optional |
| `robodopamine` | yes | no | no |
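A minimal sketch of how the contract and matrix could be modeled, assuming a frozen dataclass. `Capabilities` is a hypothetical stand-in; the real `RewardBackendCapabilities` may use different field names or types. Because the matrix lists `done` as optional for `vlac`, the sketch exposes it as a parameter rather than fixing its value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for RewardBackendCapabilities."""

    progress: bool = True   # required by the contract
    pairwise: bool = False  # optional
    done: bool = False      # optional

    def __post_init__(self) -> None:
        # Every backend must provide progress values; reject anything else.
        if not self.progress:
            raise ValueError("reward backends must support 'progress'")


def vlac_capabilities(with_done: bool = False) -> Capabilities:
    # `done` is optional for vlac in the matrix, so it is a parameter here.
    return Capabilities(progress=True, pairwise=True, done=with_done)


ROBODOPAMINE_CAPABILITIES = Capabilities(progress=True, pairwise=False, done=False)
```

Validating in `__post_init__` makes a backend that omits the required `progress` capability fail at construction time rather than mid-rollout.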

`robodopamine` depends on the external Robo-Dopamine code (it uses `GRMInference`). Either point to a local checkout:

```bash
export ROBODOPAMINE_PATH=/path/to/Robo-Dopamine
```

or install Robo-Dopamine as an importable package in the active environment.
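One way a loader might honor `ROBODOPAMINE_PATH` is to prepend it to `sys.path` before importing. This is a sketch under assumptions: the helper name is hypothetical, and the module that actually exposes `GRMInference` is not named in this guide, so the import itself is left out.

```python
import os
import sys
from typing import Optional


def add_robodopamine_to_path() -> Optional[str]:
    """Hypothetical helper: make a local Robo-Dopamine checkout importable.

    If ROBODOPAMINE_PATH is set, prepend it to sys.path so the module that
    provides GRMInference can be imported. Returns the path that was added,
    or None when the variable is unset (e.g. Robo-Dopamine is pip-installed).
    """
    path = os.environ.get("ROBODOPAMINE_PATH")
    if not path:
        return None
    if not os.path.isdir(path):
        raise FileNotFoundError(f"ROBODOPAMINE_PATH does not exist: {path}")
    if path not in sys.path:
        # Prepend so the checkout wins over any installed copy.
        sys.path.insert(0, path)
    return path
```

Failing early on a missing directory gives a clearer error than a later `ImportError` for `GRMInference`.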

Fallback policy:

- if a backend does not support `pairwise`, the pairwise reward branch is disabled,
- termination is still derived from the progress threshold.
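The fallback policy can be sketched as below. All names are hypothetical, and both the threshold value of `95.0` and the choice to compare the latest progress value against it are assumptions for illustration, not documented defaults.

```python
from typing import Dict, List, Optional


def combine_rewards(
    progress: List[float],
    critics: List[float],
    supports_pairwise: bool,
    done_threshold: float = 95.0,  # hypothetical; not a documented default
) -> Dict[str, object]:
    """Sketch of the fallback policy (function and keys are hypothetical)."""
    if not progress:
        raise ValueError("progress values are required")
    # The pairwise branch runs only when the backend declares support for it.
    pairwise: Optional[List[float]] = list(critics) if supports_pairwise else None
    # Termination depends on progress alone, whatever the pairwise support.
    done = progress[-1] >= done_threshold
    return {"progress": progress[-1], "pairwise": pairwise, "done": done}
```

Keeping termination independent of `pairwise` means a progress-only backend such as `robodopamine` still ends episodes the same way.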

## 3) Custom backend integration

### Step 1: Implement adapter

Create a backend class under `verl/utils/reward_backends/` that exposes:

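```python
class MyRewardBackend:  # hypothetical name
    # Declares which of progress / pairwise / done this backend supports.
    capabilities = RewardBackendCapabilities(...)

    # Required: returns (value_list, critic_list) for a trajectory.
    def compute_trajectory_values(...):
        ...

    # Optional: used only when capabilities declare pairwise support.
    def pairwise_critic(...):
        ...
```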

`compute_trajectory_values` must return:

- `value_list`: progress values on the 0-100 scale that the current rollout path expects,
- `critic_list`: pairwise (incremental) critic values; may be empty if the backend does not support `pairwise`.
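A toy adapter illustrating the return contract, suitable only for smoke tests. It assumes a trajectory arrives as a sequence of frames and ignores their content; `Capabilities`, the class name, and the `pairwise_critic` signature are all hypothetical, and modeling the incremental critic as the step-to-step change in progress is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for RewardBackendCapabilities."""

    progress: bool = True
    pairwise: bool = False
    done: bool = False


class ConstantRateBackend:
    """Toy backend: progress rises linearly with the frame index."""

    capabilities = Capabilities(progress=True, pairwise=True)

    def compute_trajectory_values(
        self, frames: Sequence[object]
    ) -> Tuple[List[float], List[float]]:
        n = len(frames)
        if n == 0:
            return [], []
        # Progress on the 0-100 scale: 0 at the first frame, 100 at the last.
        denom = max(n - 1, 1)
        value_list = [100.0 * i / denom for i in range(n)]
        # Incremental critic: change in progress between consecutive frames.
        critic_list = [b - a for a, b in zip(value_list, value_list[1:])]
        return value_list, critic_list

    def pairwise_critic(self, before: float, after: float) -> float:
        # Hypothetical signature: progress delta between two observations.
        return after - before
```

For five frames this yields `value_list == [0.0, 25.0, 50.0, 75.0, 100.0]` and four critic values of `25.0`.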

### Step 2: Register backend in factory

Edit `verl/utils/reward_backend_factory.py`:

- add a capability entry to `_CAP_MAP`,
- add a construction branch to `build_reward_backend_from_config(...)`.
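The two registration steps might look roughly like this. It is a self-contained sketch, not the contents of `reward_backend_factory.py`: the flat config shape (`reward_backend`, `backend_kwargs`), the `my_backend` name, and attaching capabilities to the instance are all assumptions, and the existing `vlac` and `robodopamine` entries are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for RewardBackendCapabilities."""

    progress: bool = True
    pairwise: bool = False
    done: bool = False


class MyBackend:
    """Placeholder for the adapter written in Step 1."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


# Sketch of _CAP_MAP; existing entries for vlac and robodopamine omitted.
_CAP_MAP: Dict[str, Capabilities] = {
    "my_backend": Capabilities(pairwise=False),  # new entry
}


def build_reward_backend_from_config(config: Mapping[str, Any]) -> Any:
    name = config["reward_backend"]
    caps = _CAP_MAP.get(name)
    if caps is None:
        raise ValueError(f"unknown reward backend: {name!r}")
    # One construction branch per backend; only the new one is shown.
    if name == "my_backend":
        backend = MyBackend(**config.get("backend_kwargs", {}))
    else:
        raise NotImplementedError(f"no construction branch for {name!r}")
    backend.capabilities = caps
    return backend
```

Checking `_CAP_MAP` before constructing anything means a typo in the backend name fails with a clear message.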

### Step 3: Configure and run a smoke check

```bash
python scripts/train_libero_10-sft_full-ttt.py \
  +actor_rollout_ref.rollout.reward_backend=<your_backend>
```

Verify:

- the rollout initializes,
- the progress reward is non-empty,
- the pairwise branch behaves as the declared capabilities imply.
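The last two checks can be automated on a single `(value_list, critic_list)` output, as in this sketch. The helper name and messages are hypothetical, and the range check simply encodes the 0-100 scale stated in Step 1. Rollout initialization still has to be confirmed by running the training script.

```python
from typing import List, Sequence


def smoke_check(
    value_list: Sequence[float],
    critic_list: Sequence[float],
    supports_pairwise: bool,
) -> List[str]:
    """Return the problems found in one backend output (empty list = pass).

    Hypothetical helper; it inspects the pair returned by
    compute_trajectory_values against the declared pairwise capability.
    """
    problems: List[str] = []
    if len(value_list) == 0:
        problems.append("progress reward is empty")
    if any(not 0.0 <= v <= 100.0 for v in value_list):
        problems.append("progress values fall outside the 0-100 scale")
    # A backend that declares pairwise support should emit critic values.
    if supports_pairwise and len(critic_list) == 0:
        problems.append("pairwise declared but critic_list is empty")
    return problems
```

An empty `critic_list` is not flagged for backends without `pairwise`, since the contract allows it in that case.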

## 4) Notes

- `vlac` remains the reference backend for paper-faithful behavior.
- custom backends should preserve the algorithm invariants listed in `ALGORITHM_INVARIANTS.md`.