# Reward Backend Guide
This guide explains how to:
- select a reward backend at runtime,
- understand how backend capabilities differ,
- integrate a custom reward backend.
## 1) Runtime backend selection
Set the backend through an environment variable or a Hydra override.
### Environment variable (recommended in scripts)
```bash
export EVOLVE_REWARD_BACKEND=vlac
# or
export EVOLVE_REWARD_BACKEND=robodopamine
```
Training scripts forward this value as the Hydra override:
```text
+actor_rollout_ref.rollout.reward_backend=<backend>
```
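As a rough illustration of that hand-off, the sketch below reads `EVOLVE_REWARD_BACKEND` and builds the override string. The helper name `reward_backend_override` and the `vlac` default are assumptions made for this example; the actual training scripts may perform this step differently (for instance, in shell).

```python
import os


def reward_backend_override(env=None, default="vlac"):
    """Build the Hydra override for the reward backend (hypothetical helper).

    The default of "vlac" is an assumption, not documented behavior.
    """
    env = os.environ if env is None else env
    backend = env.get("EVOLVE_REWARD_BACKEND", default).strip()
    if not backend:
        raise ValueError("EVOLVE_REWARD_BACKEND is set but empty")
    # The leading "+" appends a key that is absent from the base config.
    return f"+actor_rollout_ref.rollout.reward_backend={backend}"
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real process environment.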
### Direct override
```bash
python scripts/train_libero_10-sft_full-ttt.py \
+actor_rollout_ref.rollout.reward_backend=vlac
```
## 2) Capability model
Each backend declares what it supports through a capability contract:
- required: `progress`
- optional: `pairwise`
- optional: `done`
Current matrix:
| Backend | progress | pairwise | done |
|---|---|---|---|
| `vlac` | yes | yes | optional |
| `robodopamine` | yes | no | no |
The `robodopamine` backend requires the external Robo-Dopamine code (`GRMInference`). Either set:
```bash
export ROBODOPAMINE_PATH=/path/to/Robo-Dopamine
```
or install Robo-Dopamine as an importable package in the active environment.
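How the backend consumes `ROBODOPAMINE_PATH` is not spelled out in this guide. One common pattern for such a variable, shown here purely as an assumption, is to prepend the directory to `sys.path` before importing. The helper name `add_robodopamine_to_path` is hypothetical.

```python
import os
import sys


def add_robodopamine_to_path(env=None):
    """Prepend ROBODOPAMINE_PATH to sys.path if it is set (hypothetical helper).

    Returns the path that was added, or None when the variable is unset,
    in which case Robo-Dopamine must already be importable as a package.
    """
    env = os.environ if env is None else env
    path = env.get("ROBODOPAMINE_PATH")
    if not path:
        return None
    path = os.path.abspath(os.path.expanduser(path))
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```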
Fallback policy:
- if `pairwise` is unsupported, the pairwise reward branch is disabled;
- termination is still derived from the progress threshold.
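The fallback rules can be sketched as follows. This is a minimal illustration, not the project's code: `Capabilities` is a stand-in for `RewardBackendCapabilities` with assumed field names, and the `threshold` argument and the `>=` comparison are placeholders, since this guide does not state the actual threshold or comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Illustrative stand-in for RewardBackendCapabilities (fields assumed)."""
    progress: bool = True
    pairwise: bool = False
    done: bool = False


def plan_reward_branches(caps):
    """Decide which reward branches run for a backend's declared capabilities."""
    if not caps.progress:
        raise ValueError("every backend must support `progress`")
    return {
        "progress": True,
        # The pairwise branch is disabled when the backend lacks support.
        "pairwise": caps.pairwise,
    }


def is_done(progress_value, threshold):
    """Termination derived from progress; `threshold` is a placeholder."""
    return progress_value >= threshold
```

With the matrix above, `robodopamine` would map to `Capabilities(pairwise=False)`, so only the progress branch stays active.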
## 3) Custom backend integration
### Step 1: Implement adapter
Create a backend class under `verl/utils/reward_backends/` with:
```python
# Members of the backend class; arguments are elided in this outline.
capabilities = RewardBackendCapabilities(...)

def compute_trajectory_values(...):
    ...

def pairwise_critic(...):
    ...
```
`compute_trajectory_values` must return:
- `value_list`: progress values (the current rollout path expects a 0-100 scale),
- `critic_list`: pairwise/incremental critic list (may be empty if unsupported).
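To make the return contract concrete, here is a toy adapter that supports only `progress`. Everything about it is hypothetical: the class name, the `frames` argument, the plain dict standing in for `RewardBackendCapabilities(...)`, and the choice to return a `(value_list, critic_list)` tuple. The real adapter's argument list and return container are not specified in this guide.

```python
class LinearProgressBackend:
    """Toy adapter: linear progress, no pairwise support (illustration only)."""

    # Stand-in for RewardBackendCapabilities(...); field names are assumed.
    capabilities = {"progress": True, "pairwise": False, "done": False}

    def compute_trajectory_values(self, frames):
        """Return (value_list, critic_list) for one trajectory."""
        n = len(frames)
        # Progress rises linearly to 100 on the 0-100 scale.
        value_list = [100.0 * (i + 1) / n for i in range(n)]
        # Pairwise is unsupported here, so the critic list stays empty.
        critic_list = []
        return value_list, critic_list

    def pairwise_critic(self, frame_a, frame_b):
        raise NotImplementedError("pairwise is not supported by this backend")
```

A real backend would replace the linear ramp with model inference while keeping the same two outputs.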
### Step 2: Register backend in factory
Edit `verl/utils/reward_backend_factory.py`:
- add a capability entry to `_CAP_MAP`,
- add a construction branch in `build_reward_backend_from_config(...)`.
### Step 3: Configure and run a smoke check
```bash
python scripts/train_libero_10-sft_full-ttt.py \
+actor_rollout_ref.rollout.reward_backend=<your_backend>
```
Verify that:
- the rollout initializes,
- the progress reward is non-empty,
- the pairwise branch behaves as the declared capabilities say it should.
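Part of this checklist can be automated on a backend's raw outputs. The function below is one possible reading of the checks, not an existing utility: its name and rules are assumptions, including the 0-100 range test and the expectation that a backend declaring `pairwise` returns a non-empty `critic_list` for a multi-step trajectory.

```python
def check_backend_output(value_list, critic_list, supports_pairwise):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if not value_list:
        problems.append("progress reward is empty")
    elif any(not 0.0 <= v <= 100.0 for v in value_list):
        problems.append("progress values fall outside the 0-100 scale")
    if supports_pairwise and not critic_list:
        problems.append("pairwise is declared but critic_list is empty")
    return problems
```

Returning the problems as a list, rather than raising on the first one, lets a single smoke run report every mismatch at once.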
## 4) Notes
- `vlac` remains the reference backend for paper-faithful behavior.
- custom backend integrations should preserve the algorithm invariants listed in `ALGORITHM_INVARIANTS.md`.