Rewards

Reward service, local and remote backends, and extension points.

unirl.reward constructs and runs reward backends. Rollout engines generate media; a reward backend scores it and returns per-sample values that the trainer turns into advantages.
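To make the last step concrete, here is a minimal sketch of how per-sample rewards for one prompt group could be turned into advantages. It assumes a GRPO-style group normalization (subtract the group mean, divide by the group standard deviation); the function name `group_advantages` and the `eps` parameter are hypothetical, and the UniRL trainer's actual computation is not described on this page.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one prompt group into advantages.

    Hypothetical helper: illustrates group-relative normalization only,
    not the trainer's real implementation.
    """
    if not rewards:
        return []
    n = len(rewards)
    mu = sum(rewards) / n
    # Population standard deviation of the group.
    sigma = math.sqrt(sum((r - mu) ** 2 for r in rewards) / n)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


advantages = group_advantages([0.21, 0.25, 0.19, 0.27])
```

Because every advantage depends on the group mean and spread, a single bad reward value affects all samples in the group, not just its own.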

Structure

A reward is exactly one backend, held by RewardService (unirl/reward/service.py), which scores a RolloutTrack via score_and_attach. The backend is one of:

  • a local in-process scorer (unirl/reward/local/: PickScore, HPS, OCR, GenEval2, VideoPickScore, …), or
  • the remote HTTP client (RemoteRewardBackend) talking to the standalone unirl-reward-service/ server.

The current YAML shape, component contract, and scorer extension workflow live in the generated Reward Package README.

Config Shape

A reward is wired via Hydra _target_:

reward:
  _target_: unirl.reward.service.RewardService
  backend:
    _target_: unirl.reward.local.pickscore.PickScoreRewardScorer
    base_device: cuda
    config:
      _target_: unirl.reward.local.pickscore.PickScoreSpec
      batch_size: 8

For out-of-process scoring, point backend._target_ at unirl.reward.remote.RemoteRewardBackend with a RemoteRewardSpec (base_url, required_rewards, reward_weights, input_kind). The remote service runs from unirl-reward-service/ on its own GPU node.

Failure Semantics

Reward failures are loud, never silent. A non-finite or null reward is flagged as a sample failure, and RewardService.score_and_attach raises on any failure, naming the offending reward and sample. An inference error therefore stops the step rather than poisoning the GRPO group.
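The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in unirl/reward/service.py: the `RewardFailure` exception, the `check_rewards` function, and the dict-of-lists input shape are all hypothetical names chosen for the example.

```python
import math
from typing import Mapping, Optional, Sequence


class RewardFailure(RuntimeError):
    """Hypothetical error type; the real exception class is not named in this guide."""


def check_rewards(scores: Mapping[str, Sequence[Optional[float]]]) -> None:
    """Raise if any reward value is null or non-finite.

    `scores` maps a reward name to its per-sample values (assumed shape).
    """
    failures = [
        (name, idx)
        for name, values in scores.items()
        for idx, value in enumerate(values)
        # None models a null reward; NaN and +/-inf are non-finite.
        if value is None or not math.isfinite(value)
    ]
    if failures:
        detail = ", ".join(f"{name}[sample {idx}]" for name, idx in failures)
        raise RewardFailure(f"reward failure in: {detail}")


# All values finite: passes silently.
check_rewards({"pickscore": [0.21, 0.25], "ocr": [1.0, 0.0]})
```

Raising before the values reach the trainer matters because a single NaN propagates through any group mean or standard deviation, which would corrupt the advantages of every sample in that group.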

Adding a Local Scorer

Add the spec and scorer near the reward implementation under unirl/reward/local/. Prefer LocalRewardBackend for in-process model scorers because it provides device resolution, eager load, offload(), and onload(). Define the spec as a plain @dataclass and reference it by _target_ under reward.backend.config in the recipe (no registration step). See the generated Reward Package README for the full example and the remote wire contract.
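As a rough illustration of the "plain @dataclass spec plus scorer" pairing, the sketch below defines a toy scorer. Everything in it is hypothetical: `MeanBrightnessSpec`, `MeanBrightnessScorer`, and the `score` method name are invented for the example, and the class deliberately does not subclass LocalRewardBackend because that base class's signature is not shown on this page. A real scorer would follow the contract in the generated Reward Package README.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MeanBrightnessSpec:
    """Hypothetical spec: a plain dataclass, no registration step."""

    batch_size: int = 8
    scale: float = 1.0 / 255.0  # maps 0..255 pixel values to 0..1


class MeanBrightnessScorer:
    """Hypothetical toy scorer; stands alone instead of extending LocalRewardBackend."""

    def __init__(self, config: MeanBrightnessSpec, base_device: str = "cpu") -> None:
        self.config = config
        self.base_device = base_device  # unused here; a model scorer would load onto it

    def score(self, images: Sequence[Sequence[float]]) -> List[float]:
        """Return one value per image: its mean pixel value, rescaled."""
        out: List[float] = []
        # Walk the input in batches of config.batch_size, as a model scorer would.
        for start in range(0, len(images), self.config.batch_size):
            for pixels in images[start:start + self.config.batch_size]:
                out.append(self.config.scale * sum(pixels) / len(pixels))
        return out


scorer = MeanBrightnessScorer(MeanBrightnessSpec(batch_size=2))
scores = scorer.score([[0, 0], [255, 255], [255, 0]])
```

In a recipe, the spec would then be referenced by its import path under reward.backend.config._target_, mirroring the PickScoreSpec entry in the Config Shape example.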
