Extending UniRL

Where to add models, rollout engines, train-side algorithms, rewards, training backends, and recipes.

Use existing boundaries rather than adding cross-cutting glue. Most extensions need one implementation package, one config dataclass, one recipe, and one focused test or compose check.

Extension Map

| Goal | Primary location | How a recipe wires it |
| --- | --- | --- |
| Add a model | unirl/models/<model_name>/ | @dataclass config referenced by _target_ under the recipe's model: block |
| Add a rollout engine | unirl/rollout/engine/<engine>/ | @dataclass config referenced by _target_ under rollout: |
| Add a train-side algorithm | unirl/algorithms/ | a BaseAlgorithmConfig subclass referenced by _target_, bound under a track's algorithm: node |
| Add a reward scorer | unirl/reward/local/ | a spec @dataclass referenced by _target_ under reward.backend.config |
| Add a training backend | unirl/train/backend/ | the Remote backend contract (alongside FSDPBackend) |
| Add a recipe | examples/<domain>/<recipe>.yaml | --config-name=<domain>/<recipe> |
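
Most rows in the table rely on the same wiring: a recipe node names a class through _target_, and the remaining keys become arguments for that class. This page does not show how UniRL resolves _target_, so the following stdlib-only sketch only illustrates the general idea. The helper name instantiate_from_node and the example node are hypothetical, and the target is a standard-library class purely so the snippet runs on its own.

```python
import importlib
from typing import Any, Mapping


def instantiate_from_node(node: Mapping[str, Any]) -> Any:
    """Build an object from a recipe-like node (hypothetical helper).

    The "_target_" key holds a dotted import path; every other key is
    passed to the target as a keyword argument.
    """
    kwargs = dict(node)
    target = kwargs.pop("_target_")
    module_path, _, attr_name = target.rpartition(".")
    if not module_path:
        raise ValueError(f"_target_ must be a dotted path, got {target!r}")
    cls = getattr(importlib.import_module(module_path), attr_name)
    return cls(**kwargs)


# Stand-in for a parsed YAML node. A real recipe would point at a config
# dataclass inside the unirl package instead of a stdlib class.
node = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
delta = instantiate_from_node(node)
```

Because the class is named by import path, adding a new model, engine, or scorer should not require editing a central registry; the recipe alone selects the implementation.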

Adding a Model

  1. Add unirl/models/<model_name>/.
  2. Define the model's config @dataclass next to it (referenced by _target_ in the recipe).
  3. Implement a bundle that exposes the trainable stages needed by training and rollout.
  4. Add condition, text/vision, diffusion, and VAE helpers as needed.
  5. Add at least one examples/<domain>/<recipe>.yaml recipe.
  6. Document required external checkpoints through YAML env interpolation or launcher docs.
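
As a rough illustration of steps 2 and 3, the sketch below pairs a config dataclass with a bundle that exposes only the trainable stages. Every name here (ToyModelConfig, ToyStage, ToyModelBundle, trainable_stages, and the stage names) is hypothetical; the real bundle interface is defined by the UniRL model packages and may differ.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyModelConfig:
    """Hypothetical model config; a recipe would name it via _target_."""

    # In a recipe this would normally be filled by YAML env interpolation.
    checkpoint_path: str = "/path/to/checkpoint"
    trainable_stage_names: Tuple[str, ...] = ("denoiser",)


class ToyStage:
    """Placeholder for a real module (text encoder, denoiser, VAE, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyModelBundle:
    """Hypothetical bundle: owns every stage, exposes the trainable ones."""

    def __init__(self, config: ToyModelConfig) -> None:
        self.config = config
        self.stages: Dict[str, ToyStage] = {
            name: ToyStage(name) for name in ("text_encoder", "denoiser", "vae")
        }
        # Fail early if the config names a stage the bundle does not own.
        unknown = set(config.trainable_stage_names) - set(self.stages)
        if unknown:
            raise ValueError(f"unknown trainable stages: {sorted(unknown)}")

    def trainable_stages(self) -> Dict[str, ToyStage]:
        """Return only the stages that training should update."""
        return {
            name: self.stages[name]
            for name in self.config.trainable_stage_names
        }


bundle = ToyModelBundle(ToyModelConfig())
```

Keeping frozen helpers (here the text encoder and VAE) inside the bundle but outside trainable_stages() lets rollout use them without the trainer treating them as optimizable.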

Adding a Rollout Engine

  1. Add a typed config under unirl/rollout/engine/<engine>/.
  2. Reference it by _target_ under the recipe's rollout: block.
  3. Implement the engine contract from unirl/rollout/engine/base.py.
  4. Return canonical RolloutResp data (populate tracks[name]).
  5. If the engine is dedicated, define trainer-to-rollout weight sync.
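
The steps above can be sketched with stand-in types. ToyRolloutResp models only the tracks mapping mentioned in step 4, and ToyEngineBase stands in for the contract in unirl/rollout/engine/base.py; the method name generate, the config fields, and the track name are assumptions made for illustration, not the actual UniRL API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyEngineConfig:
    """Hypothetical typed config a recipe would reference under rollout:."""

    track_name: str = "image"
    samples_per_prompt: int = 2


@dataclass
class ToyRolloutResp:
    """Stand-in for RolloutResp; only the tracks mapping is modeled."""

    tracks: Dict[str, List[Any]] = field(default_factory=dict)


class ToyEngineBase(ABC):
    """Stand-in for the engine contract (the real one lives in base.py)."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> ToyRolloutResp:
        """Produce samples for each prompt."""


class EchoEngine(ToyEngineBase):
    """Toy engine that 'samples' by tagging each prompt with an index."""

    def __init__(self, config: ToyEngineConfig) -> None:
        self.config = config

    def generate(self, prompts: List[str]) -> ToyRolloutResp:
        samples = [
            f"{prompt}#{i}"
            for prompt in prompts
            for i in range(self.config.samples_per_prompt)
        ]
        # Populate tracks[name] so downstream code can look results up by track.
        return ToyRolloutResp(tracks={self.config.track_name: samples})


resp = EchoEngine(ToyEngineConfig()).generate(["a cat"])
```

A dedicated engine would additionally need the trainer-to-rollout weight sync from step 5, which this sketch omits.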

Adding a Train-Side Algorithm

The training loss is a per-track StageAlgorithm. Add a new class only when the loss math changes (recipe compositions like DanceGRPO / MixGRPO reuse DiffusionGRPO).

  1. Subclass StageAlgorithm (unirl/algorithms/base.py).
  2. Define a config subclassing BaseAlgorithmConfig (a plain @dataclass) next to it.
  3. Implement compute_loss_and_backward(...): replay the stage, compute the loss, and call backward().
  4. Set supports_multi_update according to whether the loss stays valid across multiple optimizer steps on one rollout shard. Set requires_ema_rollout only for off-policy losses that need EMA sampling (NFT).
  5. Bind the class under the track's algorithm: node in a recipe.
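
To make the replay, loss, and backward sequence concrete, here is a dependency-free sketch. It does not use the real StageAlgorithm or BaseAlgorithmConfig: ToyAlgorithmConfig, ToyStage, and ToyPolicyGradient are hypothetical stand-ins, the method arguments are assumed, the two flags are assumed to be boolean class attributes, and the gradient is accumulated by hand in place of a tensor backward() call. The policy is a one-parameter Gaussian with unit variance.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyAlgorithmConfig:
    """Stand-in for a BaseAlgorithmConfig subclass (fields are assumptions)."""

    advantage_scale: float = 1.0


class ToyStage:
    """One-parameter Gaussian policy N(theta, 1) with a manual grad slot."""

    def __init__(self, theta: float = 0.0) -> None:
        self.theta = theta
        self.grad = 0.0

    def replay_log_probs(self, samples: Sequence[float]) -> List[float]:
        # log N(x; theta, 1), dropping the constant term.
        return [-0.5 * (x - self.theta) ** 2 for x in samples]


class ToyPolicyGradient:
    """Stand-in for a StageAlgorithm subclass."""

    # A plain policy-gradient loss has no importance ratio, so it is only
    # valid for one optimizer step per rollout shard.
    supports_multi_update = False
    # On-policy: samples come from the current weights, not an EMA copy.
    requires_ema_rollout = False

    def __init__(self, config: ToyAlgorithmConfig) -> None:
        self.config = config

    def compute_loss_and_backward(
        self,
        stage: ToyStage,
        samples: Sequence[float],
        advantages: Sequence[float],
    ) -> float:
        scale = self.config.advantage_scale
        n = len(samples)
        # 1. Replay the stage to recompute log-probabilities.
        log_probs = stage.replay_log_probs(samples)
        # 2. Loss = -mean(scale * advantage * log_prob).
        loss = -sum(scale * a * lp for a, lp in zip(advantages, log_probs)) / n
        # 3. Manual "backward": d(log_prob)/d(theta) = (x - theta).
        stage.grad += -sum(
            scale * a * (x - stage.theta) for a, x in zip(advantages, samples)
        ) / n
        return loss


stage = ToyStage(theta=0.0)
algo = ToyPolicyGradient(ToyAlgorithmConfig())
loss = algo.compute_loss_and_backward(stage, [1.0, -1.0], [1.0, -1.0])
```

With these inputs the accumulated gradient is negative, so a gradient-descent step raises theta toward the sample that received the positive advantage.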

See the generated Algorithms Package README for the full contract.

Adding a Reward Scorer

Add the spec and scorer near the reward implementation under unirl/reward/local/. Prefer LocalRewardBackend for in-process model scorers because it provides device resolution, eager load, offload(), and onload(). Define the spec as a plain @dataclass and reference it by _target_ under reward.backend.config in the recipe. For out-of-process scoring, point the reward backend at the remote service. See Rewards.
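
A minimal sketch of a spec and scorer pair follows. The names ToyLengthRewardSpec and ToyLengthScorer, the score method, and the length-based reward are all hypothetical. The text above says LocalRewardBackend provides eager load, offload(), and onload(); the toy class implements trivial versions itself only so the snippet is self-contained.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyLengthRewardSpec:
    """Hypothetical spec; a recipe would name it under reward.backend.config."""

    target_length: int = 4


class ToyLengthScorer:
    """Hypothetical scorer: rewards text whose length is near the target."""

    def __init__(self, spec: ToyLengthRewardSpec) -> None:
        if spec.target_length <= 0:
            raise ValueError("target_length must be positive")
        self.spec = spec
        self.loaded = True  # mimics eager load at construction time

    def offload(self) -> None:
        """Release resources (a real scorer would move weights off device)."""
        self.loaded = False

    def onload(self) -> None:
        """Restore resources before scoring again."""
        self.loaded = True

    def score(self, texts: Sequence[str]) -> List[float]:
        if not self.loaded:
            raise RuntimeError("scorer is offloaded; call onload() first")
        target = self.spec.target_length
        # 1.0 at the target length, falling linearly to a floor of 0.0.
        return [max(0.0, 1.0 - abs(len(t) - target) / target) for t in texts]


scorer = ToyLengthScorer(ToyLengthRewardSpec(target_length=4))
scores = scorer.score(["abcd", "ab"])
```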

Training Backends and Parallelism

New training-backend or parallelism work (a new FSDP / SP / EP plan, or a VeOmni-style backend) targets the backend contract under unirl/train/backend/. Implement the same Remote surface as FSDPBackend, and map TrainTopology onto a parallel plan rather than hardcoding shard sizes. See Trainer & Training Stack and the generated Train Stack README.
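
The advice to derive shard sizes from the topology can be sketched as follows. ToyTopology stands in for TrainTopology, whose real fields are not shown on this page; the field names, the ToyParallelPlan type, and the assumption that FSDP shards across the ranks left over after sequence parallelism are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTopology:
    """Stand-in for TrainTopology; field names are assumptions."""

    world_size: int
    sequence_parallel_size: int = 1


@dataclass(frozen=True)
class ToyParallelPlan:
    """Hypothetical plan a backend would apply when wrapping the model."""

    fsdp_shard_size: int
    sequence_parallel_size: int


def plan_from_topology(topology: ToyTopology) -> ToyParallelPlan:
    """Derive shard sizes from the topology instead of hardcoding them."""
    sp = topology.sequence_parallel_size
    if topology.world_size < 1 or sp < 1:
        raise ValueError("world_size and sequence_parallel_size must be >= 1")
    if topology.world_size % sp != 0:
        raise ValueError(
            f"world_size {topology.world_size} is not divisible by "
            f"sequence_parallel_size {sp}"
        )
    # Ranks not consumed by sequence parallelism form the FSDP shard group.
    return ToyParallelPlan(
        fsdp_shard_size=topology.world_size // sp,
        sequence_parallel_size=sp,
    )


plan = plan_from_topology(ToyTopology(world_size=8, sequence_parallel_size=2))
```

Validating divisibility in one place means a recipe that changes the GPU count fails with a clear error rather than producing a silently mismatched shard layout.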

Agent Tip

When an agent is asked to implement a new feature, first map the request to the table above, then read the closest package README before editing code.
