Extending UniRL

This page shows where to add models, rollout engines, train-side algorithms, rewards, training backends, and recipes.

Use existing boundaries rather than adding cross-cutting glue. Most extensions need one implementation package, one config dataclass, one recipe, and one focused test or compose check.

Extension Map

Goal	Primary location	How a recipe wires it
Add a model	`unirl/models/<model_name>/`	`@dataclass` config referenced by `_target_` under the recipe's `model:` block
Add a rollout engine	`unirl/rollout/engine/<engine>/`	`@dataclass` config referenced by `_target_` under `rollout:`
Add a train-side algorithm	`unirl/algorithms/`	a `BaseAlgorithmConfig` subclass referenced by `_target_`, bound under a track's `algorithm:` node
Add a reward scorer	`unirl/reward/local/`	a spec `@dataclass` referenced by `_target_` under `reward.backend.config`
Add a training backend	`unirl/train/backend/`	the `Remote` backend contract (alongside `FSDPBackend`)
Add a recipe	`examples/<domain>/<recipe>.yaml`	`--config-name=<domain>/<recipe>`

Adding a Model

Add a package at unirl/models/<model_name>/.
Define the model's config @dataclass next to it; the recipe references it by _target_.
Implement a bundle that exposes the trainable stages needed by training and rollout.
Add condition, text/vision, diffusion, and VAE helpers as needed.
Add at least one examples/<domain>/<recipe>.yaml recipe.
Document required external checkpoints through YAML env interpolation or launcher docs.
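
The first two steps can be sketched as follows. This is only an illustration, not UniRL's actual loader: `MyModelConfig`, its fields, the `demo_models` module name, and the `instantiate` helper are all hypothetical, and the recipe's `model:` block is shown as a Python dict. The one assumption about `_target_` is that it holds a dotted import path to the config class.

```python
import importlib
import sys
import types
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class MyModelConfig:
    """Hypothetical config; in a real tree it would sit in unirl/models/<model_name>/."""
    checkpoint_path: str
    dtype: str = "bfloat16"


def instantiate(node: Dict[str, Any]) -> Any:
    """Illustrative resolver: import the class named by `_target_` and construct it."""
    kwargs = dict(node)
    module_name, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Register the class under a made-up module name so the dotted path resolves
# in this single-file sketch. In a real package the class is simply importable.
demo_module = types.ModuleType("demo_models")
demo_module.MyModelConfig = MyModelConfig
sys.modules["demo_models"] = demo_module

# Python stand-in for the recipe's `model:` block.
model_node = {
    "_target_": "demo_models.MyModelConfig",
    "checkpoint_path": "/path/to/checkpoint",
}
cfg = instantiate(model_node)
```

Keeping the config a plain dataclass means the recipe only needs to name the class and supply its fields; anything the recipe omits falls back to the dataclass defaults.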

Adding a Rollout Engine

Add a typed config under unirl/rollout/engine/<engine>/.
Reference it by _target_ under the recipe's rollout: block.
Implement the engine contract from unirl/rollout/engine/base.py.
Return canonical RolloutResp data (populate tracks[name]).
If the engine is dedicated, define trainer-to-rollout weight sync.
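
A rough sketch of the steps above, using local stand-ins rather than the real classes. Only the `RolloutResp` name and its `tracks[name]` mapping come from this page; the `RolloutEngine` base, the `generate` method name, and `EchoEngine` with its config are assumptions made for illustration. Check unirl/rollout/engine/base.py for the actual contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RolloutResp:
    """Stand-in for the canonical response; only `tracks` is taken from the docs."""
    tracks: Dict[str, Any] = field(default_factory=dict)


class RolloutEngine(ABC):
    """Stand-in for the engine contract (the method name is an assumption)."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> RolloutResp:
        ...


@dataclass
class EchoEngineConfig:
    """Hypothetical typed config, referenced by `_target_` under `rollout:`."""
    track_name: str = "text"
    num_samples: int = 2


class EchoEngine(RolloutEngine):
    def __init__(self, config: EchoEngineConfig) -> None:
        self.config = config

    def generate(self, prompts: List[str]) -> RolloutResp:
        # A real engine would sample from a model; this one repeats each prompt.
        samples = [[prompt] * self.config.num_samples for prompt in prompts]
        resp = RolloutResp()
        resp.tracks[self.config.track_name] = samples
        return resp


resp = EchoEngine(EchoEngineConfig()).generate(["a", "b"])
```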

Adding a Train-Side Algorithm

The training loss is a per-track StageAlgorithm. Add a new class only when the loss math changes; recipe compositions such as DanceGRPO and MixGRPO reuse DiffusionGRPO.

Subclass StageAlgorithm (unirl/algorithms/base.py).
Define a config subclassing BaseAlgorithmConfig (a plain @dataclass) next to it.
Implement compute_loss_and_backward(...): replay the stage, compute the loss, and call backward().
Set supports_multi_update according to whether the loss stays valid across multiple optimizer steps on one rollout shard. Set requires_ema_rollout only for off-policy losses that need EMA sampling (for example, NFT).
Bind the class under the track's algorithm: node in a recipe.

See the generated Algorithms Package README for the full contract.
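
The shape of such a subclass might look like the sketch below. The base classes are local stand-ins that carry only the names quoted above; the `(stage, batch)` signature, `stage.replay`, the batch keys, and `TracedLoss` (a placeholder for a framework tensor that merely records that `backward()` was called) are assumptions, and no real gradients are computed here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BaseAlgorithmConfig:
    """Stand-in for the real base config."""


class StageAlgorithm:
    """Stand-in for the base class in unirl/algorithms/base.py."""
    supports_multi_update: bool = False
    requires_ema_rollout: bool = False

    def __init__(self, config: BaseAlgorithmConfig) -> None:
        self.config = config

    def compute_loss_and_backward(self, stage, batch):
        raise NotImplementedError


class TracedLoss:
    """Placeholder for a tensor: holds a float and records backward()."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.backward_called = False

    def backward(self) -> None:
        self.backward_called = True


@dataclass
class PolicyGradientConfig(BaseAlgorithmConfig):
    advantage_scale: float = 1.0


class PolicyGradient(StageAlgorithm):
    # A plain on-policy loss has no importance ratio, so this sketch treats it
    # as valid for a single optimizer step per rollout shard.
    supports_multi_update = False
    requires_ema_rollout = False

    def compute_loss_and_backward(self, stage, batch: Dict[str, List[float]]) -> TracedLoss:
        log_probs = stage.replay(batch)  # hypothetical replay call
        scale = self.config.advantage_scale
        terms = [-scale * adv * lp for adv, lp in zip(batch["advantages"], log_probs)]
        loss = TracedLoss(sum(terms) / len(terms))
        loss.backward()
        return loss


class FixedStage:
    """Toy stage that returns log-probabilities stored in the batch."""

    def replay(self, batch):
        return batch["log_probs"]


batch = {"advantages": [1.0, -1.0], "log_probs": [-0.5, -2.0]}
loss = PolicyGradient(PolicyGradientConfig()).compute_loss_and_backward(FixedStage(), batch)
```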

Adding a Reward Scorer

Add the spec and scorer next to the reward implementation under unirl/reward/local/. Prefer LocalRewardBackend for in-process model scorers because it provides device resolution, eager load, offload(), and onload(). Define the spec as a plain @dataclass and reference it by _target_ under reward.backend.config in the recipe. For out-of-process scoring, point the reward backend at the remote service. See Rewards.
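
As a minimal sketch of an in-process scorer: the `LocalRewardBackend` class below is a local stand-in whose method bodies are assumptions (the real class is only described here as providing device resolution, eager load, `offload()`, and `onload()`), and `LengthRewardSpec`, `LengthReward`, and the `score` method are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LengthRewardSpec:
    """Hypothetical spec, referenced by `_target_` under `reward.backend.config`."""
    target_length: int = 20
    device: str = "cpu"


class LocalRewardBackend:
    """Stand-in base class; the bodies are assumptions, not the real implementation."""

    def __init__(self, spec) -> None:
        self.spec = spec
        self.loaded = False
        self.onload()  # mimic eager load at construction time

    def onload(self) -> None:
        self.loaded = True

    def offload(self) -> None:
        self.loaded = False


class LengthReward(LocalRewardBackend):
    """Toy scorer: 1.0 at the target length, falling linearly to 0.0."""

    def score(self, texts: List[str]) -> List[float]:
        if not self.loaded:
            raise RuntimeError("call onload() before scoring")
        target = self.spec.target_length
        return [1.0 - min(abs(len(text) - target) / target, 1.0) for text in texts]


scorer = LengthReward(LengthRewardSpec(target_length=4))
scores = scorer.score(["abcd", "ab", ""])
```

A real model scorer would move weights to and from the device in `onload()` and `offload()`; the flag here only marks where that would happen.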

Training Backends and Parallelism

New training-backend or parallelism work (a new FSDP / SP / EP plan, or a VeOmni-style backend) targets the backend contract under unirl/train/backend/. Implement the same Remote surface as FSDPBackend, and map TrainTopology onto a parallel plan rather than hardcoding shard sizes. See Trainer & Training Stack and the generated Train Stack README.
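
To illustrate what "map TrainTopology onto a parallel plan" can mean in practice, here is one possible layout. Both dataclasses are stand-ins: the `world_size` and `sp_size` fields, the `ParallelPlan` type, and `plan_from_topology` are hypothetical and do not reflect the real TrainTopology definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainTopology:
    """Stand-in; the field names are assumptions."""
    world_size: int
    sp_size: int = 1  # sequence-parallel degree


@dataclass(frozen=True)
class ParallelPlan:
    """Hypothetical plan: data-parallel groups times sequence-parallel groups."""
    dp_size: int
    sp_size: int


def plan_from_topology(topology: TrainTopology) -> ParallelPlan:
    """Derive group sizes from the topology instead of hardcoding them."""
    if topology.sp_size < 1:
        raise ValueError("sp_size must be at least 1")
    if topology.world_size % topology.sp_size != 0:
        raise ValueError(
            f"world_size={topology.world_size} is not divisible by "
            f"sp_size={topology.sp_size}"
        )
    return ParallelPlan(
        dp_size=topology.world_size // topology.sp_size,
        sp_size=topology.sp_size,
    )


plan = plan_from_topology(TrainTopology(world_size=16, sp_size=2))
```

Deriving the sizes in one place lets the same backend run unchanged when the cluster shape changes, and it fails early on topologies that cannot be split evenly.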

Agent Tip

When an agent is asked to implement a new feature, first map the request to the table above, then read the closest package README before editing code.