Extending UniRL
Where to add models, rollout engines, train-side algorithms, rewards, training backends, and recipes.
Use existing boundaries rather than adding cross-cutting glue. Most extensions need one implementation package, one config dataclass, one recipe, and one focused test or compose check.
Extension Map
| Goal | Primary location | How a recipe wires it |
|---|---|---|
| Add a model | unirl/models/<model_name>/ | @dataclass config referenced by _target_ under the recipe's model: block |
| Add a rollout engine | unirl/rollout/engine/<engine>/ | @dataclass config referenced by _target_ under rollout: |
| Add a train-side algorithm | unirl/algorithms/ | a BaseAlgorithmConfig subclass referenced by _target_, bound under a track's algorithm: node |
| Add a reward scorer | unirl/reward/local/ | a spec @dataclass referenced by _target_ under reward.backend.config |
| Add a training backend | unirl/train/backend/ | the Remote backend contract (alongside FSDPBackend) |
| Add a recipe | examples/<domain>/<recipe>.yaml | --config-name=<domain>/<recipe> |
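Every row of the table wires a component the same way: a dotted import path under _target_ somewhere in the recipe. A focused compose check can therefore walk the composed recipe and confirm that each _target_ imports. The sketch below assumes the recipe has already been composed into nested dicts and lists (as Hydra does from --config-name) and that _target_ holds a dotted module.attribute path. resolve_target and check_targets are hypothetical helpers, not UniRL or Hydra APIs.

```python
import importlib
from typing import Any, Callable, List


def resolve_target(path: str) -> Callable[..., Any]:
    """Import 'pkg.module.Name' and return the attribute Name (hypothetical helper)."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"_target_ must be a dotted path, got {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def check_targets(node: Any, where: str = "") -> List[str]:
    """Walk a composed recipe and list every _target_ that fails to resolve."""
    failures: List[str] = []
    if isinstance(node, dict):
        target = node.get("_target_")
        if target is not None:
            try:
                resolve_target(target)
            except (ImportError, AttributeError, ValueError):
                failures.append(f"{where or '<root>'}: {target}")
        for key, value in node.items():
            if key != "_target_":
                child = f"{where}.{key}" if where else str(key)
                failures.extend(check_targets(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            failures.extend(check_targets(item, f"{where}[{i}]"))
    return failures


# Stdlib classes stand in for real UniRL targets so the example runs anywhere.
recipe = {
    "model": {"_target_": "collections.OrderedDict"},
    "rollout": {"_target_": "fractions.Fraction"},
    "reward": {"backend": {"config": {"_target_": "no_such_pkg.RewardSpec"}}},
}
print(check_targets(recipe))  # ['reward.backend.config: no_such_pkg.RewardSpec']
```

Reporting the dotted location of each failure points directly at the recipe block (model:, rollout:, reward.backend.config) that needs fixing.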
Adding a Model
- Add unirl/models/<model_name>/.
- Define the model's config @dataclass next to it (referenced by _target_ in the recipe).
- Implement a bundle that exposes the trainable stages needed by training and rollout.
- Add condition, text/vision, diffusion, and VAE helpers as needed.
- Add at least one examples/<domain>/<recipe>.yaml recipe.
- Document required external checkpoints through YAML env interpolation or launcher docs.
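A minimal sketch of the config-plus-bundle shape these steps describe. MyModelConfig, MyModelBundle, Stage, and trainable_stages() are hypothetical names, and the stage names are placeholders; the real bundle interface is whatever UniRL's training and rollout code expect.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    """Hypothetical stand-in for one sub-module (e.g. a denoiser or a VAE)."""
    name: str
    trainable: bool = True


@dataclass
class MyModelConfig:
    """Hypothetical config @dataclass, referenced by _target_ under model:."""
    checkpoint: str = ""              # external checkpoint, documented by the recipe
    train_text_encoder: bool = False


class MyModelBundle:
    """Hypothetical bundle grouping the stages that training and rollout share."""

    def __init__(self, config: MyModelConfig) -> None:
        self.config = config
        self.stages: Dict[str, Stage] = {
            "text_encoder": Stage("text_encoder", trainable=config.train_text_encoder),
            "denoiser": Stage("denoiser"),
            "vae": Stage("vae", trainable=False),  # frozen helper
        }

    def trainable_stages(self) -> List[str]:
        """Names of the stages the trainer should optimize."""
        return [name for name, stage in self.stages.items() if stage.trainable]


bundle = MyModelBundle(MyModelConfig(checkpoint="/ckpts/my_model"))
print(bundle.trainable_stages())  # ['denoiser']
```

Keeping trainability in the config means a recipe can unfreeze a stage without touching model code.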
Adding a Rollout Engine
- Add a typed config under unirl/rollout/engine/<engine>/.
- Reference it by _target_ under the recipe's rollout: block.
- Implement the engine contract from unirl/rollout/engine/base.py.
- Return canonical RolloutResp data (populate tracks[name]).
- If the engine is dedicated, define trainer-to-rollout weight sync.
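The engine contract can be pictured as follows. This is a self-contained sketch: EngineContract stands in for the base class in unirl/rollout/engine/base.py, and generate(), sync_weights(), EchoEngineConfig, and the sample format are hypothetical. It illustrates the two obligations from the list: return a RolloutResp keyed by track name and, for a dedicated engine, accept weights pushed from the trainer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RolloutResp:
    """Stand-in for the canonical response; per-track data lives in tracks[name]."""
    tracks: Dict[str, Dict[str, Any]] = field(default_factory=dict)


class EngineContract(ABC):
    """Hypothetical stand-in for the contract in unirl/rollout/engine/base.py."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> RolloutResp: ...

    def sync_weights(self, state: Dict[str, Any]) -> None:
        """Only needed when the engine is dedicated and holds its own weight copy."""
        raise NotImplementedError


@dataclass
class EchoEngineConfig:
    """Hypothetical typed config, referenced by _target_ under rollout:."""
    samples_per_prompt: int = 2
    track: str = "image"


class EchoEngine(EngineContract):
    def __init__(self, config: EchoEngineConfig) -> None:
        self.config = config
        self.version = 0  # which trainer weights this engine last received

    def generate(self, prompts: List[str]) -> RolloutResp:
        samples = [
            f"{p}#{i}@v{self.version}"
            for p in prompts
            for i in range(self.config.samples_per_prompt)
        ]
        return RolloutResp(tracks={self.config.track: {"samples": samples}})

    def sync_weights(self, state: Dict[str, Any]) -> None:
        # A real engine would copy trainer weights; this sketch only records a version.
        self.version = state["version"]


engine = EchoEngine(EchoEngineConfig())
engine.sync_weights({"version": 3})
resp = engine.generate(["a cat"])
print(resp.tracks["image"]["samples"])  # ['a cat#0@v3', 'a cat#1@v3']
```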
Adding a Train-Side Algorithm
The training loss is implemented per track as a StageAlgorithm. Add a new class only when the loss math changes; recipe-level variants such as DanceGRPO and MixGRPO reuse DiffusionGRPO.
- Subclass StageAlgorithm (unirl/algorithms/base.py).
- Define a config subclassing BaseAlgorithmConfig (a plain @dataclass) next to it.
- Implement compute_loss_and_backward(...): replay the stage, compute the loss, and call backward().
- Set supports_multi_update to match whether the loss is valid across multiple optimizer steps on one rollout shard, and set requires_ema_rollout only for off-policy losses that need EMA sampling (NFT).
- Bind the class under the track's algorithm: node in a recipe.
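A sketch of the subclass shape, under stated assumptions: StageAlgorithm, BaseAlgorithmConfig, and ScalarLoss are local stand-ins (ScalarLoss plays the role of a framework tensor), and the parameters of compute_loss_and_backward are hypothetical because the real signature is not spelled out here. The loss is a PPO-style clipped surrogate chosen only for illustration; it is not DiffusionGRPO's math.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BaseAlgorithmConfig:
    """Stand-in for the real base config, described as a plain @dataclass."""


class StageAlgorithm:
    """Stand-in for unirl/algorithms/base.py, reduced to the two documented flags."""
    supports_multi_update: bool = False
    requires_ema_rollout: bool = False


class ScalarLoss:
    """Stand-in for a framework tensor; backward() only records that it ran."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.backward_called = False

    def backward(self) -> None:
        self.backward_called = True


@dataclass
class ClippedPGConfig(BaseAlgorithmConfig):
    clip_range: float = 0.2


class ClippedPG(StageAlgorithm):
    supports_multi_update = True   # clipped ratio tolerates several steps per shard
    requires_ema_rollout = False   # on-policy: no EMA sampling needed

    def __init__(self, config: ClippedPGConfig) -> None:
        self.config = config

    def compute_loss_and_backward(
        self, replay: Callable[[], List[float]], advantages: List[float]
    ) -> ScalarLoss:
        ratios = replay()  # replay the stage: new/old probability ratio per sample
        eps = self.config.clip_range
        terms = []
        for ratio, adv in zip(ratios, advantages):
            clipped = min(max(ratio, 1 - eps), 1 + eps)
            terms.append(-min(ratio * adv, clipped * adv))  # pessimistic surrogate
        loss = ScalarLoss(sum(terms) / len(terms))
        loss.backward()
        return loss


algo = ClippedPG(ClippedPGConfig())
loss = algo.compute_loss_and_backward(lambda: [1.5, 0.5], [1.0, 1.0])
print(round(loss.value, 4), loss.backward_called)  # -0.85 True
```

The clipped ratio is the usual reason a PPO-style loss tolerates several optimizer steps on the same samples, which is why this sketch sets supports_multi_update = True.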
See the generated Algorithms Package README for the full contract.
Adding a Reward Scorer
Add the spec and scorer near the reward implementation under unirl/reward/local/. Prefer LocalRewardBackend for in-process model scorers because it provides device resolution, eager loading, offload(), and onload(). Define the spec as a plain @dataclass and reference it by _target_ under reward.backend.config in the recipe. For out-of-process scoring, point the reward backend at the remote service instead. See Rewards.
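The services the paragraph attributes to LocalRewardBackend (device resolution, eager loading, offload(), onload()) can be sketched with a small stand-in. LocalBackendSketch, KeywordRewardSpec, and KeywordScorer are hypothetical, the device strings are plain labels (no accelerator is touched), and the keyword check stands in for a real scoring model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class LocalBackendSketch:
    """Hypothetical stand-in for the services LocalRewardBackend provides."""

    def __init__(self, device: Optional[str] = None) -> None:
        self.device = device or "cpu"   # device resolution: fall back to CPU when unset
        self.model = self.load_model()  # eager load at construction time
        self.location = self.device

    def load_model(self) -> Callable[[str], float]:
        raise NotImplementedError

    def offload(self) -> None:
        # Free the accelerator between scoring phases (here: just a location label).
        self.location = "cpu"

    def onload(self) -> None:
        self.location = self.device


@dataclass
class KeywordRewardSpec:
    """Hypothetical spec, referenced by _target_ under reward.backend.config."""
    keyword: str = "cat"
    device: Optional[str] = None


class KeywordScorer(LocalBackendSketch):
    def __init__(self, spec: KeywordRewardSpec) -> None:
        self.spec = spec
        super().__init__(spec.device)

    def load_model(self) -> Callable[[str], float]:
        # Stand-in for loading a real scoring model.
        keyword = self.spec.keyword.lower()
        return lambda text: 1.0 if keyword in text.lower() else 0.0

    def score(self, texts: List[str]) -> List[float]:
        self.onload()  # make sure the scorer is back on its device before use
        return [self.model(t) for t in texts]


scorer = KeywordScorer(KeywordRewardSpec(keyword="cat", device="cuda:0"))
scorer.offload()
print(scorer.location)                            # cpu
print(scorer.score(["A Cat on a mat", "a dog"]))  # [1.0, 0.0]
print(scorer.location)                            # cuda:0
```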
Training Backends and Parallelism
New training-backend or parallelism work (a new FSDP, SP, or EP plan, or a VeOmni-style backend) targets the backend contract under unirl/train/backend/. Implement the same Remote surface as FSDPBackend, and map TrainTopology onto a parallel plan rather than hardcoding shard sizes. See Trainer & Training Stack and the generated Train Stack README.
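Mapping a topology onto a plan, rather than hardcoding shard sizes, amounts to deriving group sizes from the world size. The sketch assumes a topology with only world_size and sp_size fields and a data-parallel by sequence-parallel layout in which FSDP shards over the data-parallel ranks. TrainTopology here is a stand-in whose real fields may differ, and ParallelPlan, plan_from_topology, and rank_coords are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainTopology:
    """Hypothetical stand-in; the real TrainTopology's fields may differ."""
    world_size: int
    sp_size: int = 1


@dataclass(frozen=True)
class ParallelPlan:
    dp_size: int  # data-parallel replicas; FSDP shards parameters across these ranks
    sp_size: int  # sequence-parallel group size


def plan_from_topology(topo: TrainTopology) -> ParallelPlan:
    """Derive group sizes from the topology instead of hardcoding them."""
    if topo.sp_size < 1 or topo.world_size % topo.sp_size != 0:
        raise ValueError(
            f"sp_size={topo.sp_size} must divide world_size={topo.world_size}"
        )
    return ParallelPlan(dp_size=topo.world_size // topo.sp_size, sp_size=topo.sp_size)


def rank_coords(rank: int, plan: ParallelPlan) -> Tuple[int, int]:
    """Map a global rank to (dp_index, sp_index), with SP ranks contiguous."""
    return divmod(rank, plan.sp_size)


plan = plan_from_topology(TrainTopology(world_size=8, sp_size=2))
print(plan)                  # ParallelPlan(dp_size=4, sp_size=2)
print(rank_coords(5, plan))  # (2, 1)
```

Because the plan is computed, the same recipe runs at a different world size without editing shard sizes, and an invalid combination fails fast with a clear error.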
Agent Tip
When an agent is asked to implement a new feature, first map the request to the table above, then read the closest package README before editing code.