Agent Task Recipes
Common coding-agent tasks mapped to files, checks, and likely risks.
Use this page as a routing table before editing the repository.
Add a New Training Recipe
Read:
- unirl/config/README.md
- closest examples/<domain>/<recipe>.yaml
Edit:
- examples/<domain>/<new_recipe>.yaml
- this docs page and /en/docs/configuration/experiments if the recipe is maintained
Check:
python -m unirl.train_diffusion --config-name=<domain>/<new_recipe> --cfg job --resolve
Risk: mismatched placement, rollout batch size, and train-stack batch geometry.
See also: unirl/train/stack.py, unirl/config/.
Add a New Model
Read:
- unirl/models/README.md
- closest model package under unirl/models/
Edit:
- unirl/models/<model_name>/
- examples/<domain>/<recipe>.yaml
Check:
- config registration imports
- LoRA target materialization
- prompt/condition contracts expected by rollout engines
Risk: leaking model-specific assumptions into generic rollout or training packages.
See also: unirl/types/ and the closest existing model package.
Add a Rollout Engine
Read:
- unirl/rollout/README.md
- unirl/distributed/weight_sync/README.md
- existing engine/trainside, engine/sglang, or engine/vllm_omni
Edit:
- unirl/rollout/engine/<engine>/
- optional weight sync backend if the engine is dedicated
- at least one recipe under examples/<domain>/
Check:
- all backend outputs adapt to canonical RolloutResp(tracks[name])
- direct-vs-dedicated sync contracts pass validation
Risk: returning backend-specific objects across the rollout/training boundary.
See also: unirl/types/rollout_req.py, unirl/types/rollout_resp.py.
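The canonical-output check can be pictured with a small adapter sketch. The RolloutResp dataclass below is a hypothetical stand-in, not the real type in unirl/types/rollout_resp.py; the only detail taken from this page is that results are addressed as tracks[name]. The backend payload shape (a dict with an "outputs" entry) and the function name adapt_backend_output are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RolloutResp:
    """Hypothetical stand-in for the canonical response; not the real unirl type."""

    tracks: Dict[str, List[Any]] = field(default_factory=dict)


def adapt_backend_output(raw: Dict[str, Any], track_name: str) -> RolloutResp:
    """Convert an assumed backend payload into the canonical tracks[name] shape.

    Only the copied output values cross the boundary; any backend-specific
    handle in the payload is left behind in the engine layer.
    """
    if "outputs" not in raw:
        raise ValueError("backend payload has no 'outputs' entry")
    return RolloutResp(tracks={track_name: list(raw["outputs"])})


# The engine handle is deliberately dropped by the adapter.
resp = adapt_backend_output({"outputs": (0.1, 0.2), "engine_handle": object()}, "image")
```

Keeping the conversion in one function per engine makes the boundary easy to audit: training code only ever sees the canonical shape.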
Add or Debug Weight Sync
Read:
- unirl/distributed/weight_sync/README.md
- unirl/rollout/README.md
- the recipe's sync selection in examples/<domain>/<recipe>.yaml
Edit:
- unirl/distributed/weight_sync/
- examples/<domain>/<recipe>.yaml when selecting or tuning a backend
Check:
- dedicated rollout engines configure exactly one supported sync backend;
- direct sampling recipes omit sync;
- CUDA-IPC sync is only used with colocated dedicated rollout.
Risk: confusing trainer-to-rollout weight sync with the rollout-output tensor transport.
See also: unirl/distributed/tensor/, unirl/config/.
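The three checks in this section can be written down as one small validation function, which may help when reading a recipe by hand. This is a sketch under assumptions: the mode labels "dedicated" and "direct", the backend label "cuda_ipc", and the colocated flag are hypothetical names, not the repository's actual config keys or validator.

```python
from typing import List, Optional, Sequence


def sync_config_errors(
    engine_mode: str,
    sync: Optional[Sequence[str]],
    colocated: bool,
) -> List[str]:
    """Return violations of the three sync rules listed in this section.

    engine_mode: "dedicated" or "direct" (assumed labels).
    sync: selected sync backend names, or None when the recipe omits sync.
    colocated: whether rollout shares devices with the trainer (assumed flag).
    """
    errors: List[str] = []
    backends = list(sync or [])
    if engine_mode == "dedicated" and len(backends) != 1:
        errors.append("dedicated rollout needs exactly one sync backend")
    if engine_mode == "direct" and backends:
        errors.append("direct sampling recipes must omit sync")
    if "cuda_ipc" in backends and not (engine_mode == "dedicated" and colocated):
        errors.append("CUDA-IPC sync requires colocated dedicated rollout")
    return errors
```

For example, sync_config_errors("dedicated", ["cuda_ipc"], colocated=False) reports only the CUDA-IPC rule, while a direct recipe with no sync selection reports nothing.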
Add or Debug SDE Logic
Read:
- unirl/sde/README.md
- unirl/algorithms/README.md
- the selected recipe's sampling and sampling/sde_strategy sections
Edit:
- unirl/sde/
- model-specific sigma overrides under unirl/models/<model_name>/
- recipe YAML when selecting strategy or step schedules
Check:
- trained strategies provide log-probability paths required by GRPO-style losses;
- evaluation-only solvers are not used for train-side ratio objectives;
- old log-probs remain fixed across stack.num_updates_per_batch.
Risk: changing sigma schedules or log-prob behavior without updating rollout and replay assumptions.
See also: unirl/types/sampling.py, unirl/types/rollout_resp.py, unirl/algorithms/diffusion_grpo.py.
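The fixed old-log-prob requirement is easier to see with numbers. The sketch below is not the code in unirl/algorithms/diffusion_grpo.py; it only illustrates the generic ratio exp(new - old) that ratio-based objectives rely on, with made-up log-prob values and a fake "optimizer step". The point is that the old log-probs are captured once at rollout time and reused for every update on the same batch.

```python
import math
from typing import List, Sequence


def importance_ratios(new_logp: Sequence[float], old_logp: Sequence[float]) -> List[float]:
    """Per-sample ratio exp(new - old) used by ratio-based objectives."""
    if len(new_logp) != len(old_logp):
        raise ValueError("log-prob sequences must have the same length")
    return [math.exp(n - o) for n, o in zip(new_logp, old_logp)]


# Old log-probs are captured once, at rollout time; a tuple keeps them immutable.
old_logp = (-1.0, -2.0)

num_updates_per_batch = 2  # stands in for stack.num_updates_per_batch
current_logp = [-1.0, -2.0]  # the policy starts equal to the rollout policy
history = []
for _ in range(num_updates_per_batch):
    history.append(importance_ratios(current_logp, old_logp))
    # Pretend an optimizer step shifted the policy; old_logp is NOT refreshed.
    current_logp = [lp + 0.1 for lp in current_logp]
```

On the first update the ratios are exactly 1 because the policy has not moved yet. If old_logp were recomputed before the second update, the ratios would collapse back to 1 and the objective would lose the information about how far the policy has drifted from the rollout policy.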
Add a Reward
Read:
- /en/docs/guides/rewards
- unirl/reward/README.md
- closest scorer under unirl/reward/local/
Edit:
- scorer implementation and spec config
- recipe reward backend
Check:
- scorer batch size and device behavior
- offload/onload if the scorer holds a model
Risk: introducing slow in-process scoring without batch controls.
See also: unirl/reward/local/base.py, unirl/types/reward.py.
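As a rough illustration of the batch-control check, the sketch below wraps any batch scoring callable so that it never receives more than batch_size items at once. The helper names batched and score_in_batches are hypothetical and do not reflect the base class in unirl/reward/local/base.py; the toy scorer simply returns string lengths.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(
    items: Sequence[str],
    score_batch: Callable[[Sequence[str]], List[float]],
    batch_size: int,
) -> List[float]:
    """Score items chunk by chunk so memory use is bounded by batch_size."""
    scores: List[float] = []
    for chunk in batched(items, batch_size):
        chunk_scores = score_batch(chunk)
        if len(chunk_scores) != len(chunk):
            raise ValueError("scorer must return one score per input item")
        scores.extend(chunk_scores)
    return scores


call_sizes: List[int] = []


def length_scorer(chunk: Sequence[str]) -> List[float]:
    """Toy scorer that records the size of each batch it receives."""
    call_sizes.append(len(chunk))
    return [float(len(s)) for s in chunk]


scores = score_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], length_scorer, batch_size=2)
```

Here the five inputs reach the scorer in chunks of 2, 2, and 1, and the scores come back in input order.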
Debug a Failed Run
Start with:
python -m unirl.train_diffusion --config-name=<domain>/<recipe> --cfg job --resolve
Then inspect:
- config validation errors first;
- the launchers in examples/ for env handling;
- unirl/rollout/README.md for engine mode and sync requirements;
- unirl/train/readme.md for the train-step contract and batch geometry.
Risk: treating a Ray runtime error as the root cause when Hydra composition already encoded an invalid topology.
See also: unirl/config/.
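The config-first step of this debugging order can be scripted so it runs before any launcher. The sketch below only assembles and runs the command shown in this section; the helper names are hypothetical, sys.executable is used in place of a bare python so the current interpreter is picked up, and treating a non-zero exit status as a composition failure is an assumption about how the command reports errors.

```python
import subprocess
import sys
from typing import List


def resolve_command(domain: str, recipe: str) -> List[str]:
    """Build the config-resolution command from this section as an argv list."""
    return [
        sys.executable, "-m", "unirl.train_diffusion",
        f"--config-name={domain}/{recipe}",
        "--cfg", "job", "--resolve",
    ]


def config_composes(domain: str, recipe: str) -> bool:
    """Run the resolve step and report whether it exited with status 0.

    Assumption: a failed composition or validation yields a non-zero exit code.
    """
    result = subprocess.run(
        resolve_command(domain, recipe), capture_output=True, text=True
    )
    if result.returncode != 0:
        # Surface the composition error before anyone looks at runtime logs.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Running config_composes before launching makes it harder to mistake a later Ray error for the root cause when the recipe itself does not compose.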