Experiment Recipes

Recipes in the bucketed examples/ tree and how to select one per entrypoint.

Recipes are self-contained YAML files under examples/, bucketed by trainer domain (diffusion/, vlm/, llm/, pe/, unified_model/). Each one is the source of truth for model, algorithm, rollout engine, placement, reward, sync, and batch geometry. Select a recipe with --config-name=<domain>/<recipe> (no .yaml).

Entrypoint per Domain

The recipe family determines which entrypoint runs it:

| Entrypoint | Bucket | Recipe families |
| --- | --- | --- |
| python -m unirl.train_diffusion | diffusion/ | sd3_*, qwen_image_*, flux2_klein_*, wan21_*, wan22_*, hunyuan_video* |
| python -m unirl.train_vlm | vlm/, llm/ | qwen_vl_argrpo_*, qwen3_ar_drpo_* |
| python -m unirl.train_pe | pe/ | pe_* |
| python -m unirl.train_unified_model | unified_model/ | hi3_* |
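
The bucket-to-entrypoint mapping above can be sketched as a small lookup. This is an illustrative helper, not part of UniRL: the names `ENTRYPOINT_BY_BUCKET`, `entrypoint_for`, and `train_command` are hypothetical, and the mapping is copied from the table rather than read from the codebase.

```python
# Bucket -> entrypoint module, copied from the table above.
ENTRYPOINT_BY_BUCKET = {
    "diffusion": "unirl.train_diffusion",
    "vlm": "unirl.train_vlm",
    "llm": "unirl.train_vlm",
    "pe": "unirl.train_pe",
    "unified_model": "unirl.train_unified_model",
}


def entrypoint_for(recipe: str) -> str:
    """Return the entrypoint module for a <domain>/<recipe> name (hypothetical helper)."""
    bucket, sep, name = recipe.partition("/")
    if not sep or not name:
        raise ValueError(f"expected <domain>/<recipe>, got {recipe!r}")
    if name.endswith(".yaml"):
        raise ValueError("pass the recipe name without the .yaml suffix")
    if bucket not in ENTRYPOINT_BY_BUCKET:
        raise ValueError(f"unknown bucket {bucket!r}")
    return ENTRYPOINT_BY_BUCKET[bucket]


def train_command(recipe: str) -> list[str]:
    """Build the argv for running a recipe directly, without a launcher script."""
    return ["python", "-m", entrypoint_for(recipe), f"--config-name={recipe}"]


print(" ".join(train_command("diffusion/sd3_trainside")))
```

Rejecting a trailing `.yaml` and an unknown bucket up front gives an immediate, readable error instead of a failure later during config composition.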

Maintained Families

| Family | Recipes |
| --- | --- |
| SD3 (GRPO) | sd3_trainside, sd3_trainside_tq_mooncake, sd3_dancegrpo, sd3_mixgrpo |
| SD3 NFT | sd3_nft, sd3_nft_reward_service, sd3_nft_sglang |
| SD3 Flow-DPPO | sd3_flowdppo, sd3_flowdppo_vllmomni |
| SD3 SGLang | sd3_sglang_native_colocate, sd3_sglang_replay_colocate, sd3_sglang_full_nccl_separate, sd3_sglang_full_tensor, sd3_sglang_lora_separate |
| SD3 vLLM-Omni | sd3_vllmomni, sd3_vllmomni_full_ipc, sd3_vllmomni_full_nccl_separate, sd3_vllmomni_full_tensor, sd3_vllmomni_lora_separate |
| Qwen-Image | qwen_image_trainside, qwen_image_dancegrpo, qwen_image_mixgrpo, qwen_image_nft |
| Flux.2-Klein | flux2_klein_trainside, flux2_klein_sglang |
| WAN 2.1 | wan21_t2v, wan21_t2v_dancegrpo, wan21_t2v_mixgrpo, wan21_i2v |
| WAN 2.2 | wan22_t2v_14b, wan22_t2v_14b_dancegrpo, wan22_t2v_14b_mixgrpo, wan22_i2v |
| HunyuanVideo | hunyuan_video_t2v_trainside, hunyuan_video15_t2v_dancegrpo_trainside, hunyuan_video15_t2v_vllmomni_nccl_separate |
| HunyuanImage3 | hi3_vllmomni |
| Qwen-VL ARGRPO (VLM) | qwen_vl_argrpo_geo3k_mc_4x8, qwen_vl_argrpo_geo3k_mc_4x8_lora, qwen_vl_argrpo_geo3k_mc_sglang_4x8, qwen_vl_argrpo_geo3k_mc_sglang_4x8_lora |
| Qwen3 DRPO (LLM) | qwen3_ar_drpo_4b_base_dpao_sglang |
| PE (prompt enhancer, AR + diffusion) | pe_trainside_pickscore, pe_sglang_full_pickscore, pe_sglang_full_wise, pe_sglang_lora_pickscore |

Selecting a Recipe

python -m unirl.train_diffusion --config-name=diffusion/sd3_trainside

Launcher scripts take the same bucketed recipe name as their argument. Set ENTRY to select a non-diffusion entrypoint:

bash examples/run_experiment_single_node.sh diffusion/sd3_trainside
ENTRY=train_vlm bash examples/run_experiment_single_node.sh vlm/qwen_vl_argrpo_geo3k_mc_4x8
bash examples/run_experiment_multinode_taiji.sh diffusion/sd3_sglang_native_colocate
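
When launches are driven from Python, the launcher calls above can be sketched as follows. `launcher_invocation` is a hypothetical helper. The sketch assumes that leaving ENTRY unset selects the diffusion entrypoint, which the first example above suggests but the source does not state outright.

```python
import os


def launcher_invocation(recipe, entry=None,
                        script="examples/run_experiment_single_node.sh"):
    """Return (argv, env) for a launcher run; hypothetical helper, not part of UniRL."""
    env = dict(os.environ)
    if entry is None:
        # Assumption: an unset ENTRY means the launcher uses the diffusion entrypoint.
        env.pop("ENTRY", None)
    else:
        env["ENTRY"] = entry
    return ["bash", script, recipe], env


argv, env = launcher_invocation("vlm/qwen_vl_argrpo_geo3k_mc_4x8", entry="train_vlm")
print(env["ENTRY"], " ".join(argv))
# To actually launch: subprocess.run(argv, env=env, check=True)
```

Building the environment explicitly keeps a stale ENTRY value in the calling shell from leaking into a diffusion run.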

How to Pick a Recipe

Use this decision order when a task does not name a specific recipe:

  1. Pick the modality and model family first: SD3 or Qwen-Image for image; WAN 2.1 / 2.2 for video; HunyuanImage3 for mixed AR + diffusion; Qwen-VL / Qwen3 for VLM / LLM; PE for prompt-enhancer.
  2. Pick the rollout topology: trainside for direct sampling, SGLang or vLLM-Omni recipes for a dedicated rollout engine, and colocate when training and rollout share GPU bundles (separate when they do not).
  3. Pick the algorithm: GRPO / DanceGRPO / MixGRPO for on-policy ratio losses, Flow-DPPO for KL-masked policy optimization, NFT for off-policy forward-process training, DRPO for AR text.
  4. Pick the cluster-size variant, such as 4x8, only after matching the target hardware.
  5. Run a compose check (--cfg job --resolve, as described under Editing Guidance) before launching Ray work.
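
The first three steps amount to successive narrowing of the recipe list. A minimal sketch follows, assuming recipe names encode family, engine, and placement as underscore-separated tokens. The listed names suggest this convention, but the source does not guarantee it. `narrow` is a hypothetical helper, and the list is a subset of the SD3 recipes named on this page.

```python
# A few SD3 recipe names copied from the Maintained Families table (not the full list).
SD3_RECIPES = [
    "sd3_trainside",
    "sd3_dancegrpo",
    "sd3_nft",
    "sd3_nft_sglang",
    "sd3_flowdppo",
    "sd3_flowdppo_vllmomni",
    "sd3_sglang_native_colocate",
    "sd3_sglang_full_nccl_separate",
    "sd3_vllmomni_lora_separate",
]


def narrow(recipes, *tokens):
    """Keep recipes whose underscore-separated name contains every token."""
    return [r for r in recipes if all(t in r.split("_") for t in tokens)]


by_family = narrow(SD3_RECIPES, "sd3")        # step 1: model family
by_engine = narrow(by_family, "sglang")       # step 2: rollout engine
colocated = narrow(by_engine, "colocate")     # step 2: placement
print(colocated)  # ['sd3_sglang_native_colocate']
```

Token matching is only a shortlist aid. The recipe YAML remains the source of truth for what a recipe actually configures.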

Editing Guidance

When adding a recipe:

  1. Start from the closest existing examples/<domain>/<recipe>.yaml.
  2. Keep model, reward, rollout engine, backend, stack, sync, placement, and batch geometry in YAML, each instantiated by _target_.
  3. Use environment interpolation only for deployment-specific paths and logging identity.
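  4. Run python -m unirl.train_diffusion --config-name=<domain>/<recipe> --cfg job --resolve (substituting the entrypoint that matches the recipe's domain) and confirm that the config composes and resolves.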
  5. Add the recipe to this page.
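
The compose check in step 4 can be wrapped for use in a pre-merge script. This is a sketch under two assumptions: the entrypoint exits with a non-zero status when composition or resolution fails, and `unirl` is importable in the current environment. `compose_check_argv` and `compose_check` are hypothetical names.

```python
import subprocess
import sys


def compose_check_argv(recipe, module="unirl.train_diffusion"):
    """Build the argv for the compose check from step 4."""
    return [sys.executable, "-m", module,
            f"--config-name={recipe}", "--cfg", "job", "--resolve"]


def compose_check(recipe, module="unirl.train_diffusion"):
    """Return True if the check exits with status 0 (assumed to mean success)."""
    result = subprocess.run(compose_check_argv(recipe, module),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the entrypoint's own error output for debugging.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


# Example (requires unirl to be installed):
# ok = compose_check("vlm/qwen_vl_argrpo_geo3k_mc_4x8", module="unirl.train_vlm")
```

Passing `module` explicitly lets the same check cover recipes outside the diffusion bucket.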
