Experiment Recipes

Recipes are self-contained YAML files under examples/, bucketed by trainer domain (diffusion/, vlm/, llm/, pe/, unified_model/). Each recipe is the source of truth for model, algorithm, rollout engine, placement, reward, sync, and batch geometry. Select a recipe with --config-name=<domain>/<recipe>, omitting the .yaml extension.
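The naming rule above can be sketched as a small path resolver. This is a minimal illustration, assuming recipes live at examples/<domain>/<recipe>.yaml relative to the repository root; the names DOMAINS and recipe_path are hypothetical helpers and not part of unirl.

```python
from pathlib import Path

# Buckets listed in the paragraph above.
DOMAINS = ("diffusion", "vlm", "llm", "pe", "unified_model")


def recipe_path(config_name: str, root: Path = Path("examples")) -> Path:
    """Map a --config-name value such as 'diffusion/sd3_trainside' to its YAML file.

    Hypothetical helper: it only builds the path, it does not check the file exists.
    """
    domain, sep, recipe = config_name.partition("/")
    if not sep or not recipe or domain not in DOMAINS:
        raise ValueError(f"expected <domain>/<recipe>, got {config_name!r}")
    if recipe.endswith(".yaml"):
        raise ValueError("pass the recipe name without the .yaml extension")
    return root / domain / f"{recipe}.yaml"
```

Calling recipe_path("diffusion/sd3_trainside") yields examples/diffusion/sd3_trainside.yaml, which is a convenient target for an existence check before launching.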

Entrypoint per Domain

The recipe family determines which entrypoint runs it:

Entrypoint	Bucket	Recipe families
`python -m unirl.train_diffusion`	`diffusion/`	`sd3_`, `qwen_image_`, `flux2_klein_`, `wan21_`, `wan22_`, `hunyuan_video`
`python -m unirl.train_vlm`	`vlm/`, `llm/`	`qwen_vl_argrpo_`, `qwen3_ar_drpo_`
`python -m unirl.train_pe`	`pe/`	`pe_*`
`python -m unirl.train_unified_model`	`unified_model/`	`hi3_*`

Maintained Families

Family	Recipes
SD3 (GRPO)	`sd3_trainside`, `sd3_trainside_tq_mooncake`, `sd3_dancegrpo`, `sd3_mixgrpo`
SD3 NFT	`sd3_nft`, `sd3_nft_reward_service`, `sd3_nft_sglang`
SD3 Flow-DPPO	`sd3_flowdppo`, `sd3_flowdppo_vllmomni`
SD3 SGLang	`sd3_sglang_native_colocate`, `sd3_sglang_replay_colocate`, `sd3_sglang_full_nccl_separate`, `sd3_sglang_full_tensor`, `sd3_sglang_lora_separate`
SD3 vLLM-Omni	`sd3_vllmomni`, `sd3_vllmomni_full_ipc`, `sd3_vllmomni_full_nccl_separate`, `sd3_vllmomni_full_tensor`, `sd3_vllmomni_lora_separate`
Qwen-Image	`qwen_image_trainside`, `qwen_image_dancegrpo`, `qwen_image_mixgrpo`, `qwen_image_nft`
Flux.2-Klein	`flux2_klein_trainside`, `flux2_klein_sglang`
WAN 2.1	`wan21_t2v`, `wan21_t2v_dancegrpo`, `wan21_t2v_mixgrpo`, `wan21_i2v`
WAN 2.2	`wan22_t2v_14b`, `wan22_t2v_14b_dancegrpo`, `wan22_t2v_14b_mixgrpo`, `wan22_i2v`
HunyuanVideo	`hunyuan_video_t2v_trainside`, `hunyuan_video15_t2v_dancegrpo_trainside`, `hunyuan_video15_t2v_vllmomni_nccl_separate`
HunyuanImage3	`hi3_vllmomni`
Qwen-VL ARGRPO (VLM)	`qwen_vl_argrpo_geo3k_mc_4x8`, `qwen_vl_argrpo_geo3k_mc_4x8_lora`, `qwen_vl_argrpo_geo3k_mc_sglang_4x8`, `qwen_vl_argrpo_geo3k_mc_sglang_4x8_lora`
Qwen3 DRPO (LLM)	`qwen3_ar_drpo_4b_base_dpao_sglang`
PE (prompt enhancer, AR + diffusion)	`pe_trainside_pickscore`, `pe_sglang_full_pickscore`, `pe_sglang_full_wise`, `pe_sglang_lora_pickscore`

Selecting a Recipe

python -m unirl.train_diffusion --config-name=diffusion/sd3_trainside

Launchers take the same bucketed recipe name. Set ENTRY to select a non-diffusion entrypoint:

bash examples/run_experiment_single_node.sh diffusion/sd3_trainside
ENTRY=train_vlm bash examples/run_experiment_single_node.sh vlm/qwen_vl_argrpo_geo3k_mc_4x8
bash examples/run_experiment_multinode_taiji.sh diffusion/sd3_sglang_native_colocate
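When launches are scripted, the ENTRY convention above can be derived from the recipe's bucket. The sketch below is an assumption-laden illustration: only ENTRY=train_vlm appears in the examples above, so the train_pe and train_unified_model values are guessed from the entrypoint module names, and launcher_invocation is a hypothetical helper.

```python
import os

# ENTRY value per non-diffusion bucket. Only "train_vlm" is confirmed by the
# launcher examples; the other two are assumed to mirror the module names.
ENTRY_BY_BUCKET = {
    "vlm": "train_vlm",
    "llm": "train_vlm",
    "pe": "train_pe",                        # assumption
    "unified_model": "train_unified_model",  # assumption
}


def launcher_invocation(recipe, script="examples/run_experiment_single_node.sh"):
    """Return (env, argv) suitable for subprocess.run(argv, env=env)."""
    bucket = recipe.split("/", 1)[0]
    env = dict(os.environ)
    if bucket == "diffusion":
        # Diffusion recipes are launched without ENTRY in the examples above,
        # so drop any value inherited from the calling shell.
        env.pop("ENTRY", None)
    elif bucket in ENTRY_BY_BUCKET:
        env["ENTRY"] = ENTRY_BY_BUCKET[bucket]
    else:
        raise ValueError(f"unknown bucket in recipe name: {recipe!r}")
    return env, ["bash", script, recipe]
```

Building the pair without executing it keeps the helper easy to inspect; pass the result to subprocess.run from the repository root to actually launch.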

How to Pick a Recipe

Use this decision order when a task does not name a specific recipe:

1. Pick the modality and model family first: SD3, Qwen-Image, or Flux.2-Klein for image; WAN 2.1 / 2.2 or HunyuanVideo for video; HunyuanImage3 for mixed AR + diffusion; Qwen-VL for VLM; Qwen3 for LLM; PE for prompt enhancement.
2. Pick the rollout topology: trainside recipes for direct sampling; SGLang or vLLM-Omni recipes for a dedicated rollout engine; among those, colocate when train and rollout share GPU bundles and separate when they do not.
3. Pick the algorithm: GRPO / DanceGRPO / MixGRPO for on-policy ratio losses, Flow-DPPO for KL-masked policy optimization, NFT for off-policy forward-process training, DRPO for AR text.
4. Pick the cluster-size variant, such as 4x8, only after matching it to the target hardware.
5. Run a compose check (--cfg job --resolve, as described under Editing Guidance) before launching Ray work.
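The decision order above can be approximated by filtering recipe names on their underscore-separated tokens. This is only a sketch: it assumes the names encode model, topology, and algorithm as tokens, which matches the SD3 families listed on this page but is not a documented contract, and narrow is a hypothetical helper.

```python
# SD3 recipe names copied from the Maintained Families table.
SD3_RECIPES = [
    "sd3_trainside", "sd3_trainside_tq_mooncake", "sd3_dancegrpo", "sd3_mixgrpo",
    "sd3_nft", "sd3_nft_reward_service", "sd3_nft_sglang",
    "sd3_flowdppo", "sd3_flowdppo_vllmomni",
    "sd3_sglang_native_colocate", "sd3_sglang_replay_colocate",
    "sd3_sglang_full_nccl_separate", "sd3_sglang_full_tensor",
    "sd3_sglang_lora_separate",
    "sd3_vllmomni", "sd3_vllmomni_full_ipc", "sd3_vllmomni_full_nccl_separate",
    "sd3_vllmomni_full_tensor", "sd3_vllmomni_lora_separate",
]


def narrow(recipes, *tokens):
    """Keep recipes whose underscore-separated name contains every token."""
    wanted = set(tokens)
    return [name for name in recipes if wanted <= set(name.split("_"))]


# Step 1 (model), then step 2 (topology), applied as successive tokens.
colocated_sglang = narrow(SD3_RECIPES, "sd3", "sglang", "colocate")
```

Token matching is deliberately exact rather than substring-based, so a token such as "nft" selects the NFT recipes without also matching unrelated names that merely contain those letters.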

Editing Guidance

When adding a recipe:

Start from the closest existing examples/<domain>/<recipe>.yaml.
Keep model, reward, rollout engine, backend, stack, sync, placement, and batch geometry in YAML, each instantiated by _target_.
Use environment interpolation only for deployment-specific paths and logging identity.
Run python -m unirl.train_diffusion --config-name=<domain>/<recipe> --cfg job --resolve to confirm that the recipe composes and that every interpolation resolves.
Add the recipe to this page.
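The compose check in the steps above is easy to wrap for scripts or CI. The following is a minimal sketch, assuming the entrypoints are Hydra applications (which the --config-name, --cfg job, --resolve, and _target_ conventions suggest) and that the other entrypoint modules accept the same flags as train_diffusion; compose_check_argv is a hypothetical helper.

```python
import sys


def compose_check_argv(recipe, module="unirl.train_diffusion"):
    """Build the compose-check command for a recipe without executing it."""
    if "/" not in recipe or recipe.endswith(".yaml"):
        raise ValueError(f"expected <domain>/<recipe> without .yaml, got {recipe!r}")
    return [
        sys.executable, "-m", module,
        f"--config-name={recipe}",
        "--cfg", "job",  # print the composed job config instead of training
        "--resolve",     # resolve interpolations so broken ones surface here
    ]
```

Passing the result to subprocess.run(argv, check=True) from the repository root should raise if the recipe fails to compose, which makes the check usable as a gate before any Ray work is launched.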