Configuration
Experiment Recipes
Recipes in the bucketed examples/ tree and how to select one per entrypoint.
Recipes are self-contained YAML files under examples/, bucketed by trainer domain (diffusion/, vlm/, llm/, pe/, unified_model/). Each one is the source of truth for model, algorithm, rollout engine, placement, reward, sync, and batch geometry. Select a recipe with --config-name=<domain>/<recipe> (no .yaml).
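The mapping from a recipe file to its `--config-name` value can be sketched in a few lines. This is a minimal illustration, not part of the project: `config_name_from_path` and `BUCKETS` are hypothetical names, and the sketch assumes the five buckets listed above are the only valid ones.

```python
from pathlib import PurePosixPath

# Buckets named on this page; assumed here to be the only valid domains.
BUCKETS = {"diffusion", "vlm", "llm", "pe", "unified_model"}


def config_name_from_path(recipe_path: str) -> str:
    """Turn 'examples/<domain>/<recipe>.yaml' into '<domain>/<recipe>'."""
    path = PurePosixPath(recipe_path)
    if path.suffix != ".yaml":
        raise ValueError(f"expected a .yaml recipe, got {recipe_path!r}")
    parts = path.with_suffix("").parts
    if len(parts) != 3 or parts[0] != "examples" or parts[1] not in BUCKETS:
        raise ValueError(
            f"not an examples/<domain>/<recipe>.yaml path: {recipe_path!r}"
        )
    # The config name is the bucket plus the file stem, without the extension.
    return f"{parts[1]}/{parts[2]}"


print(config_name_from_path("examples/diffusion/sd3_trainside.yaml"))
# prints: diffusion/sd3_trainside
```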
Entrypoint per Domain
The recipe family determines which entrypoint runs it:
| Entrypoint | Bucket | Recipe families |
|---|---|---|
| python -m unirl.train_diffusion | diffusion/ | sd3_*, qwen_image_*, flux2_klein_*, wan21_*, wan22_*, hunyuan_video* |
| python -m unirl.train_vlm | vlm/, llm/ | qwen_vl_argrpo_*, qwen3_ar_drpo_* |
| python -m unirl.train_pe | pe/ | pe_* |
| python -m unirl.train_unified_model | unified_model/ | hi3_* |
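As a rough sketch, the table above can be transcribed into a lookup keyed by bucket. The helper name `entrypoint_for` and the dictionary are hypothetical and exist only for illustration; the module names and buckets come from the table.

```python
# Bucket -> entrypoint module, transcribed from the table above.
ENTRYPOINT_BY_BUCKET = {
    "diffusion": "unirl.train_diffusion",
    "vlm": "unirl.train_vlm",
    "llm": "unirl.train_vlm",
    "pe": "unirl.train_pe",
    "unified_model": "unirl.train_unified_model",
}


def entrypoint_for(config_name: str) -> str:
    """Return the entrypoint module for a '<domain>/<recipe>' config name."""
    bucket, sep, recipe = config_name.partition("/")
    if not sep or not recipe:
        raise ValueError(f"expected '<domain>/<recipe>', got {config_name!r}")
    try:
        return ENTRYPOINT_BY_BUCKET[bucket]
    except KeyError:
        raise ValueError(f"unknown bucket {bucket!r}") from None


name = "unified_model/hi3_vllmomni"
print(f"python -m {entrypoint_for(name)} --config-name={name}")
# prints: python -m unirl.train_unified_model --config-name=unified_model/hi3_vllmomni
```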
Maintained Families
| Family | Recipes |
|---|---|
| SD3 (GRPO) | sd3_trainside, sd3_trainside_tq_mooncake, sd3_dancegrpo, sd3_mixgrpo |
| SD3 NFT | sd3_nft, sd3_nft_reward_service, sd3_nft_sglang |
| SD3 Flow-DPPO | sd3_flowdppo, sd3_flowdppo_vllmomni |
| SD3 SGLang | sd3_sglang_native_colocate, sd3_sglang_replay_colocate, sd3_sglang_full_nccl_separate, sd3_sglang_full_tensor, sd3_sglang_lora_separate |
| SD3 vLLM-Omni | sd3_vllmomni, sd3_vllmomni_full_ipc, sd3_vllmomni_full_nccl_separate, sd3_vllmomni_full_tensor, sd3_vllmomni_lora_separate |
| Qwen-Image | qwen_image_trainside, qwen_image_dancegrpo, qwen_image_mixgrpo, qwen_image_nft |
| Flux.2-Klein | flux2_klein_trainside, flux2_klein_sglang |
| WAN 2.1 | wan21_t2v, wan21_t2v_dancegrpo, wan21_t2v_mixgrpo, wan21_i2v |
| WAN 2.2 | wan22_t2v_14b, wan22_t2v_14b_dancegrpo, wan22_t2v_14b_mixgrpo, wan22_i2v |
| HunyuanVideo | hunyuan_video_t2v_trainside, hunyuan_video15_t2v_dancegrpo_trainside, hunyuan_video15_t2v_vllmomni_nccl_separate |
| HunyuanImage3 | hi3_vllmomni |
| Qwen-VL ARGRPO (VLM) | qwen_vl_argrpo_geo3k_mc_4x8, qwen_vl_argrpo_geo3k_mc_4x8_lora, qwen_vl_argrpo_geo3k_mc_sglang_4x8, qwen_vl_argrpo_geo3k_mc_sglang_4x8_lora |
| Qwen3 DRPO (LLM) | qwen3_ar_drpo_4b_base_dpao_sglang |
| PE (prompt enhancer, AR + diffusion) | pe_trainside_pickscore, pe_sglang_full_pickscore, pe_sglang_full_wise, pe_sglang_lora_pickscore |
Selecting a Recipe
python -m unirl.train_diffusion --config-name=diffusion/sd3_trainside
Launchers pass the same bucketed recipe name (and ENTRY selects a non-diffusion entrypoint):
bash examples/run_experiment_single_node.sh diffusion/sd3_trainside
ENTRY=train_vlm bash examples/run_experiment_single_node.sh vlm/qwen_vl_argrpo_geo3k_mc_4x8
bash examples/run_experiment_multinode_taiji.sh diffusion/sd3_sglang_native_colocate
How to Pick a Recipe
Use this decision order when a task does not name a specific recipe:
- Pick the modality and model family first: SD3 or Qwen-Image for image; WAN 2.1 / 2.2 for video; HunyuanImage3 for mixed AR + diffusion; Qwen-VL / Qwen3 for VLM / LLM; PE for prompt-enhancer.
- Pick the rollout topology: trainside for direct sampling, SGLang or vLLM-Omni recipes for dedicated rollout, and colocate when train and rollout share GPU bundles (vs separate).
- Pick the algorithm: GRPO / DanceGRPO / MixGRPO for on-policy ratio losses, Flow-DPPO for KL-masked policy optimization, NFT for off-policy forward-process training, DRPO for AR text.
- Pick the cluster-size variant, such as 4x8, only after matching the target hardware.
- Run a compose check before launching Ray work.
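The first steps of this decision order can be approximated by filtering recipe names on their underscore-separated tokens. This is only a naming heuristic under the assumption that a recipe's name reflects its family, topology, and algorithm; the YAML file remains the source of truth. `pick_recipes` is a hypothetical helper, and `RECIPES` is a small sample copied from the Maintained Families table, not the full list.

```python
# A few recipe names copied from the Maintained Families table (not exhaustive).
RECIPES = [
    "sd3_trainside",
    "sd3_dancegrpo",
    "sd3_nft_sglang",
    "sd3_sglang_native_colocate",
    "sd3_sglang_full_nccl_separate",
    "sd3_vllmomni_lora_separate",
    "wan21_t2v_mixgrpo",
    "wan22_t2v_14b_dancegrpo",
]


def pick_recipes(recipes, family, *tokens):
    """Keep names that start with the family prefix and contain every token."""
    matches = []
    for name in recipes:
        parts = name.split("_")
        if name.startswith(family + "_") and all(t in parts for t in tokens):
            matches.append(name)
    return matches


# Family first, then topology, then algorithm.
print(pick_recipes(RECIPES, "sd3", "sglang", "colocate"))
# prints: ['sd3_sglang_native_colocate']
print(pick_recipes(RECIPES, "wan22", "dancegrpo"))
# prints: ['wan22_t2v_14b_dancegrpo']
```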
Editing Guidance
When adding a recipe:
- Start from the closest existing examples/<domain>/<recipe>.yaml.
- Keep model, reward, rollout engine, backend, stack, sync, placement, and batch geometry in YAML, each instantiated by _target_.
- Use environment interpolation only for deployment-specific paths and logging identity.
- Run python -m unirl.train_diffusion --config-name=<domain>/<recipe> --cfg job --resolve.
- Add the recipe to this page.
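The compose check from the list above can be wrapped in a small script so that it runs before any launch. This is a sketch under stated assumptions: `compose_check_argv` and `run_compose_check` are hypothetical helpers, the flags are the ones shown on this page, and the sketch assumes the entrypoint exits with a non-zero status when composition or resolution fails. Passing an entrypoint other than `unirl.train_diffusion` assumes that entrypoint accepts the same flags.

```python
import subprocess
import sys


def compose_check_argv(config_name, entrypoint="unirl.train_diffusion"):
    """Build the argv for the compose check shown on this page."""
    return [
        sys.executable, "-m", entrypoint,
        f"--config-name={config_name}",
        "--cfg", "job", "--resolve",
    ]


def run_compose_check(config_name, entrypoint="unirl.train_diffusion"):
    """Run the check; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(compose_check_argv(config_name, entrypoint), check=True)


# Show the arguments without running anything.
print(" ".join(compose_check_argv("diffusion/sd3_trainside")[1:]))
# prints: -m unirl.train_diffusion --config-name=diffusion/sd3_trainside --cfg job --resolve
```

Calling `run_compose_check("diffusion/sd3_trainside")` in an environment where `unirl` is installed would then stop a launch script early if the check fails.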