First Run

Compose and launch a UniRL experiment recipe.

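Before launching any Ray work, compose the recipe and print the resolved config. In the command below, --cfg job prints the composed job config instead of running it, and --resolve expands interpolations. This catches missing paths, invalid Hydra overrides, and cross-component contract errors early.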

python -m unirl.train_diffusion --config-name=diffusion/sd3_trainside --cfg job --resolve
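
One way to make this check routine is to gate the launch on it. The sketch below is a hypothetical wrapper, not part of UniRL. It assumes the entrypoint exits with a non-zero status when composition fails, and the PYTHON variable is an illustrative knob for choosing the interpreter.

```shell
# Hypothetical helper: compose the recipe first, launch only if that succeeds.
# PYTHON is an illustrative variable for selecting the interpreter.
compose_then_launch() {
  if "${PYTHON:-python}" -m unirl.train_diffusion \
      --config-name="$1" --cfg job --resolve >/dev/null; then
    # Composition succeeded: hand the same recipe to the single-node launcher.
    bash examples/run_experiment_single_node.sh "$1"
  else
    echo "compose failed for $1; not launching" >&2
    return 1
  fi
}
```

Calling compose_then_launch diffusion/sd3_trainside then behaves like the launcher, except that a recipe that fails to compose never reaches Ray.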

Single Node

Use the generic single-node launcher when you want the scripts to prepare local Ray runtime defaults. The first argument is a bucketed recipe name (<domain>/<recipe>):

bash examples/run_experiment_single_node.sh diffusion/sd3_trainside
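
A wrapper script can reject malformed recipe names before they reach the launcher. This is a minimal sketch with a hypothetical function name, assuming a valid name contains exactly one slash with text on both sides, as the <domain>/<recipe> pattern suggests.

```shell
# Hypothetical pre-check: accept only names of the form <domain>/<recipe>.
is_bucketed_recipe() {
  case "$1" in
    */*/*|/*|*/|"") return 1 ;;  # extra slash, empty domain, empty recipe, or empty name
    */*) return 0 ;;             # exactly one slash with text on both sides
    *) return 1 ;;               # no slash at all
  esac
}
```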

The diffusion entrypoint is the default; select another with ENTRY:

ENTRY=train_vlm bash examples/run_experiment_single_node.sh vlm/qwen_vl_argrpo_geo3k_mc_4x8
ENTRY=train_pe  bash examples/run_experiment_single_node.sh pe/pe_trainside_pickscore

For a dry run:

DRY_RUN=1 bash examples/run_experiment_single_node.sh diffusion/sd3_trainside

Multi Node

Use the role-aware launcher for multi-node jobs:

bash examples/run_experiment_multinode_taiji.sh diffusion/sd3_sglang_native_colocate

Direct Hydra Invocation

You can invoke an entrypoint directly and override fields inline:

python -m unirl.train_diffusion \
  --config-name=diffusion/sd3_trainside \
  num_devices=8

When the same field is set in more than one place, the highest-precedence source wins:

CLI Hydra override > launcher env var > YAML default
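
The precedence rule can be modeled as "first non-empty value wins". The function below is illustrative only and does not show how the launcher actually resolves values. The function name and the sample values 8, 4, and 1 are made up.

```shell
# Illustrative only: pick the first non-empty value in precedence order.
# Arguments: <cli_override> <launcher_env_value> <yaml_default>
resolve_setting() {
  if [ -n "$1" ]; then
    echo "$1"        # CLI Hydra override wins
  elif [ -n "$2" ]; then
    echo "$2"        # then the launcher environment variable
  else
    echo "$3"        # otherwise the YAML default
  fi
}

resolve_setting 8 4 1     # prints 8
resolve_setting "" 4 1    # prints 4
resolve_setting "" "" 1   # prints 1
```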

Sample Prompts

Committed prompt lists live under datasets/, for example datasets/pickscore/train.txt (one prompt per line) and datasets/pickscore/test.txt. Recipes point their data_source at these by default.

For real runs, use environment variables or CLI overrides to set absolute paths for data, models, and outputs:

DATA_PATH=/abs/path/train.json \
OUTPUT_DIR=/abs/path/outputs/run1 \
bash examples/run_experiment_single_node.sh diffusion/wan21_t2v
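
Because relative paths are easy to pass by accident, a small pre-flight check can help. The helper below is a hypothetical sketch, not part of the UniRL scripts. It only tests that a value starts with a slash and does not check that the path exists.

```shell
# Hypothetical pre-flight check: fail if a value is empty or not an absolute path.
# Arguments: <variable_name> <value>
require_abs_path() {
  case "$2" in
    /*) return 0 ;;
    *)  echo "$1 must be an absolute path, got: '$2'" >&2
        return 1 ;;
  esac
}
```

Running require_abs_path DATA_PATH "$DATA_PATH" and require_abs_path OUTPUT_DIR "$OUTPUT_DIR" before the launcher line stops the run early when either variable is unset or relative.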
