Agent-first docs for unified multimodal RL

Build, run, and inspect unified multimodal RL experiments.

UniRL combines Ray actor groups, Hydra recipes, composable training stacks, and pluggable rollout engines for diffusion and autoregressive generative models.

Launch UniRL Experiments

Hydra configs, Ray scheduling, and rollout engines

Ready to run
Task domain: Unified multimodal generative RL
Models: Stable Diffusion 3 / 3.5, Qwen-Image, FLUX.2-Klein, WAN, HunyuanVideo, Qwen-VL, Qwen3, HunyuanImage3
Algorithms: GRPO, DanceGRPO, MixGRPO, Flow-DPPO, NFT, DRPO
Entrypoint: python -m unirl.train_diffusion

Quick start

python -m unirl.train_diffusion \
  --config-name=diffusion/sd3_trainside \
  num_devices=8

Distributed Training

Coordinate Ray actor groups for diffusion and multimodal RL workloads.

Hydra Recipes

Compose reproducible experiments from typed configs and focused overrides.
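To make "focused overrides" concrete, here is a minimal plain-Python sketch of how a `key=value` override such as `num_devices=8` can be layered over a base recipe. It only mimics the idea and is not Hydra's implementation; `apply_overrides`, `parse_value`, and every config key other than `num_devices` are hypothetical names chosen for illustration.

```python
import copy


def parse_value(raw):
    """Best-effort conversion of an override string to int, float, bool, or str."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def apply_overrides(base, overrides):
    """Return a copy of `base` with dotted `key=value` overrides applied."""
    cfg = copy.deepcopy(base)  # leave the base recipe untouched
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            # Walk (or create) the nested sections named by the dotted path.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return cfg


# Hypothetical base recipe; only `num_devices` appears in the quick start.
base_recipe = {"num_devices": 1, "rollout": {"engine": "default"}}
resolved = apply_overrides(base_recipe, ["num_devices=8", "rollout.engine=other"])
print(resolved)
```

Because the base recipe is copied before it is modified, the same recipe can be reused across runs that differ only in their overrides.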

Pluggable Rollouts

Swap rollout engines, rewards, and policy logic without changing the entrypoint.
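One common way to get this kind of pluggability is a name-to-callable registry, sketched below in plain Python. This is an illustration of the pattern only, not UniRL's actual API: `ROLLOUT_ENGINES`, `register`, `run`, and the two toy engines are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping an engine name to a rollout function.
ROLLOUT_ENGINES: Dict[str, Callable[[List[str]], List[str]]] = {}


def register(name: str):
    """Decorator that records a rollout function under `name`."""
    def decorator(fn):
        ROLLOUT_ENGINES[name] = fn
        return fn
    return decorator


@register("echo")
def echo_rollout(prompts: List[str]) -> List[str]:
    # Toy engine: returns one placeholder sample per prompt.
    return [f"sample for: {p}" for p in prompts]


@register("upper")
def upper_rollout(prompts: List[str]) -> List[str]:
    # Second toy engine with different behavior but the same interface.
    return [p.upper() for p in prompts]


def run(engine_name: str, prompts: List[str]) -> List[str]:
    """Entrypoint stays fixed; only the configured engine name changes."""
    return ROLLOUT_ENGINES[engine_name](prompts)


print(run("echo", ["a cat"]))
print(run("upper", ["a cat"]))
```

The caller of `run` never imports a specific engine, so adding a new one only requires registering another function under a new name.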

Recommended Path

01 Start

Install dependencies, then launch a first single-node recipe.

02 Configure

Choose an experiment recipe and inspect the resolved Hydra config.

03 Scale

Adapt recipes for multinode runs and cluster-specific runtime paths.
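For the Configure step, Hydra applications accept the `--cfg job` and `--resolve` flags, which print the composed config with interpolations resolved instead of running the job. The sketch below builds such a command in Python, assuming the `unirl.train_diffusion` entrypoint is a standard Hydra application that honors these flags; the helper name `inspect_command` is hypothetical.

```python
import shlex


def inspect_command(config_name, overrides=()):
    """Build an argv list that prints the resolved config instead of training.

    Assumes the entrypoint is a Hydra app, so Hydra's own
    `--cfg job --resolve` flags are available.
    """
    return [
        "python", "-m", "unirl.train_diffusion",
        f"--config-name={config_name}",
        *overrides,
        "--cfg", "job",
        "--resolve",
    ]


cmd = inspect_command("diffusion/sd3_trainside", ["num_devices=8"])
# Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Checking the printed config before a long run is a cheap way to confirm that each override landed on the intended key.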