Agent-first docs for unified multimodal RL

Build, run, and inspect unified multimodal RL experiments.

UniRL combines Ray actor groups, Hydra recipes, composable training stacks, and pluggable rollout engines for diffusion and autoregressive generative models.

Read English Docs | Open Chinese Docs | Agent Guide

Launch UniRL Experiments

Hydra configs, Ray scheduling, and rollout engines

Ready to run

Task domain: Unified multimodal generative RL

Models: Stable Diffusion 3 / 3.5, Qwen-Image, FLUX.2-Klein, WAN, HunyuanVideo, Qwen-VL, Qwen3, HunyuanImage3

Algorithms: GRPO, DanceGRPO, MixGRPO, Flow-DPPO, NFT, DRPO

Entrypoint: python -m unirl.train_diffusion

Quick start

python -m unirl.train_diffusion \
  --config-name=diffusion/sd3_trainside \
  num_devices=8

Distributed Training

Coordinate Ray actor groups for diffusion and multimodal RL workloads.

Hydra Recipes

Compose reproducible experiments from typed configs and focused overrides.
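
To make "focused overrides" concrete, the sketch below shows how `key=value` arguments such as `num_devices=8` can be layered onto a base recipe. It uses only the standard library and is not Hydra's implementation or UniRL's config schema. The `rollout.engine` key and the recipe contents are hypothetical; only `num_devices` comes from the quick start.

```python
import copy
from typing import Any


def parse_value(text: str) -> Any:
    """Interpret an override value as bool, int, float, or plain string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(base: dict, overrides: list[str]) -> dict:
    """Return a copy of `base` with dotted `key=value` overrides applied."""
    resolved = copy.deepcopy(base)  # never mutate the base recipe
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = resolved
        for name in parents:
            # Walk (or create) nested sections such as `rollout`.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return resolved


# Hypothetical recipe contents, for illustration only.
base_recipe = {"num_devices": 1, "rollout": {"engine": "default"}}
resolved = apply_overrides(base_recipe, ["num_devices=8", "rollout.engine=custom"])
print(resolved)
```

Because the base recipe is copied rather than edited, the same recipe can be reused across runs and the overrides alone describe how one experiment differs from another.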

Pluggable Rollouts

Swap rollout engines, rewards, and policy logic without changing the entrypoint.
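
One common way to get this kind of pluggability is a name-to-implementation registry: the entrypoint looks up whichever engine the config names. The sketch below illustrates that general pattern only. `register_rollout`, `run_rollout`, and the two toy engines are hypothetical names, not UniRL's API.

```python
from typing import Callable, Dict, List

# An engine maps a batch of prompts to a batch of samples (toy signature).
RolloutFn = Callable[[List[str]], List[str]]
ROLLOUT_ENGINES: Dict[str, RolloutFn] = {}


def register_rollout(name: str) -> Callable[[RolloutFn], RolloutFn]:
    """Decorator that records an engine under `name` and returns it unchanged."""
    def decorator(fn: RolloutFn) -> RolloutFn:
        ROLLOUT_ENGINES[name] = fn
        return fn
    return decorator


@register_rollout("echo")
def echo_rollout(prompts: List[str]) -> List[str]:
    return [f"echo:{p}" for p in prompts]


@register_rollout("upper")
def upper_rollout(prompts: List[str]) -> List[str]:
    return [p.upper() for p in prompts]


def run_rollout(engine: str, prompts: List[str]) -> List[str]:
    """Stand-in for a fixed entrypoint: the engine is chosen by name only."""
    if engine not in ROLLOUT_ENGINES:
        known = sorted(ROLLOUT_ENGINES)
        raise KeyError(f"unknown rollout engine {engine!r}; known: {known}")
    return ROLLOUT_ENGINES[engine](prompts)


print(run_rollout("echo", ["a cat"]))
print(run_rollout("upper", ["a cat"]))
```

With this shape, adding an engine means registering one more function. The calling code and the launch command stay the same, and only the configured name changes.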

Documentation Entrypoints

English Docs

Narrative docs for researchers and engineers.

Chinese Docs

Installation, usage, and development paths, written in Chinese.

Agent Index

Human-readable map for how coding agents should use the docs.

Recommended Path

Start

Install dependencies, then launch a first single-node recipe.

Configure

Choose an experiment recipe and inspect the resolved Hydra config.

Scale

Adapt recipes for multinode runs and cluster-specific runtime paths.

Agent-Readable Endpoints

/llms.txt
/llms-full.txt
/md/agents/index.md
/md/configuration/hydra/index.md