UniRL Documentation

Agent-first documentation for the UniRL distributed reinforcement learning framework.

UniRL is a distributed reinforcement learning framework for unified multimodal generative models. It trains diffusion and autoregressive models with Ray-based worker groups, Hydra experiment recipes, composable training stacks, and pluggable rollout engines.

This documentation site has two audiences:

Researchers and engineers who read the rendered Fumadocs pages.
Future coding agents that need stable Markdown entry points and task-oriented navigation.

Running Training

Each domain has its own entrypoint, and all of them are launched the same way:

python -m unirl.train_diffusion     --config-name=<domain>/<recipe>   # diffusion image/video
python -m unirl.train_vlm           --config-name=<domain>/<recipe>   # autoregressive VLM / LLM
python -m unirl.train_pe            --config-name=<domain>/<recipe>   # prompt-enhancer (PE)
python -m unirl.train_unified_model --config-name=<domain>/<recipe>   # HunyuanImage3 (mixed AR + diffusion)

<recipe> is the filename (without the .yaml extension) of a self-contained YAML recipe in the bucketed examples/ tree. Address it as <domain>/<recipe>, for example:

python -m unirl.train_diffusion --config-name=diffusion/sd3_trainside

Override any recipe field inline by appending Hydra key=value arguments to the command, e.g. num_devices=8. Each recipe is the source of truth for model, algorithm, rollout engine, placement, reward, sync, and batch geometry.
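To make the command shape concrete, here is a minimal Python sketch that assembles a launch command from an entrypoint, a <domain>/<recipe> name, and a set of overrides. The module names come from the list above. The helper build_command, the ENTRYPOINTS table, and its short keys are hypothetical names introduced only for illustration and are not part of UniRL. The sketch builds the argument list and prints it without running anything.

```python
import shlex

# Entrypoint modules as listed above. The short keys are labels invented
# for this sketch; UniRL does not define them.
ENTRYPOINTS = {
    "diffusion": "unirl.train_diffusion",
    "vlm": "unirl.train_vlm",
    "pe": "unirl.train_pe",
    "unified": "unirl.train_unified_model",
}


def build_command(entrypoint, config_name, overrides=None):
    """Return argv for one training run (hypothetical helper, not a UniRL API)."""
    argv = ["python", "-m", ENTRYPOINTS[entrypoint], f"--config-name={config_name}"]
    # Hydra overrides are plain key=value tokens appended after the options.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    return argv


# Recipe name and override taken from the examples in the text.
cmd = build_command("diffusion", "diffusion/sd3_trainside", {"num_devices": 8})
print(shlex.join(cmd))
```

Keeping the overrides as separate key=value tokens mirrors how they would be typed on the command line, so the same list could be handed to subprocess.run if you wanted to launch the run from Python.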

Shell launchers in examples/ should stay thin: they prepare environment variables, start Ray, and pass the recipe name plus any Hydra overrides. Recipe semantics live in YAML, not in the launcher. See examples/run_experiment_single_node.sh and examples/run_experiment_multinode_taiji.sh for the canonical pattern.

Detailed runtime and module contracts live in the package pages embedded in each docs section's sidebar. Those pages are generated from the README files that sit next to the code.

How Agents Should Read This Site

Agents should use Agent Index as the human-readable routing page before changing code. That page maps common tasks to the nearest rendered docs, package README contracts, and source directories.

Machine-readable endpoints such as /llms.txt, /llms-full.txt, and /md/<slug>/index.md are root-level access paths, not separate documentation categories. They are generated from the same MDX source as this site so rendered pages and agent context stay aligned.
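As a rough illustration of how an agent might address these endpoints, the sketch below builds the root-level paths for a list of page slugs. The path templates come from the paragraph above. The helpers markdown_path and agent_paths and the slug "overview" are assumptions made for this example, and the site may treat nested or trailing-slash slugs differently.

```python
# Fixed root-level endpoints named in the text above.
INDEX_ENDPOINTS = ("/llms.txt", "/llms-full.txt")


def markdown_path(slug):
    """Return the per-page Markdown path for a docs slug (hypothetical helper)."""
    # Strip stray slashes so "/overview/" and "overview" map to the same path.
    return f"/md/{slug.strip('/')}/index.md"


def agent_paths(slugs):
    """List the site-wide endpoints followed by one Markdown path per slug."""
    return list(INDEX_ENDPOINTS) + [markdown_path(s) for s in slugs]


# "overview" is an illustrative slug, not a confirmed page name.
paths = agent_paths(["overview"])
```

The returned paths are relative to the site root, so they would still need to be joined with whatever host serves the documentation.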

Reading Paths

Start with Installation and First Run for setup.
Use the overview for the narrative runtime map.
Use Agent Index to choose the closest source files and README contracts for a task.
