mshahidul

Initial commit of readCtrl code without large models

030876e 8 days ago

6.44 kB

	Welcome to verl's documentation!
	================================================

	verl is a flexible, efficient and production-ready RL training framework designed for large language models (LLMs) post-training. It is an open source implementation of the `HybridFlow <https://arxiv.org/pdf/2409.19256>`_ paper.

	verl is flexible and easy to use with:

	- Easy extension of diverse RL algorithms: The hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex Post-Training dataflows. Allowing users to build RL dataflows in a few lines of code.

	- Seamless integration of existing LLM infra with modular APIs: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as PyTorch FSDP, Megatron-LM, vLLM and SGLang. Moreover, users can easily extend to other LLM training and inference frameworks.

	- Flexible device mapping and parallelism: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

	- Ready integration with popular HuggingFace models


	verl is fast with:

	- State-of-the-art throughput: By seamlessly integrating existing SOTA LLM training and inference frameworks, verl achieves high generation and training throughput.

	- Efficient actor model resharding with 3D-HybridEngine: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

	--------------------------------------------

	.. _Contents:

	.. toctree::
	:maxdepth: 2
	:caption: Quickstart

	start/install
	start/quickstart
	start/multinode
	start/ray_debug_tutorial
	start/more_resources
	start/agentic_rl

	.. toctree::
	:maxdepth: 2
	:caption: Programming guide

	hybrid_flow
	single_controller

	.. toctree::
	:maxdepth: 1
	:caption: Data Preparation

	preparation/prepare_data
	preparation/reward_function

	.. toctree::
	:maxdepth: 2
	:caption: Configurations

	examples/config

	.. toctree::
	:maxdepth: 1
	:caption: PPO Example

	examples/ppo_code_architecture
	examples/gsm8k_example
	examples/multi_modal_example
	examples/skypilot_examples

	.. toctree::
	:maxdepth: 1
	:caption: Algorithms

	algo/ppo.md
	algo/grpo.md
	algo/collabllm.md
	algo/dapo.md
	algo/spin.md
	algo/sppo.md
	algo/entropy.md
	algo/opo.md
	algo/baseline.md
	algo/gpg.md
	algo/rollout_corr.md
	algo/rollout_corr_math.md
	algo/otb.md

	.. toctree::
	:maxdepth: 1
	:caption: PPO Trainer and Workers

	workers/ray_trainer
	workers/fsdp_workers
	workers/megatron_workers
	workers/sglang_worker
	workers/trtllm_worker
	workers/model_engine

	.. toctree::
	:maxdepth: 1
	:caption: Performance Tuning Guide

	perf/dpsk.md
	perf/best_practices
	perf/perf_tuning
	README_vllm0.8.md
	perf/device_tuning
	perf/verl_profiler_system.md
	perf/nsight_profiling.md
	perf/torch_profiling.md

	.. toctree::
	:maxdepth: 1
	:caption: Adding new models

	advance/fsdp_extension
	advance/megatron_extension

	.. toctree::
	:maxdepth: 1
	:caption: Advanced Features

	advance/checkpoint
	advance/rope
	advance/attention_implementation
	advance/ppo_lora.rst
	sglang_multiturn/multiturn.rst
	sglang_multiturn/interaction_system.rst
	advance/placement
	advance/dpo_extension
	examples/sandbox_fusion_example
	advance/rollout_trace.rst
	advance/rollout_skip.rst
	advance/one_step_off
	advance/agent_loop
	advance/reward_loop
	advance/fully_async
	data/transfer_queue.md
	advance/grafana_prometheus.md
	advance/fp8.md
	advance/async-on-policy-distill
	advance/mtp.md

	.. toctree::
	:maxdepth: 1
	:caption: Hardware Support

	amd_tutorial/amd_build_dockerfile_page.rst
	amd_tutorial/amd_vllm_page.rst
	ascend_tutorial/ascend_quick_start.rst
	ascend_tutorial/ascend_consistency.rst
	ascend_tutorial/ascend_profiling_zh.rst
	ascend_tutorial/ascend_profiling_en.rst
	ascend_tutorial/dockerfile_build_guidance.rst
	ascend_tutorial/ascend_sglang_quick_start.rst
	ascend_tutorial/examples/gspo_optimization_practice.md
	ascend_tutorial/examples/dapo_multi_model_optimization_practice.md
	ascend_tutorial/examples/ascend_sglang_best_practices.rst

	.. toctree::
	:maxdepth: 1
	:caption: API References

	api/data
	api/single_controller.rst
	api/trainer.rst
	api/utils.rst

	.. toctree::
	:maxdepth: 1
	:caption: Blog

	blog/v0.7.md

	.. toctree::
	:maxdepth: 2
	:caption: FAQ

	faq/faq

	.. toctree::
	:maxdepth: 1
	:caption: Development Notes

	sglang_multiturn/sandbox_fusion.rst

	Contribution
	-------------

	verl is free software; you can redistribute it and/or modify it under the terms
	of the Apache License 2.0. We welcome contributions.
	Join us on `GitHub <https://github.com/volcengine/verl>`_, `Slack <https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA>`_ and `Wechat <https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG>`_ for discussions.

	Contributions from the community are welcome! Please check out our `project roadmap <https://github.com/volcengine/verl/issues/710>`_ and `good first issues <https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22>`_ to see where you can contribute.

	Code Linting and Formatting
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	We use pre-commit to help improve code quality. To initialize pre-commit, run:

	.. code-block:: bash

	pip install pre-commit
	pre-commit install

	To resolve CI errors locally, you can also manually run pre-commit by:

	.. code-block:: bash

	pre-commit run

	Adding CI tests
	^^^^^^^^^^^^^^^^^^^^^^^^

	If possible, please add CI test(s) for your new feature:

	1. Find the most relevant workflow yml file, which usually corresponds to a ``hydra`` default config (e.g. ``ppo_trainer``, ``ppo_megatron_trainer``, ``sft_trainer``, etc).
	2. Add related path patterns to the ``paths`` section if not already included.
	3. Minimize the workload of the test script(s) (see existing scripts for examples).

	We are HIRING! Send us an `email <mailto:haibin.lin@bytedance.com>`_ if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.