Add files using upload-large-folder tool

61ba51e verified about 2 months ago

5.47 kB

	SGLang Documentation
	====================

	.. raw:: html

	<a class="github-button" href="https://github.com/sgl-project/sglang" data-size="large" data-show-count="true" aria-label="Star sgl-project/sglang on GitHub">Star</a>
	<a class="github-button" href="https://github.com/sgl-project/sglang/fork" data-icon="octicon-repo-forked" data-size="large" data-show-count="true" aria-label="Fork sgl-project/sglang on GitHub">Fork</a>
	<script async defer src="https://buttons.github.io/buttons.js"></script>
	<br></br>

	SGLang is a high-performance serving framework for large language models and multimodal models.
	It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.
	Its core features include:

	- Fast Runtime: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
	- Broad Model Support: Supports a wide range of language models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for adding new models. Compatible with most Hugging Face models and OpenAI APIs.
	- Extensive Hardware Support: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
	- Active Community: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.
	- RL & Post-Training Backbone: SGLang is a proven rollout backend across the world, with native RL integrations and adoption by well-known post-training frameworks such as AReaL, Miles, slime, Tunix, verl and more.

	.. toctree::
	:maxdepth: 1
	:caption: Get Started

	get_started/install.md

	.. toctree::
	:maxdepth: 1
	:caption: Basic Usage

	basic_usage/send_request.ipynb
	basic_usage/openai_api.rst
	basic_usage/ollama_api.md
	basic_usage/offline_engine_api.ipynb
	basic_usage/native_api.ipynb
	basic_usage/sampling_params.md
	basic_usage/popular_model_usage.rst

	.. toctree::
	:maxdepth: 1
	:caption: Advanced Features

	advanced_features/server_arguments.md
	advanced_features/hyperparameter_tuning.md
	advanced_features/attention_backend.md
	advanced_features/speculative_decoding.ipynb
	advanced_features/structured_outputs.ipynb
	advanced_features/structured_outputs_for_reasoning_models.ipynb
	advanced_features/tool_parser.ipynb
	advanced_features/separate_reasoning.ipynb
	advanced_features/quantization.md
	advanced_features/quantized_kv_cache.md
	advanced_features/expert_parallelism.md
	advanced_features/dp_dpa_smg_guide.md
	advanced_features/lora.ipynb
	advanced_features/pd_disaggregation.md
	advanced_features/epd_disaggregation.md
	advanced_features/pipeline_parallelism.md
	advanced_features/hicache.rst
	advanced_features/pd_multiplexing.md
	advanced_features/vlm_query.ipynb
	advanced_features/dp_for_multi_modal_encoder.md
	advanced_features/cuda_graph_for_multi_modal_encoder.md
	advanced_features/piecewise_cuda_graph.md
	advanced_features/sgl_model_gateway.md
	advanced_features/deterministic_inference.md
	advanced_features/observability.md
	advanced_features/checkpoint_engine.md
	advanced_features/sglang_for_rl.md

	.. toctree::
	:maxdepth: 2
	:caption: Supported Models

	supported_models/text_generation/index
	supported_models/retrieval_ranking/index
	supported_models/specialized/index
	supported_models/extending/index

	.. toctree::
	:maxdepth: 2
	:caption: SGLang Diffusion

	diffusion/index
	diffusion/installation
	diffusion/compatibility_matrix
	diffusion/api/cli
	diffusion/api/openai_api
	diffusion/performance/index
	diffusion/performance/attention_backends
	diffusion/performance/profiling
	diffusion/performance/cache/index
	diffusion/performance/cache/cache_dit
	diffusion/performance/cache/teacache
	diffusion/support_new_models
	diffusion/contributing
	diffusion/ci_perf
	diffusion/environment_variables

	.. toctree::
	:maxdepth: 1
	:caption: Hardware Platforms

	platforms/amd_gpu.md
	platforms/cpu_server.md
	platforms/tpu.md
	platforms/nvidia_jetson.md
	platforms/ascend_npu_support.rst
	platforms/xpu.md

	.. toctree::
	:maxdepth: 1
	:caption: Developer Guide

	developer_guide/contribution_guide.md
	developer_guide/development_guide_using_docker.md
	developer_guide/development_jit_kernel_guide.md
	developer_guide/benchmark_and_profiling.md
	developer_guide/bench_serving.md
	developer_guide/evaluating_new_models.md

	.. toctree::
	:maxdepth: 1
	:caption: References

	references/faq.md
	references/environment_variables.md
	references/production_metrics.md
	references/production_request_trace.md
	references/multi_node_deployment/multi_node_index.rst
	references/custom_chat_template.md
	references/frontend/frontend_index.rst
	references/post_training_integration.md
	references/release_lookup
	references/learn_more.md

	.. toctree::
	:maxdepth: 1
	:caption: Security Acknowledgement

	security/acknowledgements.md