Add files using upload-large-folder tool

61ba51e verified about 1 month ago

5.89 kB

	# Run Unit Tests

	SGLang uses the built-in library [unittest](https://docs.python.org/3/library/unittest.html) as the testing framework.

	## Test Backend Runtime
	```bash
	cd sglang/test/srt

	# Run a single file
	python3 test_srt_endpoint.py

	# Run a single test
	python3 test_srt_endpoint.py TestSRTEndpoint.test_simple_decode

	# Run a suite with multiple files
	python3 run_suite.py --suite per-commit
	```

	## Test Frontend Language
	```bash
	cd sglang/test/lang

	# Run a single file
	python3 test_choices.py
	```

	## Adding or Updating Tests in CI

	- Create new test files under `test/srt` or `test/lang` depending on the type of test.
	- For nightly tests, place them in `test/srt/nightly/`. Use the `NightlyBenchmarkRunner` helper class in `nightly_utils.py` for performance benchmarking tests.
	- Ensure they are referenced in the respective `run_suite.py` (e.g., `test/srt/run_suite.py`) so they are picked up in CI. For most small test cases, they can be added to the `per-commit-1-gpu` suite. Sort the test cases alphabetically by name.
	- Ensure you added `unittest.main()` for unittest and `sys.exit(pytest.main([__file__]))` for pytest in the scripts. The CI run them via `python3 test_file.py`.
	- The CI will run some suites such as `per-commit-1-gpu`, `per-commit-2-gpu`, and `nightly-1-gpu` automatically. If you need special setup or custom test groups, you may modify the workflows in [`.github/workflows/`](https://github.com/sgl-project/sglang/tree/main/.github/workflows).

	## CI Registry System

	Tests in `test/registered/` use a registry-based CI system for flexible backend/schedule configuration.

	### Registration Functions

	```python
	from sglang.test.ci.ci_register import (
	register_cuda_ci,
	register_amd_ci,
	register_cpu_ci,
	register_npu_ci,
	)

	# Per-commit test (small 1-gpu, runs on 5090)
	register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")

	# Per-commit test (large 1-gpu, runs on H100)
	register_cuda_ci(est_time=120, suite="stage-b-test-large-1-gpu")

	# Per-commit test (2-gpu)
	register_cuda_ci(est_time=200, suite="stage-b-test-large-2-gpu")

	# Nightly-only test
	register_cuda_ci(est_time=200, suite="nightly-1-gpu", nightly=True)

	# Multi-backend test
	register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")
	register_amd_ci(est_time=120, suite="stage-a-test-1")

	# Temporarily disabled test
	register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu", disabled="flaky - see #12345")
	```

	### Choosing Between 1-GPU Suites (5090 vs H100)

	When adding 1-GPU tests, choose the appropriate suite based on hardware compatibility:

	\| Suite \| Runner \| GPU \| When to Use \|
	\|-------\|--------\|-----\|-------------\|
	\| `stage-b-test-small-1-gpu` \| `1-gpu-5090` \| RTX 5090 (32GB, SM120) \| 5090-compatible tests (preferred) \|
	\| `stage-b-test-large-1-gpu` \| `1-gpu-runner` \| H100 (80GB, SM90) \| Large models or 5090-incompatible tests \|

	Use `stage-b-test-small-1-gpu` (5090) whenever possible - this is the preferred suite for most 1-GPU tests.

	Use `stage-b-test-large-1-gpu` (H100) if ANY of these apply:

	1. Architecture incompatibility (SM120/Blackwell):
	- FA3 attention backend (requires SM≤90)
	- MLA with FA3 backend
	- FP8/MXFP4 quantization (not supported on SM120)
	- Certain Triton kernels (shared memory limits)

	2. Memory requirements:
	- Models >30B params or large MoE
	- Tests requiring >32GB VRAM

	3. Known 5090 failures:
	- Weight update/sync tests
	- Certain spec decoding tests

	If a test cannot run on 5090 due to any of the above, use `stage-b-test-large-1-gpu` which runs on H100.

	### Available Suites

	Per-Commit (CUDA):
	- Stage A: `stage-a-test-1` (locked), `stage-a-test-2`, `stage-a-test-cpu`
	- Stage B: `stage-b-test-small-1-gpu` (5090), `stage-b-test-large-1-gpu` (H100), `stage-b-test-large-2-gpu`
	- Stage C (4-GPU): `stage-c-test-4-gpu-h100`, `stage-c-test-4-gpu-b200`, `stage-c-test-4-gpu-gb200`, `stage-c-test-deepep-4-gpu`
	- Stage C (8-GPU): `stage-c-test-8-gpu-h20`, `stage-c-test-8-gpu-h200`, `stage-c-test-8-gpu-b200`, `stage-c-test-deepep-8-gpu-h200`

	Per-Commit (AMD):
	- `stage-a-test-1`, `stage-b-test-small-1-gpu-amd`, `stage-b-test-large-2-gpu-amd`

	Nightly:
	- `nightly-1-gpu`, `nightly-2-gpu`, `nightly-4-gpu`, `nightly-8-gpu`, etc.

	### Running Tests with run_suite.py

	```bash
	# Run per-commit tests
	python test/run_suite.py --hw cuda --suite stage-b-test-small-1-gpu

	# Run nightly tests
	python test/run_suite.py --hw cuda --suite nightly-1-gpu --nightly

	# With auto-partitioning (for parallel CI jobs)
	python test/run_suite.py --hw cuda --suite stage-b-test-small-1-gpu \
	--auto-partition-id 0 --auto-partition-size 4
	```

	## Writing Elegant Test Cases

	- Learn from existing examples in [sglang/test/srt](https://github.com/sgl-project/sglang/tree/main/test/srt).
	- Reduce the test time by using smaller models and reusing the server for multiple test cases. Launching a server takes a lot of time.
	- Use as few GPUs as possible. Do not run long tests with 8-gpu runners.
	- If the test cases take too long, considering adding them to nightly tests instead of per-commit tests.
	- Keep each test function focused on a single scenario or piece of functionality.
	- Give tests descriptive names reflecting their purpose.
	- Use robust assertions (e.g., assert, unittest methods) to validate outcomes.
	- Clean up resources to avoid side effects and preserve test independence.
	- Reduce the test time by using smaller models and reusing the server for multiple test cases.


	## Adding New Models to Nightly CI
	- For text models: extend [global model lists variables](https://github.com/sgl-project/sglang/blob/85c1f7937781199203b38bb46325a2840f353a04/python/sglang/test/test_utils.py#L104) in `test_utils.py`, or add more model lists
	- For vlms: extend the `MODEL_THRESHOLDS` global dictionary in `test/srt/nightly/test_vlms_mmmu_eval.py`