Improved TritonBench evaluation framework
Update
- Integrating correctness checking into the performance evaluation suite does not change the command in any way; you can continue to use the evaluation instructions below.
- With this integration, TBG performs correctness checking and evaluation inside the `evaluate` function called in `run.py`. However, since the ROCm interface has not changed, performance evaluation for ROCm is done conditionally in `run.py`.
Dependency installation
- Install the requirements with:

```shell
pip install -r requirements.txt
```
Installation
Please install the package by running the following command from the root folder:

```shell
pip install -e .
```
Please note that installation does not automatically install dependencies. You must install the dependencies before installing the package.
Running evaluation
You can run evaluations in the following two ways:
- Command line:

```shell
tb_eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds tbg   # for TritonBench-G-v1
tb_eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds rocm  # for ROCm
```
- From a Python script: the following is a bare-minimum example; for a detailed example please see `tb_eval/run.py`.

```python
from tb_eval.evaluators.interface import get_evaluators

evaluator = get_evaluators["tbg"]()   # for TritonBench-G eval
evaluator = get_evaluators["rocm"]()  # for ROCm eval

# run evaluations
call_status, exec_status, stdout, stderr = evaluator(
    generated_code, log_root=PATH_TO_LOG, file_name="kernel.py", atol=1e-5, rtol=1e-2
)
```
Issues with existing TritonBench evaluation framework
The `1_exec_acc.py` file in TritonBench did not accurately compare the outputs of two Triton files:
- Execution was done purely via subprocess calls for both the generated and ground-truth files.
- Seed consistency is violated.
- The outputs of the two Triton runs are compared via stdout string comparison, which is not always correct.
- Around 150 ground-truth files do not have a `print(result_gold)` line, so the eval framework is essentially comparing two empty strings.
- Some ground-truth files (e.g. `context_attn_bloom.py`) do not even have a `result_gold = test_*()` line at the end, so the call-accuracy run using `0_call_acc.py` blindly assumes the call was a success.
- 7 kernel files (originally provided) run into memory access faults; we have fixed them.
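To illustrate why stdout string comparison is unreliable, here is a minimal pure-Python sketch (the kernel functions and their values are hypothetical stand-ins for Triton test runs): two runs that compute different results but never print them compare as equal, because both produce an empty stdout.

```python
import io
from contextlib import redirect_stdout

def run_and_capture(fn):
    """Run a test function and capture whatever it prints to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        fn()
    return buf.getvalue()

# Two stand-in "kernels" that compute different results but print nothing,
# mimicking ground-truth files missing the print(result_gold) line.
def kernel_a():
    result = [1.0, 2.0, 3.0]   # never printed

def kernel_b():
    result = [9.0, 9.0, 9.0]   # never printed

out_a = run_and_capture(kernel_a)
out_b = run_and_capture(kernel_b)
# Stdout comparison wrongly declares the runs equal: both are empty strings.
print(out_a == out_b)  # True, even though the computed results differ
```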
We have fixed these issues as follows:
- Use `torch.allclose` to compare the two runs (ground truth and generated).
- Fix the ground-truth files to include `result_gold = test_*()`.
- Ensure a consistent seed across files.
- Integrate the correctness checks into the performance evaluation suite; the performance suite provides a large number of unit tests, and speedup must be computed by re-evaluating the ground-truth kernel anyway.
We have also integrated performance measurement into the framework. The kernel evaluation flow is as follows:
- Check if the kernel is callable: run the test function of the kernel.
- If the kernel is callable, check whether it matches the ground truth by comparing the outputs of the generated kernel on known tests.
- If the generated kernel is correct: run the performance evaluation.
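The three-stage flow above can be sketched as follows (all function names and return values here are hypothetical placeholders, not the actual tb_eval API):

```python
def evaluate_kernel(run_test, reference_outputs, measure_perf, atol=1e-5, rtol=1e-2):
    """Three-stage evaluation: callable -> correct -> performance."""
    # Stage 1: check the kernel is callable by running its test function.
    try:
        outputs = run_test()
    except Exception as err:
        return {"call": False, "exec": False, "error": str(err)}
    # Stage 2: compare against ground truth on known tests (allclose-style check).
    correct = all(
        abs(a - b) <= atol + rtol * abs(b)
        for a, b in zip(outputs, reference_outputs)
    )
    if not correct:
        return {"call": True, "exec": False}
    # Stage 3: only correct kernels get a performance measurement.
    return {"call": True, "exec": True, "perf": measure_perf()}

report = evaluate_kernel(
    run_test=lambda: [1.0, 2.0],
    reference_outputs=[1.0, 2.0],
    measure_perf=lambda: 0.5,  # e.g. a measured latency in ms (illustrative)
)
print(report)
```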
Help/support/contribute:
Please raise a GitHub issue or PR for any issues, help, or contributions!
You can contribute in the following ways:
- Add new kernels for evaluation:
  - Add the dataset of new kernels under `tb_eval/data`.
  - Add the path of this new dataset in `tb_eval.constants`.
  - Add an evaluator interface for this new dataset in `tb_eval.evaluators.interface`.
  - Add an evaluator to be run by the interface in `tb_eval.evaluators`. The evaluator is a script that runs only when invoked directly as a Python call and does nothing when imported as a module. The evaluator (e.g. `TB_correctness.py`) is run by its interface (e.g. `interface.TestAllCloseEvaluatorTBG`).
- You can add new metrics for an evaluator to work with in `tb_eval.metrics`.
- You can add new performance eval metrics for your (or an existing) dataset under `tb_eval.perf`.
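A minimal sketch of the "runs only as a script, not on import" pattern that evaluators follow (the file name and evaluation logic are illustrative, not the actual `TB_correctness.py`):

```python
# hypothetical_evaluator.py -- illustrative, not the real TB_correctness.py

def evaluate(generated_path, reference_path):
    """Placeholder evaluation logic invoked by the interface via a python call."""
    return generated_path == reference_path  # stand-in for a real comparison

def main():
    print(evaluate("kernel.py", "kernel.py"))

# The guard below ensures nothing executes when the interface merely imports
# this module; the evaluator only runs when invoked directly, e.g.
# `python hypothetical_evaluator.py`.
if __name__ == "__main__":
    main()
```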
Updates
- [2025-07-16] Added autotune-compatible ROCm kernels and naive softmax. Use the `-tp` argument with the path to this folder as below:

  ```shell
  tb_eval eval -f PATH_TO_EVAL_FOLDER -o RESULT_NAME -ds rocm -tp tb_eval/data/ROCm/data/ROCm_v1_autotune
  ```

  The `naive_softmax.py` kernel from the ROCm blog has been added to this repo.
- Use the `-c` argument to run evaluations directly on Python Triton code file(s)/folders instead of JSON-based parsing.
Credits:
We found the following repos helpful: