File size: 10,581 Bytes
66c9c8a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
Runtime Settings
================

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity
of Warp programs. In cases in which a setting can be changed at multiple levels (e.g ``enable_backward``),
the setting at the more-specific scope takes precedence.

Global Settings
---------------

To change a setting, prepend ``wp.config.`` to the name of the variable and assign a value to it.
Some settings may be changed on the fly, while others need to be set prior to calling ``wp.init()`` to take effect.

For example, the location of the user kernel cache can be changed with:

.. code-block:: python

    import os

    import warp as wp

    example_dir = os.path.dirname(os.path.realpath(__file__))

    # set default cache directory before wp.init()
    wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

    wp.init()


Basic Global Settings
^^^^^^^^^^^^^^^^^^^^^

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field              | Type    |Default Value| Description                                                              |
+====================+=========+=============+==========================================================================+
|``verify_fp``       | Boolean | ``False``   | If ``True``, Warp will check that inputs and outputs are finite before   |
|                    |         |             | and/or after various operations. **Has performance implications.**       |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``verify_cuda``     | Boolean | ``False``   | If ``True``, Warp will check for CUDA errors after every launch and      |
|                    |         |             | memory operation. CUDA error verification cannot be used during graph    |
|                    |         |             | capture. **Has performance implications.**                               |              
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``print_launches``  | Boolean | ``False``   | If ``True``, Warp will print details of every kernel launch to standard  |
|                    |         |             | out (e.g. launch dimensions, inputs, outputs, device, etc.).             |
|                    |         |             | **Has performance implications.**                                        |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``mode``            | String  |``"release"``| Controls whether to compile Warp kernels in debug or release mode.       |
|                    |         |             | Valid choices are ``"release"`` or ``"debug"``.                          |
|                    |         |             | **Has performance implications.**                                        |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``verbose``         | Boolean | ``False``   | If ``True``, additional information will be printed to standard out      |
|                    |         |             | during code generation, compilation, etc.                                |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``quiet``           | Boolean | ``False``   | If ``True``, Warp module initialization messages will be disabled.       |
|                    |         |             | This setting does not affect error messages and warnings.                |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``kernel_cache_dir``| String  | ``None``    | The path to the directory used for the user kernel cache. Subdirectories |
|                    |         |             | named ``gen`` and ``bin`` will be created in this directory. If ``None``,|
|                    |         |             | a directory will be automatically determined using                       |
|                    |         |             | `appdirs.user_cache_directory <https://github.com/ActiveState/appdirs>`_ |
|                    |         |             |                                                                          |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward`` | Boolean | ``True``    | If ``True``, backward passes of kernels will be compiled by default.     |
|                    |         |             | Disabling this setting can reduce kernel compilation times.              |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Advanced Global Settings
^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field              | Type    |Default Value| Description                                                              |
+====================+=========+=============+==========================================================================+
|``cache_kernels``   | Boolean | ``True``    | If ``True``, kernels that have already been compiled from previous       |
|                    |         |             | application launches will not be recompiled.                             |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output``     | String  | ``None``    | The preferred CUDA output format for kernels. Valid choices are ``None``,|
|                    |         |             | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined     |
|                    |         |             | automatically.                                                           |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``ptx_target_arch`` | Integer | 70          | The target architecture for PTX generation.                              |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``llvm_cuda``       | Boolean | ``False``   | If ``True``, Clang/LLVM will be used to compile CUDA code instead of     |
|                    |         |             | NVTRC.                                                                   |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Module Settings
---------------

Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of
option pairs to ``wp.set_module_options()``.

For example, compilation of backward passes for the kernel in an entire module can be disabled with:

.. code:: python

    wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using ``wp.get_module_options()``.

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field              | Type    |Default Value| Description                                                              |
+====================+=========+=============+==========================================================================+
|``mode``            | String  | Global      | Controls whether to compile the module's kernels in debug or release     |
|                    |         | setting     | mode by default. Valid choices are ``"release"`` or ``"debug"``.         |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``max_unroll``      | Integer | 16          | The maximum fixed-size loop to unroll. Note that ``max_unroll`` does not |
|                    |         |             | consider the total number of iterations in nested loops. This can result |
|                    |         |             | in a large amount of automatically generated code if each nested loop is |
|                    |         |             | below the ``max_unroll`` threshold.                                      |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward`` | Boolean | Global      | If ``True``, backward passes of kernels will be compiled by default.     |
|                    |         | setting     | Valid choices are ``"release"`` or ``"debug"``.                          |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``fast_math``       | Boolean | ``False``   | If ``True``, CUDA kernels will be compiled with the ``--use_fast_math``  |
|                    |         |             | compiler option, which enables some fast math operations that are faster |
|                    |         |             | but less accurate.                                                       |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output``     | String  | ``None``    | The preferred CUDA output format for kernels. Valid choices are ``None``,|
|                    |         |             | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined     |
|                    |         |             | automatically. The module-level setting takes precedence over the global |
|                    |         |             | setting.                                                                 |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Kernel Settings
---------------

``enable_backward`` is currently the only setting that can also be configured on a per-kernel level.
Backward-pass compilation can be disabled by passing an argument into the ``@wp.kernel`` decorator
as in the following example:

.. code-block:: python

    @wp.kernel(enable_backward=False)
    def scale_2(
        x: wp.array(dtype=float),
        y: wp.array(dtype=float),
    ):
        y[0] = x[0] ** 2.0