| Metal Debugger | |
| ============== | |
| .. currentmodule:: mlx.core | |
| Profiling is a key step for performance optimization. You can build MLX with | |
| the ``MLX_METAL_DEBUG`` option to improve the Metal debugging and | |
| optimization workflow. The ``MLX_METAL_DEBUG`` debug option: | |
| * Records source during Metal compilation, for later inspection while | |
| debugging. | |
| * Labels Metal objects such as command queues, improving capture readability. | |
| To build with debugging enabled in Python prepend | |
| ``CMAKE_ARGS="-DMLX_METAL_DEBUG=ON"`` to the build call. | |
| The :func:`metal.start_capture` function initiates a capture of all MLX GPU | |
| work. | |
| .. note:: | |
| To capture a GPU trace you must run the application with | |
| ``MTL_CAPTURE_ENABLED=1``. | |
| .. code-block:: python | |
| import mlx.core as mx | |
| a = mx.random.uniform(shape=(512, 512)) | |
| b = mx.random.uniform(shape=(512, 512)) | |
| mx.eval(a, b) | |
| trace_file = "mlx_trace.gputrace" | |
| # Make sure to run with MTL_CAPTURE_ENABLED=1 and | |
| # that the path trace_file does not already exist. | |
| mx.metal.start_capture(trace_file) | |
| for _ in range(10): | |
| mx.eval(mx.add(a, b)) | |
| mx.metal.stop_capture() | |
| You can open and replay the GPU trace in Xcode. The ``Dependencies`` view | |
| has a great overview of all operations. Checkout the `Metal debugger | |
| documentation`_ for more information. | |
| .. image:: ../_static/metal_debugger/capture.png | |
| :class: dark-light | |
| Xcode Workflow | |
| -------------- | |
| You can skip saving to a path by running within Xcode. First, generate an | |
| Xcode project using CMake. | |
| .. code-block:: | |
| mkdir build && cd build | |
| cmake .. -DMLX_METAL_DEBUG=ON -G Xcode | |
| open mlx.xcodeproj | |
| Select the ``metal_capture`` example schema and run. | |
| .. image:: ../_static/metal_debugger/schema.png | |
| :class: dark-light | |
| .. _`Metal debugger documentation`: https://developer.apple.com/documentation/xcode/metal-debugger | |