	.. _lazy eval:

	Lazy Evaluation
	===============

	.. currentmodule:: mlx.core

	Why Lazy Evaluation
	-------------------

	When you perform operations in MLX, no computation actually happens. Instead, a
	compute graph is recorded. The actual computation only happens when an
	evaluation is performed, for example by calling :func:`eval`.
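
	As a rough mental model, the toy Python sketch below records operations as
	graph nodes and computes nothing until it is explicitly evaluated. The
	``Node``, ``constant``, and ``evaluate`` names are hypothetical, and this is
	not how MLX is implemented internally; in MLX the explicit step is
	:func:`eval`.

	.. code-block:: python

	    # Toy model of lazy evaluation (hypothetical, not MLX's implementation).
	    class Node:
	        def __init__(self, fn=None, inputs=(), value=None):
	            self.fn = fn          # how to compute this node from its inputs
	            self.inputs = inputs  # parent nodes in the compute graph
	            self.value = value    # set only once the node has been computed

	        def __add__(self, other):
	            return Node(lambda a, b: a + b, (self, other))

	        def __mul__(self, other):
	            return Node(lambda a, b: a * b, (self, other))


	    def constant(v):
	        # Leaf node that already holds its value.
	        return Node(value=v)


	    def evaluate(node, log):
	        # Compute every node the output depends on, skipping finished ones.
	        if node.value is None:
	            args = [evaluate(parent, log) for parent in node.inputs]
	            node.value = node.fn(*args)
	            log.append(node.value)
	        return node.value


	    log = []
	    x = constant(2.0)
	    y = x * x + constant(1.0)  # only records the graph
	    print(log)                 # [] (nothing computed yet)
	    print(evaluate(y, log))    # 5.0
	    print(log)                 # [4.0, 5.0]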

	MLX uses lazy evaluation because it enables several useful features, some of
	which are described below.

	Transforming Compute Graphs
	^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Lazy evaluation lets us record a compute graph without actually doing any
	computations. This is useful for function transformations like :func:`grad` and
	:func:`vmap`, as well as for graph optimizations.
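
	To illustrate why a recorded graph helps with transformations, here is a toy
	reverse-mode gradient in plain Python: the function is traced into a graph
	first, and the transformation then walks that graph from the output back to
	the input. ``Var``, ``evaluate``, ``topological_order``, and ``grad`` are
	hypothetical names for this sketch only; MLX's :func:`grad` works on arrays
	and supports far more operations.

	.. code-block:: python

	    # Toy reverse-mode gradient over a recorded graph (hypothetical sketch,
	    # not MLX's implementation). Supports only + and *.
	    class Var:
	        def __init__(self, op, inputs=(), value=None):
	            self.op = op          # "const", "add", or "mul"
	            self.inputs = inputs  # parent nodes
	            self.value = value    # known for constants, filled in on evaluation

	        def __add__(self, other):
	            return Var("add", (self, other))

	        def __mul__(self, other):
	            return Var("mul", (self, other))


	    def evaluate(v):
	        # Forward pass over the recorded graph.
	        if v.value is None:
	            a, b = (evaluate(p) for p in v.inputs)
	            v.value = a + b if v.op == "add" else a * b
	        return v.value


	    def topological_order(out):
	        # Post-order DFS: every node appears after all of its inputs.
	        order, seen = [], set()

	        def visit(v):
	            if id(v) in seen:
	                return
	            seen.add(id(v))
	            for p in v.inputs:
	                visit(p)
	            order.append(v)

	        visit(out)
	        return order


	    def grad(fn):
	        def grad_fn(x_value):
	            x = Var("const", value=x_value)
	            out = fn(x)    # tracing records the graph; nothing is computed
	            evaluate(out)  # forward pass fills in node values
	            grads = {id(out): 1.0}
	            # Walk the recorded graph from the output back to the inputs.
	            for v in reversed(topological_order(out)):
	                g = grads.get(id(v), 0.0)
	                if v.op == "add":
	                    for p in v.inputs:
	                        grads[id(p)] = grads.get(id(p), 0.0) + g
	                elif v.op == "mul":
	                    a, b = v.inputs
	                    grads[id(a)] = grads.get(id(a), 0.0) + g * b.value
	                    grads[id(b)] = grads.get(id(b), 0.0) + g * a.value
	            return grads.get(id(x), 0.0)

	        return grad_fn


	    def f(x):
	        return x * x + x * Var("const", value=3.0)  # f(x) = x^2 + 3x


	    print(grad(f)(2.0))  # 7.0, since f'(x) = 2x + 3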

	Currently, MLX does not compile and rerun compute graphs. They are all
	generated dynamically. However, lazy evaluation makes it much easier to
	integrate compilation for future performance enhancements.

	Only Compute What You Use
	^^^^^^^^^^^^^^^^^^^^^^^^^

	In MLX, you do not need to worry as much about computing outputs that are never
	used. For example:

	.. code-block:: python

	    def fun(x):
	        a = fun1(x)
	        b = expensive_fun(a)  # recorded in the graph, never evaluated
	        return a, b

	    y, _ = fun(x)

	Here, the output of ``expensive_fun`` is never actually computed. Use this
	pattern with care, though: the graph of ``expensive_fun`` is still built, and
	building it has some cost.
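
	The effect can be imitated in plain Python with thunks that run only when
	their value is requested. In this hypothetical sketch, ``Lazy`` is a toy
	class (not an MLX API), and ``fun1`` and ``expensive_fun`` are stand-ins for
	the placeholder functions above; the call counter shows that the work inside
	``expensive_fun`` never runs.

	.. code-block:: python

	    # Toy thunk-based model (hypothetical, not MLX's implementation).
	    calls = {"fun1": 0, "expensive_fun": 0}


	    class Lazy:
	        def __init__(self, compute):
	            self._compute = compute
	            self._done = False
	            self._result = None

	        def value(self):
	            # Run the recorded computation on first use only.
	            if not self._done:
	                self._result = self._compute()
	                self._done = True
	            return self._result


	    def fun1(x):  # hypothetical stand-in
	        def compute():
	            calls["fun1"] += 1
	            return x + 1
	        return Lazy(compute)


	    def expensive_fun(a):  # hypothetical stand-in
	        def compute():
	            calls["expensive_fun"] += 1
	            return sum(a.value() * i for i in range(10**7))
	        return Lazy(compute)


	    def fun(x):
	        a = fun1(x)
	        b = expensive_fun(a)  # node is built, but its work never runs
	        return a, b


	    y, _ = fun(1)
	    print(y.value())  # 2
	    print(calls)      # {'fun1': 1, 'expensive_fun': 0}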

	Similarly, lazy evaluation can save memory while keeping code simple. Say you
	have a very large model ``Model`` derived from :obj:`mlx.nn.Module`. You can
	instantiate this model with ``model = Model()``. Typically, this will
	initialize all of the weights as ``float32``, but the initialization does not
	actually compute anything until you perform an :func:`eval`. If you then
	update the model with ``float16`` weights before evaluating, the ``float32``
	weights are never materialized, so your peak memory use will be half of what
	eager computation would require.

	Thanks to lazy computation, this pattern is simple in MLX:

	.. code-block:: python

	    model = Model()  # no memory used yet
	    model.load_weights("weights_fp16.safetensors")
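
	As a back-of-the-envelope check of the memory claim, the arithmetic below
	compares peak weight memory for a hypothetical 7-billion-parameter model. It
	assumes weights dominate memory use and ignores activations and framework
	overhead; ``float32`` takes 4 bytes per parameter and ``float16`` takes 2.

	.. code-block:: python

	    # Peak memory for weights only, for a hypothetical 7B-parameter model.
	    num_params = 7_000_000_000
	    bytes_fp32, bytes_fp16 = 4, 2

	    # Eager: the float32 initial weights are materialized at construction,
	    # so the peak is at least the size of that float32 copy.
	    eager_peak = num_params * bytes_fp32

	    # Lazy: the float32 initializers are replaced by the loaded float16
	    # weights before any evaluation, so only float16 data is materialized.
	    lazy_peak = num_params * bytes_fp16

	    print(eager_peak / 1e9, lazy_peak / 1e9)  # 28.0 14.0 (GB)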

	When to Evaluate
	----------------

	A common question is when to use :func:`eval`. The trade-off is between
	letting graphs get too large and not batching enough useful work.

	For example:

	.. code-block:: python

	    import mlx.core as mx

	    a = mx.zeros(4)
	    b = mx.ones(4)

	    for _ in range(100):
	        a = a + b
	        mx.eval(a)  # evaluates a tiny graph every time
	        b = b * 2
	        mx.eval(b)

	This is a bad idea because there is some fixed overhead with each graph
	evaluation, and here each evaluation does very little work. On the other hand,
	there is also some slight overhead that grows with the compute graph size, so
	extremely large graphs (while computationally correct) can be costly.

	Luckily, a wide range of compute graph sizes work pretty well with MLX:
	anything from a few tens of operations to many thousands of operations per
	evaluation should be okay.
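
	A toy counter makes the trade-off concrete. In the hypothetical sketch below
	(plain Python, not MLX), each call to ``evaluate`` stands for one graph
	evaluation with its fixed overhead. Evaluating after every operation, as in
	the loop shown earlier, pays that overhead 200 times, while evaluating once at
	the end pays it once for a graph of roughly 200 operations.

	.. code-block:: python

	    # Toy lazy scalar (hypothetical, not MLX's implementation).
	    class Lazy:
	        def __init__(self, compute=None, value=None):
	            self._compute = compute
	            self._cached = value

	        def value(self):
	            # Compute on first use, then reuse the cached result.
	            if self._cached is None:
	                self._cached = self._compute()
	            return self._cached

	        def __add__(self, other):
	            return Lazy(lambda: self.value() + other.value())

	        def __mul__(self, k):
	            return Lazy(lambda: self.value() * k)


	    stats = {"evals": 0}


	    def evaluate(*nodes):
	        # One call models one graph evaluation (one fixed overhead).
	        stats["evals"] += 1
	        for node in nodes:
	            node.value()


	    def eval_every_step(a, b):
	        for _ in range(100):
	            a = a + b
	            evaluate(a)
	            b = b * 2
	            evaluate(b)
	        return a


	    def eval_once(a, b):
	        for _ in range(100):
	            a = a + b
	            b = b * 2
	        evaluate(a)  # a single evaluation of the whole graph
	        return a


	    stats["evals"] = 0
	    r1 = eval_every_step(Lazy(value=0.0), Lazy(value=1.0))
	    evals_every_step = stats["evals"]  # 200

	    stats["evals"] = 0
	    r2 = eval_once(Lazy(value=0.0), Lazy(value=1.0))
	    evals_once = stats["evals"]        # 1

	    print(evals_every_step, evals_once, r1.value() == r2.value())  # 200 1 True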

	Most numerical computations have an iterative outer loop (e.g. the iteration in
	stochastic gradient descent). A natural and usually efficient place to use
	:func:`eval` is at each iteration of this outer loop.

	Here is a concrete example:

	.. code-block:: python

	    for batch in dataset:

	        # Nothing has been evaluated yet
	        loss, grad = value_and_grad_fn(model, batch)

	        # Still nothing has been evaluated
	        optimizer.update(model, grad)

	        # Evaluate the loss and the new parameters which will
	        # run the full gradient computation and optimizer update
	        mx.eval(loss, model.parameters())
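
	The same pattern can be mimicked with a toy lazy scalar in plain Python. In
	this hypothetical sketch, ``value_and_grad`` and ``sgd_update`` are stand-ins
	(not MLX APIs) for ``value_and_grad_fn`` and ``optimizer.update``, minimizing
	``(w - 3) ** 2``. Because the new parameter is evaluated at the end of every
	iteration, each iteration's graph stays small instead of growing across the
	whole loop.

	.. code-block:: python

	    # Toy lazy SGD on (w - 3) ** 2 (hypothetical, not MLX's implementation).
	    class Lazy:
	        def __init__(self, compute):
	            self._compute = compute
	            self._done = False
	            self._result = None

	        def value(self):
	            if not self._done:
	                self._result = self._compute()
	                self._done = True
	            return self._result


	    def const(v):
	        return Lazy(lambda: v)


	    def value_and_grad(w):
	        # Records the loss and its gradient; nothing is computed yet.
	        loss = Lazy(lambda: (w.value() - 3.0) ** 2)
	        grad = Lazy(lambda: 2.0 * (w.value() - 3.0))
	        return loss, grad


	    def sgd_update(w, grad, lr=0.1):
	        # Records the updated parameter; still nothing is computed.
	        return Lazy(lambda: w.value() - lr * grad.value())


	    def evaluate(*nodes):
	        for node in nodes:
	            node.value()


	    w = const(0.0)
	    losses = []
	    for _ in range(50):
	        loss, grad = value_and_grad(w)  # nothing evaluated yet
	        w = sgd_update(w, grad)         # still nothing evaluated
	        evaluate(loss, w)               # one evaluation per iteration
	        losses.append(loss.value())     # already computed: just a read

	    print(losses[0], w.value())  # 9.0, then a value close to 3.0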


	An important behavior to be aware of is when the graph will be implicitly
	evaluated. Any time you ``print`` an array, convert it to a
	:obj:`numpy.ndarray`, or otherwise access its memory via :obj:`memoryview`,
	the graph will be evaluated. Saving arrays via :func:`save` (or any other MLX
	saving function) will also evaluate them.
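
	A toy Python class can show the mechanism: printing or converting needs the
	actual values, so those operations force the pending computation.
	``LazyArray`` and its ``evaluated`` flag are hypothetical (plain Python, not
	MLX's implementation).

	.. code-block:: python

	    # Toy lazy array (hypothetical, not MLX's implementation).
	    class LazyArray:
	        def __init__(self, compute):
	            self._compute = compute
	            self._data = None

	        @property
	        def evaluated(self):
	            return self._data is not None

	        def _force(self):
	            # Run the pending computation once and keep the result.
	            if self._data is None:
	                self._data = self._compute()
	            return self._data

	        def __add__(self, other):
	            # Records the addition without computing it.
	            return LazyArray(
	                lambda: [p + q for p, q in zip(self._force(), other._force())]
	            )

	        def __repr__(self):  # print() needs the values
	            return f"array({self._force()})"

	        def tolist(self):    # conversion needs the values
	            return list(self._force())


	    a = LazyArray(lambda: [1, 2, 3])
	    b = a + a
	    print(b.evaluated)  # False
	    print(b)            # array([2, 4, 6])
	    print(b.evaluated)  # True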


	Calling :func:`array.item` on a scalar array will also evaluate it. In the
	training loop example above, printing the loss (``print(loss)``) or adding the
	loss scalar to a list (``losses.append(loss.item())``) would cause a graph
	evaluation. If these lines come before ``mx.eval(loss, model.parameters())``,
	this will be a partial evaluation that computes only the forward pass.
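
	The partial evaluation can be illustrated with a toy graph in plain Python
	where each node records its name when it is actually computed. The ``Node``
	class and the forward, loss, and backward nodes are hypothetical stand-ins for
	the training loop, not MLX internals.

	.. code-block:: python

	    # Toy graph (hypothetical, not MLX's implementation): each node logs
	    # its name when it is actually computed.
	    computed = []


	    class Node:
	        def __init__(self, name, compute):
	            self.name = name
	            self._compute = compute
	            self._done = False
	            self._result = None

	        def item(self):
	            if not self._done:
	                self._result = self._compute()
	                self._done = True
	                computed.append(self.name)
	            return self._result


	    w, x, target = 2.0, 5.0, 4.0
	    pred = Node("forward", lambda: w * x)
	    loss = Node("loss", lambda: (pred.item() - target) ** 2)
	    grad = Node("backward", lambda: 2.0 * (pred.item() - target) * x)

	    losses = []
	    losses.append(loss.item())  # forces only what the loss depends on
	    print(computed)             # ['forward', 'loss']
	    grad.item()                 # a later full evaluation runs the rest
	    print(computed)             # ['forward', 'loss', 'backward']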

	Also, calling :func:`eval` on an array or set of arrays multiple times is
	perfectly fine: once an array has been evaluated, evaluating it again is
	effectively a no-op.
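
	In a toy cache-based model (hypothetical, plain Python, not MLX's
	implementation), a stored result makes repeated evaluations free, so the extra
	calls to ``evaluate`` below do no further work.

	.. code-block:: python

	    # Toy cache-based model (hypothetical, not MLX's implementation).
	    counter = {"runs": 0}


	    class Lazy:
	        def __init__(self, compute):
	            self._compute = compute
	            self._done = False
	            self._result = None

	        def value(self):
	            if not self._done:  # only the first request does real work
	                counter["runs"] += 1
	                self._result = self._compute()
	                self._done = True
	            return self._result


	    def evaluate(*nodes):
	        for node in nodes:
	            node.value()


	    x = Lazy(lambda: sum(range(10)))
	    evaluate(x)
	    evaluate(x)     # already computed, so effectively a no-op
	    evaluate(x, x)
	    print(counter["runs"], x.value())  # 1 45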

	.. warning::

	   Using scalar arrays for control flow will cause an evaluation.

	Here is an example:

	.. code-block:: python

	    def fun(x):
	        h, y = first_layer(x)
	        if y > 0:  # An evaluation is done here!
	            z = second_layer_a(h)
	        else:
	            z = second_layer_b(h)
	        return z

	Using arrays for control flow should be done with care. The above example works
	and can even be used with gradient transformations. However, this can be very
	inefficient if evaluations are done too frequently.
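
	Control flow forces evaluation because Python's ``if`` calls ``__bool__`` on
	the condition, and a lazy value can only answer by computing itself. The toy
	sketch below (plain Python, not MLX's implementation) logs each forced
	evaluation; ``first_layer``, ``second_layer_a``, and ``second_layer_b`` are
	hypothetical stand-ins for the layers in the warning's example. Calling
	``fun`` in a loop would force one evaluation per call.

	.. code-block:: python

	    # Toy model (hypothetical, not MLX's implementation): `if` calls
	    # __bool__, which has to compute the value to choose a branch.
	    forced = []


	    class Lazy:
	        def __init__(self, compute):
	            self._compute = compute
	            self._done = False
	            self._result = None

	        def value(self):
	            if not self._done:
	                self._result = self._compute()
	                self._done = True
	            return self._result

	        def __gt__(self, other):
	            return Lazy(lambda: self.value() > other)  # still lazy

	        def __bool__(self):
	            forced.append("control flow")  # evaluation forced here
	            return bool(self.value())


	    def first_layer(x):  # hypothetical stand-ins for the layers above
	        h = Lazy(lambda: x * 2)
	        y = Lazy(lambda: h.value() - 1)
	        return h, y


	    def second_layer_a(h):
	        return Lazy(lambda: h.value() + 100)


	    def second_layer_b(h):
	        return Lazy(lambda: h.value() - 100)


	    def fun(x):
	        h, y = first_layer(x)
	        if y > 0:  # __bool__ runs here
	            z = second_layer_a(h)
	        else:
	            z = second_layer_b(h)
	        return z


	    z = fun(3)
	    print(forced)     # ['control flow']
	    print(z.value())  # 106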