Spaces:
Sleeping
Sleeping
| FAQ | |
| === | |
| How does Warp relate to other Python projects for GPU programming, e.g.: Numba, Taichi, cuPy, PyTorch, etc.? | |
| ------------------------------------------------------------------------------------------------------------ | |
| Warp is inspired by many of these projects, and is closely related to | |
| Numba and Taichi which both expose kernel programming to Python. These | |
| frameworks map to traditional GPU programming models, so many of the | |
| high-level concepts are similar, however there are some functionality | |
| and implementation differences. | |
| Compared to Numba, Warp supports a smaller subset of Python, but | |
| offering auto-differentiation of kernel programs, which is useful for | |
| machine learning. Compared to Taichi Warp uses C++/CUDA as an | |
| intermediate representation, which makes it convenient to implement and | |
| expose low-level routines. In addition, we are building in | |
| data structures to support geometry processing (meshes, sparse volumes, | |
| point clouds, USD data) as first-class citizens that are not exposed in | |
| other runtimes. | |
| Warp does not offer a full tensor-based programming model like PyTorch | |
| and JAX, but is designed to work well with these frameworks through data | |
| sharing mechanisms like ``__cuda_array_interface__``. For computations | |
| that map well to tensors (e.g.: neural-network inference) it makes sense | |
| to use these existing tools. For problems with a lot of e.g.: sparsity, | |
| conditional logic, heterogenous workloads (like the ones we often find in | |
| simulation and graphics), then the kernel-based programming model like | |
| the one in Warp are often more convenient since users have control over | |
| individual threads. | |
| Does Warp support all of the Python language? | |
| --------------------------------------------- | |
| No, Warp supports a subset of Python that maps well to the GPU. Our goal | |
| is to not have any performance cliffs so that users can expect | |
| consistently good behavior from kernels that is close to native code. | |
| Examples of unsupported concepts that don’t map well to the GPU are | |
| dynamic types, list comprehensions, exceptions, garbage collection, etc. | |
| When should I call ``wp.synchronize()``? | |
| ---------------------------------------- | |
| One of the common sources of confusion for new users is when calls to | |
| ``wp.synchronize()`` are necessary. The answer is “almost never”! | |
| Synchronization is quite expensive, and should generally be avoided | |
| unless necessary. Warp naturally takes care of synchronization between | |
| operations (e.g.: kernel launches, device memory copies). | |
| For example, the following requires no manual synchronization, as the | |
| conversion to NumPy will automatically synchronize: | |
| .. code:: python | |
| # run some kernels | |
| wp.launch(kernel_1, dim, [array_x, array_y], device="cuda") | |
| wp.launch(kernel_2, dim, [array_y, array_z], device="cuda") | |
| # bring data back to host (and implicitly synchronize) | |
| x = array_z.numpy() | |
| The *only* case where manual synchronization is needed is when copies | |
| are being performed back to CPU asynchronously, e.g.: | |
| .. code:: python | |
| # copy data back to cpu from gpu, all copies will happen asynchronously to Python | |
| wp.copy(cpu_array_1, gpu_array_1) | |
| wp.copy(cpu_array_2, gpu_array_2) | |
| wp.copy(cpu_array_3, gpu_array_3) | |
| # ensure that the copies have finished | |
| wp.synchronize() | |
| # return a numpy wrapper around the cpu arrays, note there is no implicit synchronization here | |
| a1 = cpu_array_1.numpy() | |
| a2 = cpu_array_2.numpy() | |
| a3 = cpu_array_3.numpy() | |
| What happens when you differentiate a function like ``wp.abs(x)``? | |
| ------------------------------------------------------------------ | |
| Non-smooth functions such as ``y=|x|`` do not have a single unique | |
| gradient at ``x=0``, rather they have what is known as a | |
| ``subgradient``, which is formally the convex hull of directional | |
| derivatives at that point. The way that Warp (and most | |
| auto-differentiation frameworks) handles these points is to pick an | |
| arbitrary gradient from this set, e.g.: for ``wp.abs()``, it will | |
| arbitrarily choose the gradient to be 1.0 at the origin. You can find | |
| the implementation for these functions in ``warp/native/builtin.h``. | |
| Most optimizers (particularly ones that exploit stochasticity), are not | |
| sensitive to the choice of which gradient to use from the subgradient, | |
| although there are exceptions. | |
| Does Warp support multi-GPU programming? | |
| ---------------------------------------- | |
| Yes! Since version ``0.4.0`` we support allocating, launching, and | |
| copying between multiple GPUs in a single process. We follow the naming | |
| conventions of PyTorch and use aliases such as ``cuda:0``, ``cuda:1``, | |
| ``cpu`` to identify individual devices. | |
| Should I switch to Warp over IsaacGym / PhysX? | |
| ---------------------------------------------- | |
| Warp is not a replacement for IsaacGym, IsaacSim, or PhysX - while Warp | |
| does offer some physical simulation capabilities this is primarily aimed | |
| at developers who need differentiable physics, rather than a fully | |
| featured physics engine. Warp is also integrated with IsaacGym and is | |
| great for performing auxiliary tasks such as reward and observation | |
| computations for reinforcement learning. | |