Spaces:

qbhf2
/

GarmentCode

Sleeping

App Files Files Community

GarmentCode / NvidiaWarp-GarmentCode /CHANGELOG.md

qbhf2

added NvidiaWarp and GarmentCode repos

66c9c8a 11 months ago

preview code

raw

history blame contribute delete

40.3 kB

	# CHANGELOG

	## [1.0.0-beta.6] - 2024-01-10

	- Do not create CPU copy of grad array when calling `array.numpy()`
	- Fix `assert_np_equal()` bug
	- Support Linux AArch64 platforms, including Jetson/Tegra devices
	- Add parallel testing runner (invoke with `python -m warp.tests`, use `warp/tests/unittest_serial.py` for serial testing)
	- Fix support for function calls in `range()`
	- `matmul` adjoints now accumulate
	- Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
	- Fix multi-gpu synchronization issue in `sparse.py`
	- Add depth rendering to `OpenGLRenderer`, document `warp.render`
	- Make `atomic_min`, `atomic_max` differentiable
	- Fix error reporting using the exact source segment
	- Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
	- Address multiple differentiability issues
	- Fix backpropagation for returning array element references
	- Support passing the return value to adjoints
	- Add point basis space and explicit point-based quadrature for `warp.fem`
	- Support overriding the LLVM project source directory path using `build_lib.py --build_llvm --llvm_source_path=`
	- Fix the error message for accessing non-existing attributes
	- Flatten faces array for Mesh constructor in URDF parser

	## [1.0.0-beta.5] - 2023-11-22

	- Fix for kernel caching when function argument types change
	- Fix code-gen ordering of dependent structs
	- Fix for `wp.Mesh` build on MGPU systems
	- Fix for name clash bug with adjoint code: https://github.com/NVIDIA/warp/issues/154
	- Add `wp.frac()` for returning the fractional part of a floating point value
	- Add support for custom native CUDA snippets using `@wp.func_native` decorator
	- Add support for batched matmul with batch size > 2^16-1
	- Add support for transposed CUTLASS `wp.matmul()` and additional error checking
	- Add support for quad and hex meshes in `wp.fem`
	- Detect and warn when C++ runtime doesn't match compiler during build, e.g.: ``libstdc++.so.6: version `GLIBCXX_3.4.30' not found``
	- Documentation update for `wp.BVH`
	- Documentation and simplified API for runtime kernel specialization `wp.Kernel`

	## [1.0.0-beta.4] - 2023-11-01

	- Add `wp.cbrt()` for cube root calculation
	- Add `wp.mesh_furthest_point_no_sign()` to compute furthest point on a surface from a query point
	- Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
	- Add support for chained comparisons, i.e.: `0 < x < 2`
	- Add support for running `warp.fem` examples headless
	- Fix for unit test determinism
	- Fix for possible GC collection of array during graph capture
	- Fix for `wp.utils.array_sum()` output initialization when used with vector types
	- Coverage and documentation updates

	## [1.0.0-beta.3] - 2023-10-19

	- Add support for code coverage scans (test_coverage.py), coverage at 85% in omni.warp.core
	- Add support for named component access for vector types, e.g.: `a = v.x`
	- Add support for lvalue expressions, e.g.: `array[i] += b`
	- Add casting constructors for matrix and vector types
	- Add support for `type()` operator that can be used to return type inside kernels
	- Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
	- Fix for multi-process initialization warnings
	- Fix alignment issues with empty `wp.struct`
	- Fix for return statement warning with tuple-returning functions
	- Fix for `wp.batched_matmul()` registering the wrong function in the Tape
	- Fix and document for `wp.sim` forward + inverse kinematics
	- Fix for `wp.func` to return a default value if function does not return on all control paths
	- Refactor `wp.fem` support for new basis functions, decoupled function spaces
	- Optimizations for `wp.noise` functions, up to 10x faster in most cases
	- Optimizations for `type_size_in_bytes()` used in array construction'

	### Breaking Changes

	- To support grid-stride kernels, `wp.tid()` can no longer be called inside `wp.func` functions.

	## [1.0.0-beta.2] - 2023-09-01

	- Fix for passing bool into `wp.func` functions
	- Fix for deprecation warnings appearing on `stderr`, now redirected to `stdout`
	- Fix for using `for i in wp.hash_grid_query(..)` syntax

	## [1.0.0-beta.1] - 2023-08-29

	- Fix for `wp.float16` being passed as kernel arguments
	- Fix for compile errors with kernels using structs in backward pass
	- Fix for `wp.Mesh.refit()` not being CUDA graph capturable due to synchronous temp. allocs
	- Fix for dynamic texture example flickering / MGPU crashes demo in Kit by reusing `ui.DynamicImageProvider` instances
	- Fix for a regression that disabled bundle change tracking in samples
	- Fix for incorrect surface velocities when meshes are deforming in `OgnClothSimulate`
	- Fix for incorrect lower-case when setting USD stage "up_axis" in examples
	- Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
	- Fix for adding open edges when building cloth constraints from meshes in `wp.sim.ModelBuilder.add_cloth_mesh()`
	- Add support for `wp.fabricarray` to directly access Fabric data from Warp kernels, see https://omniverse.gitlab-master-pages.nvidia.com/usdrt/docs/usdrt_prim_selection.html for examples
	- Add support for user defined gradient functions, see `@wp.func_replay`, and `@wp.func_grad` decorators
	- Add support for more OG attribute types in `omni.warp.from_omni_graph()`
	- Add support for creating NanoVDB `wp.Volume` objects from dense NumPy arrays
	- Add support for `wp.volume_sample_grad_f()` which returns the value + gradient efficiently from an NVDB volume
	- Add support for LLVM fp16 intrinsics for half-precision arithmetic
	- Add implementation of stochastic gradient descent, see `wp.optim.SGD`
	- Add `warp.fem` framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/_build/html/modules/fem.html)
	- Optimizations for `omni.warp` extension load time (2.2s to 625ms cold start)
	- Make all `omni.ui` dependencies optional so that Warp unit tests can run headless
	- Deprecation of `wp.tid()` outside of kernel functions, users should pass `tid()` values to `wp.func` functions explicitly
	- Deprecation of `wp.sim.Model.flatten()` for returning all contained tensors from the model
	- Add support for clamping particle max velocity in `wp.sim.Model.particle_max_velocity`
	- Remove dependency on `urdfpy` package, improve MJCF parser handling of default values

	## [0.10.1] - 2023-07-25

	- Fix for large multidimensional kernel launches (> 2^32 threads)
	- Fix for module hashing with generics
	- Fix for unrolling loops with break or continue statements (will skip unrolling)
	- Fix for passing boolean arguments to build_lib.py (previously ignored)
	- Fix build warnings on Linux
	- Fix for creating array of structs from NumPy structured array
	- Fix for regression on kernel load times in Kit when using warp.sim
	- Update `warp.array.reshape()` to handle `-1` dimensions
	- Update margin used by for mesh queries when using `wp.sim.create_soft_body_contacts()`
	- Improvements to gradient handling with `warp.from_torch()`, `warp.to_torch()` plus documentation

	## [0.10.0] - 2023-07-05

	- Add support for macOS universal binaries (x86 + aarch64) for M1+ support
	- Add additional methods for SDF generation please see the following new methods:
	- `wp.mesh_query_point_nosign()` - closest point query with no sign determination
	- `wp.mesh_query_point_sign_normal()` - closest point query with sign from angle-weighted normal
	- `wp.mesh_query_point_sign_winding_number()` - closest point query with fast winding number sign determination
	- Add CSR/BSR sparse matrix support, see `warp.sparse` module:
	- `wp.sparse.BsrMatrix`
	- `wp.sparse.bsr_zeros()`, `wp.sparse.bsr_set_from_triplets()` for construction
	- `wp.sparse.bsr_mm()`, `wp.sparse_bsr_mv()` for matrix-matrix and matrix-vector products respectively
	- Add array-wide utilities:
	- `wp.utils.array_scan()` - prefix sum (inclusive or exclusive)
	- `wp.utils.array_sum()` - sum across array
	- `wp.utils.radix_sort_pairs()` - in-place radix sort (key,value) pairs
	- Add support for calling `@wp.func` functions from Python (outside of kernel scope)
	- Add support for recording kernel launches using a `wp.Launch` object that can be replayed with low overhead, use `wp.launch(..., record_cmd=True)` to generate a command object
	- Optimizations for `wp.struct` kernel arguments, up to 20x faster launches for kernels with large structs or number of params
	- Refresh USD samples to use bundle based workflow + change tracking
	- Add Python API for manipulating mesh and point bundle data in OmniGraph, see `omni.warp.nodes` module
	- See `omni.warp.nodes.mesh_create_bundle()`, `omni.warp.nodes.mesh_get_points()`, etc.
	- Improvements to `wp.array`:
	- Fix a number of array methods misbehaving with empty arrays
	- Fix a number of bugs and memory leaks related to gradient arrays
	- Fix array construction when creating arrays in pinned memory from a data source in pageable memory
	- `wp.empty()` no longer zeroes-out memory and returns an uninitialized array, as intended
	- `array.zero_()` and `array.fill_()` work with non-contiguous arrays
	- Support wrapping non-contiguous NumPy arrays without a copy
	- Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
	- Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
	- `array.fill_()` can now take lists or other sequences when filling arrays of vectors or matrices, e.g. `arr.fill_([[1, 2], [3, 4]])`
	- `array.fill_()` now works with arrays of structs (pass a struct instance)
	- `wp.copy()` gracefully handles copying between non-contiguous arrays on different devices
	- Add `wp.full()` and `wp.full_like()`, e.g., `a = wp.full(shape, value)`
	- Add optional `device` argument to `wp.empty_like()`, `wp.zeros_like()`, `wp.full_like()`, and `wp.clone()`
	- Add `indexedarray` methods `.zero_()`, `.fill_()`, and `.assign()`
	- Fix `indexedarray` methods `.numpy()` and `.list()`
	- Fix `array.list()` to work with arrays of any Warp data type
	- Fix `array.list()` synchronization issue with CUDA arrays
	- `array.numpy()` called on an array of structs returns a structured NumPy array with named fields
	- Improve the performance of creating arrays
	- Fix for `Error: No module named 'omni.warp.core'` when running some Kit configurations (e.g.: stubgen)
	- Fix for `wp.struct` instance address being included in module content hash
	- Fix codegen with overridden function names
	- Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
	- Fix for `wp.BVH.refit()` when executed on the CPU
	- Fix adjoint of `wp.struct` constructor
	- Fix element accessors for `wp.float16` vectors and matrices in Python
	- Fix `wp.float16` members in structs
	- Remove deprecated `wp.ScopedCudaGuard()`, please use `wp.ScopedDevice()` instead

	## [0.9.0] - 2023-06-01

	- Add support for in-place modifications to vector, matrix, and struct types inside kernels (will warn during backward pass with `wp.verbose` if using gradients)
	- Add support for step-through VSCode debugging of kernel code with standalone LLVM compiler, see `wp.breakpoint()`, and `walkthrough_debug.py`
	- Add support for default values on built-in functions
	- Add support for multi-valued `@wp.func` functions
	- Add support for `pass`, `continue`, and `break` statements
	- Add missing `__sincos_stret` symbol for macOS
	- Add support for gradient propagation through `wp.Mesh.points`, and other cases where arrays are passed to native functions
	- Add support for Python `@` operator as an alias for `wp.matmul()`
	- Add XPBD support for particle-particle collision
	- Add support for individual particle radii: `ModelBuilder.add_particle` has a new `radius` argument, `Model.particle_radius` is now a Warp array
	- Add per-particle flags as a `Model.particle_flags` Warp array, introduce `PARTICLE_FLAG_ACTIVE` to define whether a particle is being simulated and participates in contact dynamics
	- Add support for Python bitwise operators `&`, `\|`, `~`, `<<`, `>>`
	- Switch to using standalone LLVM compiler by default for `cpu` devices
	- Split `omni.warp` into `omni.warp.core` for Omniverse applications that want to use the Warp Python module with minimal additional dependencies
	- Disable kernel gradient generation by default inside Omniverse for improved compile times
	- Fix for bounds checking on element access of vector/matrix types
	- Fix for stream initialization when a custom (non-primary) external CUDA context has been set on the calling thread
	- Fix for duplicate `@wp.struct` registration during hot reload
	- Fix for array `unot()` operator so kernel writers can use `if not array:` syntax
	- Fix for case where dynamic loops are nested within unrolled loops
	- Change `wp.hash_grid_point_id()` now returns -1 if the `wp.HashGrid` has not been reserved before
	- Deprecate `wp.Model.soft_contact_distance` which is now replaced by `wp.Model.particle_radius`
	- Deprecate single scalar particle radius (should be a per-particle array)

	## [0.8.2] - 2023-04-21

	- Add `ModelBuilder.soft_contact_max` to control the maximum number of soft contacts that can be registered. Use `Model.allocate_soft_contacts(new_count)` to change count on existing `Model` objects.
	- Add support for `bool` parameters
	- Add support for logical boolean operators with `int` types
	- Fix for `wp.quat()` default constructor
	- Fix conditional reassignments
	- Add sign determination using angle weighted normal version of `wp.mesh_query_point()` as `wp.mesh_query_sign_normal()`
	- Add sign determination using winding number of `wp.mesh_query_point()` as `wp.mesh_query_sign_winding_number()`
	- Add query point without sign determination `wp.mesh_query_no_sign()`

	## [0.8.1] - 2023-04-13

	- Fix for regression when passing flattened numeric lists as matrix arguments to kernels
	- Fix for regressions when passing `wp.struct` types with uninitialized (`None`) member attributes

	## [0.8.0] - 2023-04-05

	- Add `Texture Write` node for updating dynamic RTX textures from Warp kernels / nodes
	- Add multi-dimensional kernel support to Warp Kernel Node
	- Add `wp.load_module()` to pre-load specific modules (pass `recursive=True` to load recursively)
	- Add `wp.poisson()` for sampling Poisson distributions
	- Add support for UsdPhysics schema see `warp.sim.parse_usd()`
	- Add XPBD rigid body implementation plus diff. simulation examples
	- Add support for standalone CPU compilation (no host-compiler) with LLVM backed, enable with `--standalone` build option
	- Add support for per-timer color in `wp.ScopedTimer()`
	- Add support for row-based construction of matrix types outside of kernels
	- Add support for setting and getting row vectors for Python matrices, see `matrix.get_row()`, `matrix.set_row()`
	- Add support for instantiating `wp.struct` types within kernels
	- Add support for indexed arrays, `slice = array[indices]` will now generate a sparse slice of array data
	- Add support for generic kernel params, use `def compute(param: Any):`
	- Add support for `with wp.ScopedDevice("cuda") as device:` syntax (same for `wp.ScopedStream()`, `wp.Tape()`)
	- Add support for creating custom length vector/matrices inside kernels, see `wp.vector()`, and `wp.matrix()`
	- Add support for creating identity matrices in kernels with, e.g.: `I = wp.identity(n=3, dtype=float)`
	- Add support for unary plus operator (`wp.pos()`)
	- Add support for `wp.constant` variables to be used directly in Python without having to use `.val` member
	- Add support for nested `wp.struct` types
	- Add support for returning `wp.struct` from functions
	- Add `--quick` build for faster local dev. iteration (uses a reduced set of SASS arches)
	- Add optional `requires_grad` parameter to `wp.from_torch()` to override gradient allocation
	- Add type hints for generic vector / matrix types in Python stubs
	- Add support for custom user function recording in `wp.Tape()`
	- Add support for registering CUTLASS `wp.matmul()` with tape backward pass
	- Add support for grids with > 2^31 threads (each dimension may be up to INT_MAX in length)
	- Add CPU fallback for `wp.matmul()`
	- Optimizations for `wp.launch()`, up to 3x faster launches in common cases
	- Fix `wp.randf()` conversion to float to reduce bias for uniform sampling
	- Fix capture of `wp.func` and `wp.constant` types from inside Python closures
	- Fix for CUDA on WSL
	- Fix for matrices in structs
	- Fix for transpose indexing for some non-square matrices
	- Enable Python faulthandler by default
	- Update to VS2019

	### Breaking Changes

	- `wp.constant` variables can now be treated as their true type, accessing the underlying value through `constant.val` is no longer supported
	- `wp.sim.model.ground_plane` is now a `wp.array` to support gradient, users should call `builder.set_ground_plane()` to create the ground
	- `wp.sim` capsule, cones, and cylinders are now aligned with the default USD up-axis

	## [0.7.2] - 2023-02-15

	- Reduce test time for vec/math types
	- Clean-up CUDA disabled build pipeline
	- Remove extension.gen.toml to make Kit packages Python version independent
	- Handle additional cases for array indexing inside Python

	## [0.7.1] - 2023-02-14

	- Disabling some slow tests for Kit
	- Make unit tests run on first GPU only by default

	## [0.7.0] - 2023-02-13

	- Add support for arbitrary length / type vector and matrices e.g.: `wp.vec(length=7, dtype=wp.float16)`, see `wp.vec()`, and `wp.mat()`
	- Add support for `array.flatten()`, `array.reshape()`, and `array.view()` with NumPy semantics
	- Add support for slicing `wp.array` types in Python
	- Add `wp.from_ptr()` helper to construct arrays from an existing allocation
	- Add support for `break` statements in ranged-for and while loops (backward pass support currently not implemented)
	- Add built-in mathematic constants, see `wp.pi`, `wp.e`, `wp.log2e`, etc.
	- Add built-in conversion between degrees and radians, see `wp.degrees()`, `wp.radians()`
	- Add security pop-up for Kernel Node
	- Improve error handling for kernel return values

	## [0.6.3] - 2023-01-31

	- Add DLPack utilities, see `wp.from_dlpack()`, `wp.to_dlpack()`
	- Add Jax utilities, see `wp.from_jax()`, `wp.to_jax()`, `wp.device_from_jax()`, `wp.device_to_jax()`
	- Fix for Linux Kit extensions OM-80132, OM-80133

	## [0.6.2] - 2023-01-19

	- Updated `wp.from_torch()` to support more data types
	- Updated `wp.from_torch()` to automatically determine the target Warp data type if not specified
	- Updated `wp.from_torch()` to support non-contiguous tensors with arbitrary strides
	- Add CUTLASS integration for dense GEMMs, see `wp.matmul()` and `wp.matmul_batched()`
	- Add QR and Eigen decompositions for `mat33` types, see `wp.qr3()`, and `wp.eig3()`
	- Add default (zero) constructors for matrix types
	- Add a flag to suppress all output except errors and warnings (set `wp.config.quiet = True`)
	- Skip recompilation when Kernel Node attributes are edited
	- Allow optional attributes for Kernel Node
	- Allow disabling backward pass code-gen on a per-kernel basis, use `@wp.kernel(enable_backward=False)`
	- Replace Python `imp` package with `importlib`
	- Fix for quaternion slerp gradients (`wp.quat_slerp()`)

	## [0.6.1] - 2022-12-05

	- Fix for non-CUDA builds
	- Fix strides computation in array_t constructor, fixes a bug with accessing mesh indices through mesh.indices[]
	- Disable backward pass code generation for kernel node (4-6x faster compilation)
	- Switch to linbuild for universal Linux binaries (affects TeamCity builds only)

	## [0.6.0] - 2022-11-28

	- Add support for CUDA streams, see `wp.Stream`, `wp.get_stream()`, `wp.set_stream()`, `wp.synchronize_stream()`, `wp.ScopedStream`
	- Add support for CUDA events, see `wp.Event`, `wp.record_event()`, `wp.wait_event()`, `wp.wait_stream()`, `wp.Stream.record_event()`, `wp.Stream.wait_event()`, `wp.Stream.wait_stream()`
	- Add support for PyTorch stream interop, see `wp.stream_from_torch()`, `wp.stream_to_torch()`
	- Add support for allocating host arrays in pinned memory for asynchronous data transfers, use `wp.array(..., pinned=True)` (default is non-pinned)
	- Add support for direct conversions between all scalar types, e.g.: `x = wp.uint8(wp.float64(3.0))`
	- Add per-module option to enable fast math, use `wp.set_module_options({"fast_math": True})`, fast math is now disabled by default
	- Add support for generating CUBIN kernels instead of PTX on systems with older drivers
	- Add user preference options for CUDA kernel output ("ptx" or "cubin", e.g.: `wp.config.cuda_output = "ptx"` or per-module `wp.set_module_options({"cuda_output": "ptx"})`)
	- Add kernel node for OmniGraph
	- Add `wp.quat_slerp()`, `wp.quat_to_axis_angle()`, `wp.rotate_rodriquez()` and adjoints for all remaining quaternion operations
	- Add support for unrolling for-loops when range is a `wp.constant`
	- Add support for arithmetic operators on built-in vector / matrix types outside of `wp.kernel`
	- Add support for multiple solution variables in `wp.optim` Adam optimization
	- Add nested attribute support for `wp.struct` attributes
	- Add missing adjoint implementations for spatial math types, and document all functions with missing adjoints
	- Add support for retrieving NanoVDB tiles and voxel size, see `wp.Volume.get_tiles()`, and `wp.Volume.get_voxel_size()`
	- Add support for store operations on integer NanoVDB volumes, see `wp.volume_store_i()`
	- Expose `wp.Mesh` points, indices, as arrays inside kernels, see `wp.mesh_get()`
	- Optimizations for `wp.array` construction, 2-3x faster on average
	- Optimizations for URDF import
	- Fix various deployment issues by statically linking with all CUDA libs
	- Update warp.so/warp.dll to CUDA Toolkit 11.5

	## [0.5.1] - 2022-11-01

	- Fix for unit tests in Kit

	## [0.5.0] - 2022-10-31

	- Add smoothed particle hydrodynamics (SPH) example, see `example_sph.py`
	- Add support for accessing `array.shape` inside kernels, e.g.: `width = arr.shape[0]`
	- Add dependency tracking to hot-reload modules if dependencies were modified
	- Add lazy acquisition of CUDA kernel contexts (save ~300Mb of GPU memory in MGPU environments)
	- Add BVH object, see `wp.Bvh` and `bvh_query_ray()`, `bvh_query_aabb()` functions
	- Add component index operations for `spatial_vector`, `spatial_matrix` types
	- Add `wp.lerp()` and `wp.smoothstep()` builtins
	- Add `wp.optim` module with implementation of the Adam optimizer for float and vector types
	- Add support for transient Python modules (fix for Houdini integration)
	- Add `wp.length_sq()`, `wp.trace()` for vector / matrix types respectively
	- Add missing adjoints for `wp.quat_rpy()`, `wp.determinant()`
	- Add `wp.atomic_min()`, `wp.atomic_max()` operators
	- Add vectorized version of `warp.sim.model.add_cloth_mesh()`
	- Add NVDB volume allocation API, see `wp.Volume.allocate()`, and `wp.Volume.allocate_by_tiles()`
	- Add NVDB volume write methods, see `wp.volume_store_i()`, `wp.volume_store_f()`, `wp.volume_store_v()`
	- Add MGPU documentation
	- Add example showing how to compute Jacobian of multiple environments in parallel, see `example_jacobian_ik.py`
	- Add `wp.Tape.zero()` support for `wp.struct` types
	- Make SampleBrowser an optional dependency for Kit extension
	- Make `wp.Mesh` object accept both 1d and 2d arrays of face vertex indices
	- Fix for reloading of class member kernel / function definitions using `importlib.reload()`
	- Fix for hashing of `wp.constants()` not invalidating kernels
	- Fix for reload when multiple `.ptx` versions are present
	- Improved error reporting during code-gen

	## [0.4.3] - 2022-09-20

	- Update all samples to use GPU interop path by default
	- Fix for arrays > 2GB in length
	- Add support for per-vertex USD mesh colors with warp.render class

	## [0.4.2] - 2022-09-07

	- Register Warp samples to the sample browser in Kit
	- Add NDEBUG flag to release mode kernel builds
	- Fix for particle solver node when using a large number of particles
	- Fix for broken cameras in Warp sample scenes

	## [0.4.1] - 2022-08-30

	- Add geometry sampling methods, see `wp.sample_unit_cube()`, `wp.sample_unit_disk()`, etc
	- Add `wp.lower_bound()` for searching sorted arrays
	- Add an option for disabling code-gen of backward pass to improve compilation times, see `wp.set_module_options({"enable_backward": False})`, True by default
	- Fix for using Warp from Script Editor or when module does not have a `__file__` attribute
	- Fix for hot reload of modules containing `wp.func()` definitions
	- Fix for debug flags not being set correctly on CUDA when `wp.config.mode == "debug"`, this enables bounds checking on CUDA kernels in debug mode
	- Fix for code gen of functions that do not return a value

	## [0.4.0] - 2022-08-09

	- Fix for FP16 conversions on GPUs without hardware support
	- Fix for `runtime = None` errors when reloading the Warp module
	- Fix for PTX architecture version when running with older drivers, see `wp.config.ptx_target_arch`
	- Fix for USD imports from `__init__.py`, defer them to individual functions that need them
	- Fix for robustness issues with sign determination for `wp.mesh_query_point()`
	- Fix for `wp.HashGrid` memory leak when creating/destroying grids
	- Add CUDA version checks for toolkit and driver
	- Add support for cross-module `@wp.struct` references
	- Support running even if CUDA initialization failed, use `wp.is_cuda_available()` to check availability
	- Statically linking with the CUDA runtime library to avoid deployment issues

	### Breaking Changes

	- Removed `wp.runtime` reference from the top-level module, as it should be considered private

	## [0.3.2] - 2022-07-19

	- Remove Torch import from `__init__.py`, defer import to `wp.from_torch()`, `wp.to_torch()`

	## [0.3.1] - 2022-07-12

	- Fix for marching cubes reallocation after initialization
	- Add support for closest point between line segment tests, see `wp.closest_point_edge_edge()` builtin
	- Add support for per-triangle elasticity coefficients in simulation, see `wp.sim.ModelBuilder.add_cloth_mesh()`
	- Add support for specifying default device, see `wp.set_device()`, `wp.get_device()`, `wp.ScopedDevice`
	- Add support for multiple GPUs (e.g., `"cuda:0"`, `"cuda:1"`), see `wp.get_cuda_devices()`, `wp.get_cuda_device_count()`, `wp.get_cuda_device()`
	- Add support for explicitly targeting the current CUDA context using device alias `"cuda"`
	- Add support for using arbitrary external CUDA contexts, see `wp.map_cuda_device()`, `wp.unmap_cuda_device()`
	- Add PyTorch device aliasing functions, see `wp.device_from_torch()`, `wp.device_to_torch()`

	### Breaking Changes

	- A CUDA device is used by default, if available (aligned with `wp.get_preferred_device()`)
	- `wp.ScopedCudaGuard` is deprecated, use `wp.ScopedDevice` instead
	- `wp.synchronize()` now synchronizes all devices; for finer-grained control, use `wp.synchronize_device()`
	- Device alias `"cuda"` now refers to the current CUDA context, rather than a specific device like `"cuda:0"` or `"cuda:1"`

	## [0.3.0] - 2022-07-08

	- Add support for FP16 storage type, see `wp.float16`
	- Add support for per-dimension byte strides, see `wp.array.strides`
	- Add support for passing Python classes as kernel arguments, see `@wp.struct` decorator
	- Add additional bounds checks for builtin matrix types
	- Add additional floating point checks, see `wp.config.verify_fp`
	- Add interleaved user source with generated code to aid debugging
	- Add generalized GPU marching cubes implementation, see `wp.MarchingCubes` class
	- Add additional scalar*matrix vector operators
	- Add support for retrieving a single row from builtin types, e.g.: `r = m33[i]`
	- Add `wp.log2()` and `wp.log10()` builtins
	- Add support for quickly instancing `wp.sim.ModelBuilder` objects to improve env. creation performance for RL
	- Remove custom CUB version and improve compatibility with CUDA 11.7
	- Fix to preserve external user-gradients when calling `wp.Tape.zero()`
	- Fix to only allocate gradient of a Torch tensor if `requires_grad=True`
	- Fix for missing `wp.mat22` constructor adjoint
	- Fix for ray-cast precision in edge case on GPU (watertightness issue)
	- Fix for kernel hot-reload when definition changes
	- Fix for NVCC warnings on Linux
	- Fix for generated function names when kernels are defined as class functions
	- Fix for reload of generated CPU kernel code on Linux
	- Fix for example scripts to output USD at 60 timecodes per-second (better Kit compatibility)

	## [0.2.3] - 2022-06-13

	- Fix for incorrect 4d array bounds checking
	- Fix for `wp.constant` changes not updating module hash
	- Fix for stale CUDA kernel cache when CPU kernels launched first
	- Array gradients are now allocated along with the arrays and accessible as `wp.array.grad`, users should take care to always call `wp.Tape.zero()` to clear gradients between different invocations of `wp.Tape.backward()`
	- Added `wp.array.fill_()` to set all entries to a scalar value (4-byte values only currently)

	### Breaking Changes

	- Tape `capture` option has been removed, users can now capture tapes inside existing CUDA graphs (e.g.: inside Torch)
	- Scalar loss arrays should now explicitly set `requires_grad=True` at creation time

	## [0.2.2] - 2022-05-30

	- Fix for `from import *` inside Warp initialization
	- Fix for body space velocity when using deforming Mesh objects with scale
	- Fix for noise gradient discontinuities affecting `wp.curlnoise()`
	- Fix for `wp.from_torch()` to correctly preserve shape
	- Fix for URDF parser incorrectly passing density to scale parameter
	- Optimizations for startup time from 3s -> 0.3s
	- Add support for custom kernel cache location, Warp will now store generated binaries in the user's application directory
	- Add support for cross-module function references, e.g.: call another modules @wp.func functions
	- Add support for overloading `@wp.func` functions based on argument type
	- Add support for calling built-in functions directly from Python interpreter outside kernels (experimental)
	- Add support for auto-complete and docstring lookup for builtins in IDEs like VSCode, PyCharm, etc
	- Add support for doing partial array copies, see `wp.copy()` for details
	- Add support for accessing mesh data directly in kernels, see `wp.mesh_get_point()`, `wp.mesh_get_index()`, `wp.mesh_eval_face_normal()`
	- Change to only compile for targets where kernel is launched (e.g.: will not compile CPU unless explicitly requested)

	### Breaking Changes

	- Builtin methods such as `wp.quat_identity()` now call the Warp native implementation directly and will return a `wp.quat` object instead of NumPy array
	- NumPy implementations of many builtin methods have been moved to `warp.utils` and will be deprecated
	- Local `@wp.func` functions should not be namespaced when called, e.g.: previously `wp.myfunc()` would work even if `myfunc()` was not a builtin
	- Removed `wp.rpy2quat()`, please use `wp.quat_rpy()` instead

	## [0.2.1] - 2022-05-11

	- Fix for unit tests in Kit

	## [0.2.0] - 2022-05-02

	### Warp Core

	- Fix for unrolling loops with negative bounds
	- Fix for unresolved symbol `hash_grid_build_device()` not found when lib is compiled without CUDA support
	- Fix for failure to load nvrtc-builtins64_113.dll when user has a newer CUDA toolkit installed on their machine
	- Fix for conversion of Torch tensors to wp.arrays() with a vector dtype (incorrect row count)
	- Fix for `warp.dll` not found on some Windows installations
	- Fix for macOS builds on Clang 13.x
	- Fix for step-through debugging of kernels on Linux
	- Add argument type checking for user defined `@wp.func` functions
	- Add support for custom iterable types, supports ranges, hash grid, and mesh query objects
	- Add support for multi-dimensional arrays, for example use `x = array[i,j,k]` syntax to address a 3-dimensional array
	- Add support for multi-dimensional kernel launches, use `launch(kernel, dim=(i,j,k), ...` and `i,j,k = wp.tid()` to obtain thread indices
	- Add support for bounds-checking array memory accesses in debug mode, use `wp.config.mode = "debug"` to enable
	- Add support for differentiating through dynamic and nested for-loops
	- Add support for evaluating MLP neural network layers inside kernels with custom activation functions, see `wp.mlp()`
	- Add additional NVDB sampling methods and adjoints, see `wp.volume_sample_i()`, `wp.volume_sample_f()`, and `wp.volume_sample_vec()`
	- Add support for loading zlib compressed NVDB volumes, see `wp.Volume.load_from_nvdb()`
	- Add support for triangle intersection testing, see `wp.intersect_tri_tri()`
	- Add support for NVTX profile zones in `wp.ScopedTimer()`
	- Add support for additional transform and quaternion math operations, see `wp.inverse()`, `wp.quat_to_matrix()`, `wp.quat_from_matrix()`
	- Add fast math (`--fast-math`) to kernel compilation by default
	- Add `warp.torch` import by default (if PyTorch is installed)

	### Warp Kit

	- Add Kit menu for browsing Warp documentation and example scenes under 'Window->Warp'
	- Fix for OgnParticleSolver.py example when collider is coming from Read Prim into Bundle node

	### Warp Sim

	- Fix for joint attachment forces
	- Fix for URDF importer and floating base support
	- Add examples showing how to use differentiable forward kinematics to solve inverse kinematics
	- Add examples for URDF cartpole and quadruped simulation

	### Breaking Changes

	- `wp.volume_sample_world()` is now replaced by `wp.volume_sample_f/i/vec()` which operate in index (local) space. Users should use `wp.volume_world_to_index()` to transform points from world space to index space before sampling.
	- `wp.mlp()` expects multi-dimensional arrays instead of one-dimensional arrays for inference, all other semantics remain the same as earlier versions of this API.
	- `wp.array.length` member has been removed, please use `wp.array.shape` to access array dimensions, or use `wp.array.size` to get total element count
	- Marking `dense_gemm()`, `dense_chol()`, etc methods as experimental until we revisit them

	## [0.1.25] - 2022-03-20

	- Add support for class methods to be Warp kernels
	- Add HashGrid reserve() so it can be used with CUDA graphs
	- Add support for CUDA graph capture of tape forward/backward passes
	- Add support for Python 3.8.x and 3.9.x
	- Add hyperbolic trigonometric functions, see wp.tanh(), wp.sinh(), wp.cosh()
	- Add support for floored division on integer types
	- Move tests into core library so they can be run in Kit environment

	## [0.1.24] - 2022-03-03

	### Warp Core

	- Add NanoVDB support, see wp.volume_sample*() methods
	- Add support for reading compile-time constants in kernels, see wp.constant()
	- Add support for __cuda_array_interface__ protocol for zero-copy interop with PyTorch, see wp.torch.to_torch()
	- Add support for additional numeric types, i8, u8, i16, u16, etc
	- Add better checks for device strings during allocation / launch
	- Add support for sampling random numbers with a normal distribution, see wp.randn()
	- Upgrade to CUDA 11.3
	- Update example scenes to Kit 103.1
	- Deduce array dtype from np.array when one is not provided
	- Fix for ranged for loops with negative step sizes
	- Fix for 3d and 4d spherical gradient distributions

	## [0.1.23] - 2022-02-17

	### Warp Core

	- Fix for generated code folder being removed during Showroom installation
	- Fix for macOS support
	- Fix for dynamic for-loop code gen edge case
	- Add procedural noise primitives, see noise(), pnoise(), curlnoise()
	- Move simulation helpers our of test into warp.sim module

	## [0.1.22] - 2022-02-14

	### Warp Core

	- Fix for .so reloading on Linux
	- Fix for while loop code-gen in some edge cases
	- Add rounding functions round(), rint(), trunc(), floor(), ceil()
	- Add support for printing strings and formatted strings from kernels
	- Add MSVC compiler version detection and require minimum

	### Warp Sim

	- Add support for universal and compound joint types

	## [0.1.21] - 2022-01-19

	### Warp Core

	- Fix for exception on shutdown in empty wp.array objects
	- Fix for hot reload of CPU kernels in Kit
	- Add hash grid primitive for point-based spatial queries, see hash_grid_query(), hash_grid_query_next()
	- Add new PRNG methods using PCG-based generators, see rand_init(), randf(), randi()
	- Add support for AABB mesh queries, see mesh_query_aabb(), mesh_query_aabb_next()
	- Add support for all Python range() loop variants
	- Add builtin vec2 type and additional math operators, pow(), tan(), atan(), atan2()
	- Remove dependency on CUDA driver library at build time
	- Remove unused NVRTC binary dependencies (50mb smaller Linux distribution)

	### Warp Sim

	- Bundle import of multiple shapes for simulation nodes
	- New OgnParticleVolume node for sampling shapes -> particles
	- New OgnParticleSolver node for DEM style granular materials

	## [0.1.20] - 2021-11-02

	- Updates to the ripple solver for GTC (support for multiple colliders, buoyancy, etc)

	## [0.1.19] - 2021-10-15

	- Publish from 2021.3 to avoid omni.graph database incompatibilities

	## [0.1.18] - 2021-10-08

	- Enable Linux support (tested on 20.04)

	## [0.1.17] - 2021-09-30

	- Fix for 3x3 SVD adjoint
	- Fix for A6000 GPU (bump compute model to sm_52 minimum)
	- Fix for .dll unload on rebuild
	- Fix for possible array destruction warnings on shutdown
	- Rename spatial_transform -> transform
	- Documentation update

	## [0.1.16] - 2021-09-06

	- Fix for case where simple assignments (a = b) incorrectly generated reference rather than value copy
	- Handle passing zero-length (empty) arrays to kernels

	## [0.1.15] - 2021-09-03

	- Add additional math library functions (asin, etc)
	- Add builtin 3x3 SVD support
	- Add support for named constants (True, False, None)
	- Add support for if/else statements (differentiable)
	- Add custom memset kernel to avoid CPU overhead of cudaMemset()
	- Add rigid body joint model to warp.sim (based on Brax)
	- Add Linux, MacOS support in core library
	- Fix for incorrectly treating pure assignment as reference instead of value copy
	- Removes the need to transfer array to CPU before numpy conversion (will be done implicitly)
	- Update the example OgnRipple wave equation solver to use bundles

	## [0.1.14] - 2021-08-09

	- Fix for out-of-bounds memory access in CUDA BVH
	- Better error checking after kernel launches (use warp.config.verify_cuda=True)
	- Fix for vec3 normalize adjoint code

	## [0.1.13] - 2021-07-29

	- Remove OgnShrinkWrap.py test node

	## [0.1.12] - 2021-07-29

	- Switch to Woop et al.'s watertight ray-tri intersection test
	- Disable --fast-math in CUDA compilation step for improved precision

	## [0.1.11] - 2021-07-28

	- Fix for mesh_query_ray() returning incorrect t-value

	## [0.1.10] - 2021-07-28

	- Fix for OV extension fwatcher filters to avoid hot-reload loop due to OGN regeneration

	## [0.1.9] - 2021-07-21

	- Fix for loading sibling DLL paths
	- Better type checking for built-in function arguments
	- Added runtime docs, can now list all builtins using wp.print_builtins()

	## [0.1.8] - 2021-07-14

	- Fix for hot-reload of CUDA kernels
	- Add Tape object for replaying differentiable kernels
	- Add helpers for Torch interop (convert torch.Tensor to wp.Array)

	## [0.1.7] - 2021-07-05

	- Switch to NVRTC for CUDA runtime
	- Allow running without host compiler
	- Disable asserts in kernel release mode (small perf. improvement)

	## [0.1.6] - 2021-06-14

	- Look for CUDA toolchain in target-deps

	## [0.1.5] - 2021-06-14

	- Rename OgLang -> Warp
	- Improve CUDA environment error checking
	- Clean-up some logging, add verbose mode (warp.config.verbose)

	## [0.1.4] - 2021-06-10

	- Add support for mesh raycast

	## [0.1.3] - 2021-06-09

	- Add support for unary negation operator
	- Add support for mutating variables during dynamic loops (non-differentiable)
	- Add support for in-place operators
	- Improve kernel cache start up times (avoids adjointing before cache check)
	- Update README.md with requirements / examples

	## [0.1.2] - 2021-06-03

	- Add support for querying mesh velocities
	- Add CUDA graph support, see warp.capture_begin(), warp.capture_end(), warp.capture_launch()
	- Add explicit initialization phase, warp.init()
	- Add variational Euler solver (sim)
	- Add contact caching, switch to nonlinear friction model (sim)

	- Fix for Linux/macOS support

	## [0.1.1] - 2021-05-18

	- Fix bug with conflicting CUDA contexts

	## [0.1.0] - 2021-05-17

	- Initial publish for alpha testing