CHANGELOG

[1.0.0-beta.6] - 2024-01-10

Do not create CPU copy of grad array when calling array.numpy()
Fix assert_np_equal() bug
Support Linux AArch64 platforms, including Jetson/Tegra devices
Add parallel testing runner (invoke with python -m warp.tests, use warp/tests/unittest_serial.py for serial testing)
Fix support for function calls in range()
matmul adjoints now accumulate
Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
Fix multi-gpu synchronization issue in sparse.py
Add depth rendering to OpenGLRenderer, document warp.render
Make atomic_min, atomic_max differentiable
Fix error reporting using the exact source segment
Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
Address multiple differentiability issues
Fix backpropagation for returning array element references
Support passing the return value to adjoints
Add point basis space and explicit point-based quadrature for warp.fem
Support overriding the LLVM project source directory path using build_lib.py --build_llvm --llvm_source_path=
Fix the error message for accessing nonexistent attributes
Flatten faces array for Mesh constructor in URDF parser

[1.0.0-beta.5] - 2023-11-22

Fix for kernel caching when function argument types change
Fix code-gen ordering of dependent structs
Fix for wp.Mesh build on MGPU systems
Fix for name clash bug with adjoint code: https://github.com/NVIDIA/warp/issues/154
Add wp.frac() for returning the fractional part of a floating point value
Add support for custom native CUDA snippets using @wp.func_native decorator
Add support for batched matmul with batch size > 2^16-1
Add support for transposed CUTLASS wp.matmul() and additional error checking
Add support for quad and hex meshes in wp.fem
Detect and warn when C++ runtime doesn't match compiler during build, e.g.: libstdc++.so.6: version `GLIBCXX_3.4.30' not found
Documentation update for wp.BVH
Documentation and simplified API for runtime kernel specialization wp.Kernel
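To make the wp.frac() entry concrete, here is a minimal pure-Python sketch of the value it is expected to return. It assumes the sign-preserving convention x - trunc(x), under which the fractional part of a negative number is negative (x - floor(x) would instead give 0.25 for -3.75). This is an illustration of the semantics only, not Warp's implementation.

```python
import math


def frac(x: float) -> float:
    """Fractional part of x, assuming the convention x - trunc(x)."""
    # math.trunc drops the integer part toward zero, so the sign of x is kept.
    return x - math.trunc(x)


print(frac(3.75))   # 0.75
print(frac(-3.75))  # -0.75
```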

[1.0.0-beta.4] - 2023-11-01

Add wp.cbrt() for cube root calculation
Add wp.mesh_furthest_point_no_sign() to compute furthest point on a surface from a query point
Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
Add support for chained comparisons, e.g.: 0 < x < 2
Add support for running warp.fem examples headless
Fix for unit test determinism
Fix for possible garbage collection of an array during graph capture
Fix for wp.utils.array_sum() output initialization when used with vector types
Coverage and documentation updates

[1.0.0-beta.3] - 2023-10-19

Add support for code coverage scans (test_coverage.py), coverage at 85% in omni.warp.core
Add support for named component access for vector types, e.g.: a = v.x
Add support for lvalue expressions, e.g.: array[i] += b
Add casting constructors for matrix and vector types
Add support for type() operator that can be used to return type inside kernels
Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
Fix for multi-process initialization warnings
Fix alignment issues with empty wp.struct
Fix for return statement warning with tuple-returning functions
Fix for wp.batched_matmul() registering the wrong function in the Tape
Fix and document wp.sim forward + inverse kinematics
Fix for wp.func to return a default value if function does not return on all control paths
Refactor wp.fem support for new basis functions, decoupled function spaces
Optimizations for wp.noise functions, up to 10x faster in most cases
Optimizations for type_size_in_bytes() used in array construction

Breaking Changes

To support grid-stride kernels, wp.tid() can no longer be called inside wp.func functions.
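In the general grid-stride pattern, a fixed number of threads each walk the index space in steps equal to the total thread count, so a launch larger than the hardware grid limit is still fully covered. The plain-Python sketch below illustrates that pattern only; `grid_stride_indices` and `covers_all` are hypothetical helpers, not Warp's generated code.

```python
def grid_stride_indices(tid: int, num_threads: int, n: int) -> list:
    """Indices a single thread visits in a grid-stride loop over n elements."""
    # Start at this thread's id and advance by the total number of threads.
    return list(range(tid, n, num_threads))


def covers_all(num_threads: int, n: int) -> bool:
    """Check that every index 0..n-1 is visited exactly once across all threads."""
    seen = []
    for tid in range(num_threads):
        seen.extend(grid_stride_indices(tid, num_threads, n))
    return sorted(seen) == list(range(n))


print(grid_stride_indices(1, 4, 10))  # [1, 5, 9]
print(covers_all(4, 10))              # True
```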

[1.0.0-beta.2] - 2023-09-01

Fix for passing bool into wp.func functions
Fix for deprecation warnings appearing on stderr, now redirected to stdout
Fix for using for i in wp.hash_grid_query(..) syntax

[1.0.0-beta.1] - 2023-08-29

Fix for wp.float16 being passed as kernel arguments
Fix for compile errors with kernels using structs in backward pass
Fix for wp.Mesh.refit() not being CUDA graph capturable due to synchronous temp. allocs
Fix for dynamic texture example flickering / MGPU crashes demo in Kit by reusing ui.DynamicImageProvider instances
Fix for a regression that disabled bundle change tracking in samples
Fix for incorrect surface velocities when meshes are deforming in OgnClothSimulate
Fix for incorrect lower-case when setting USD stage "up_axis" in examples
Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
Fix for adding open edges when building cloth constraints from meshes in wp.sim.ModelBuilder.add_cloth_mesh()
Add support for wp.fabricarray to directly access Fabric data from Warp kernels, see https://omniverse.gitlab-master-pages.nvidia.com/usdrt/docs/usdrt_prim_selection.html for examples
Add support for user defined gradient functions, see @wp.func_replay, and @wp.func_grad decorators
Add support for more OG attribute types in omni.warp.from_omni_graph()
Add support for creating NanoVDB wp.Volume objects from dense NumPy arrays
Add support for wp.volume_sample_grad_f() which returns the value + gradient efficiently from an NVDB volume
Add support for LLVM fp16 intrinsics for half-precision arithmetic
Add implementation of stochastic gradient descent, see wp.optim.SGD
Add warp.fem framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/_build/html/modules/fem.html)
Optimizations for omni.warp extension load time (2.2s to 625ms cold start)
Make all omni.ui dependencies optional so that Warp unit tests can run headless
Deprecation of wp.tid() outside of kernel functions, users should pass tid() values to wp.func functions explicitly
Deprecation of wp.sim.Model.flatten() for returning all contained tensors from the model
Add support for clamping particle max velocity in wp.sim.Model.particle_max_velocity
Remove dependency on urdfpy package, improve MJCF parser handling of default values

[0.10.1] - 2023-07-25

Fix for large multidimensional kernel launches (> 2^32 threads)
Fix for module hashing with generics
Fix for unrolling loops with break or continue statements (will skip unrolling)
Fix for passing boolean arguments to build_lib.py (previously ignored)
Fix build warnings on Linux
Fix for creating array of structs from NumPy structured array
Fix for regression on kernel load times in Kit when using warp.sim
Update warp.array.reshape() to handle -1 dimensions
Update margin used for mesh queries when using wp.sim.create_soft_body_contacts()
Improvements to gradient handling with warp.from_torch(), warp.to_torch() plus documentation
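With -1 handling in array.reshape(), one dimension can be left for the library to infer from the total element count. The sketch below shows that inference in plain Python, assuming NumPy's rules (at most one -1, and the known dimensions must divide the size evenly). `resolve_shape` is a hypothetical helper, not part of Warp.

```python
import math


def resolve_shape(size: int, shape: tuple) -> tuple:
    """Infer a single -1 dimension the way NumPy's reshape does (illustrative)."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    # Product of all explicitly given dimensions (math.prod of nothing is 1).
    known = math.prod(d for d in shape if d != -1)
    if -1 in shape:
        if known == 0 or size % known != 0:
            raise ValueError(f"cannot reshape size {size} into {shape}")
        return tuple(size // known if d == -1 else d for d in shape)
    if known != size:
        raise ValueError(f"cannot reshape size {size} into {shape}")
    return shape


print(resolve_shape(12, (3, -1)))  # (3, 4)
```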

[0.10.0] - 2023-07-05

Add support for macOS universal binaries (x86 + aarch64) for M1+ support
Add additional methods for SDF generation; see the following new methods:
- wp.mesh_query_point_nosign() - closest point query with no sign determination
- wp.mesh_query_point_sign_normal() - closest point query with sign from angle-weighted normal
- wp.mesh_query_point_sign_winding_number() - closest point query with fast winding number sign determination
Add CSR/BSR sparse matrix support, see warp.sparse module:
- wp.sparse.BsrMatrix
- wp.sparse.bsr_zeros(), wp.sparse.bsr_set_from_triplets() for construction
- wp.sparse.bsr_mm(), wp.sparse.bsr_mv() for matrix-matrix and matrix-vector products respectively
Add array-wide utilities:
- wp.utils.array_scan() - prefix sum (inclusive or exclusive)
- wp.utils.array_sum() - sum across array
- wp.utils.radix_sort_pairs() - in-place radix sort (key,value) pairs
Add support for calling @wp.func functions from Python (outside of kernel scope)
Add support for recording kernel launches using a wp.Launch object that can be replayed with low overhead, use wp.launch(..., record_cmd=True) to generate a command object
Optimizations for wp.struct kernel arguments, up to 20x faster launches for kernels with large structs or number of params
Refresh USD samples to use bundle based workflow + change tracking
Add Python API for manipulating mesh and point bundle data in OmniGraph, see omni.warp.nodes module
- See omni.warp.nodes.mesh_create_bundle(), omni.warp.nodes.mesh_get_points(), etc.
Improvements to wp.array:
- Fix a number of array methods misbehaving with empty arrays
- Fix a number of bugs and memory leaks related to gradient arrays
- Fix array construction when creating arrays in pinned memory from a data source in pageable memory
- wp.empty() no longer zeroes out memory and returns an uninitialized array, as intended
- array.zero_() and array.fill_() work with non-contiguous arrays
- Support wrapping non-contiguous NumPy arrays without a copy
- Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
- Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
- array.fill_() can now take lists or other sequences when filling arrays of vectors or matrices, e.g. arr.fill_([[1, 2], [3, 4]])
- array.fill_() now works with arrays of structs (pass a struct instance)
- wp.copy() gracefully handles copying between non-contiguous arrays on different devices
- Add wp.full() and wp.full_like(), e.g., a = wp.full(shape, value)
- Add optional device argument to wp.empty_like(), wp.zeros_like(), wp.full_like(), and wp.clone()
- Add indexedarray methods .zero_(), .fill_(), and .assign()
- Fix indexedarray methods .numpy() and .list()
- Fix array.list() to work with arrays of any Warp data type
- Fix array.list() synchronization issue with CUDA arrays
- array.numpy() called on an array of structs returns a structured NumPy array with named fields
- Improve the performance of creating arrays
Fix for Error: No module named 'omni.warp.core' when running some Kit configurations (e.g.: stubgen)
Fix for wp.struct instance address being included in module content hash
Fix codegen with overridden function names
Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
Fix for wp.BVH.refit() when executed on the CPU
Fix adjoint of wp.struct constructor
Fix element accessors for wp.float16 vectors and matrices in Python
Fix wp.float16 members in structs
Remove deprecated wp.ScopedCudaGuard(), please use wp.ScopedDevice() instead
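wp.utils.array_scan() computes a prefix sum in either inclusive or exclusive mode. The difference between the two is easiest to see on a small list. The sketch below uses itertools.accumulate in plain Python. It only illustrates the semantics and is not Warp's GPU implementation or its exact signature.

```python
from itertools import accumulate


def array_scan(values: list, inclusive: bool = True) -> list:
    """Prefix sum over a list (plain-Python stand-in, not wp.utils.array_scan)."""
    inc = list(accumulate(values))
    if inclusive:
        # Element i is the sum of values[0..i].
        return inc
    # Exclusive: element i is the sum of values[0..i-1], starting from 0.
    return [0] + inc[:-1] if inc else []


print(array_scan([3, 1, 4, 1, 5]))                   # [3, 4, 8, 9, 14]
print(array_scan([3, 1, 4, 1, 5], inclusive=False))  # [0, 3, 4, 8, 9]
```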

[0.9.0] - 2023-06-01

Add support for in-place modifications to vector, matrix, and struct types inside kernels (will warn during backward pass with wp.verbose if using gradients)
Add support for step-through VSCode debugging of kernel code with standalone LLVM compiler, see wp.breakpoint(), and walkthrough_debug.py
Add support for default values on built-in functions
Add support for multi-valued @wp.func functions
Add support for pass, continue, and break statements
Add missing __sincos_stret symbol for macOS
Add support for gradient propagation through wp.Mesh.points, and other cases where arrays are passed to native functions
Add support for Python @ operator as an alias for wp.matmul()
Add XPBD support for particle-particle collision
Add support for individual particle radii: ModelBuilder.add_particle has a new radius argument, Model.particle_radius is now a Warp array
Add per-particle flags as a Model.particle_flags Warp array, introduce PARTICLE_FLAG_ACTIVE to define whether a particle is being simulated and participates in contact dynamics
Add support for Python bitwise operators &, |, ~, <<, >>
Switch to using standalone LLVM compiler by default for CPU devices
Split omni.warp into omni.warp.core for Omniverse applications that want to use the Warp Python module with minimal additional dependencies
Disable kernel gradient generation by default inside Omniverse for improved compile times
Fix for bounds checking on element access of vector/matrix types
Fix for stream initialization when a custom (non-primary) external CUDA context has been set on the calling thread
Fix for duplicate @wp.struct registration during hot reload
Fix for array unot() operator so kernel writers can use if not array: syntax
Fix for case where dynamic loops are nested within unrolled loops
wp.hash_grid_point_id() now returns -1 if the wp.HashGrid has not been reserved
Deprecate wp.Model.soft_contact_distance which is now replaced by wp.Model.particle_radius
Deprecate single scalar particle radius (should be a per-particle array)

[0.8.2] - 2023-04-21

Add ModelBuilder.soft_contact_max to control the maximum number of soft contacts that can be registered. Use Model.allocate_soft_contacts(new_count) to change count on existing Model objects.
Add support for bool parameters
Add support for logical boolean operators with int types
Fix for wp.quat() default constructor
Fix conditional reassignments
Add a version of wp.mesh_query_point() with sign determination using the angle-weighted normal, as wp.mesh_query_sign_normal()
Add a version of wp.mesh_query_point() with sign determination using the winding number, as wp.mesh_query_sign_winding_number()
Add a point query without sign determination, as wp.mesh_query_no_sign()

[0.8.1] - 2023-04-13

Fix for regression when passing flattened numeric lists as matrix arguments to kernels
Fix for regressions when passing wp.struct types with uninitialized (None) member attributes

[0.8.0] - 2023-04-05

Add Texture Write node for updating dynamic RTX textures from Warp kernels / nodes
Add multi-dimensional kernel support to Warp Kernel Node
Add wp.load_module() to pre-load specific modules (pass recursive=True to load recursively)
Add wp.poisson() for sampling Poisson distributions
Add support for UsdPhysics schema, see warp.sim.parse_usd()
Add XPBD rigid body implementation plus diff. simulation examples
Add support for standalone CPU compilation (no host compiler) with LLVM backend, enable with --standalone build option
Add support for per-timer color in wp.ScopedTimer()
Add support for row-based construction of matrix types outside of kernels
Add support for setting and getting row vectors for Python matrices, see matrix.get_row(), matrix.set_row()
Add support for instantiating wp.struct types within kernels
Add support for indexed arrays, slice = array[indices] will now generate a sparse slice of array data
Add support for generic kernel params, use def compute(param: Any):
Add support for with wp.ScopedDevice("cuda") as device: syntax (same for wp.ScopedStream(), wp.Tape())
Add support for creating custom length vector/matrices inside kernels, see wp.vector(), and wp.matrix()
Add support for creating identity matrices in kernels with, e.g.: I = wp.identity(n=3, dtype=float)
Add support for unary plus operator (wp.pos())
Add support for wp.constant variables to be used directly in Python without having to use .val member
Add support for nested wp.struct types
Add support for returning wp.struct from functions
Add --quick build for faster local dev. iteration (uses a reduced set of SASS arches)
Add optional requires_grad parameter to wp.from_torch() to override gradient allocation
Add type hints for generic vector / matrix types in Python stubs
Add support for custom user function recording in wp.Tape()
Add support for registering CUTLASS wp.matmul() with tape backward pass
Add support for grids with > 2^31 threads (each dimension may be up to INT_MAX in length)
Add CPU fallback for wp.matmul()
Optimizations for wp.launch(), up to 3x faster launches in common cases
Fix wp.randf() conversion to float to reduce bias for uniform sampling
Fix capture of wp.func and wp.constant types from inside Python closures
Fix for CUDA on WSL
Fix for matrices in structs
Fix for transpose indexing for some non-square matrices
Enable Python faulthandler by default
Update to VS2019

Breaking Changes

wp.constant variables can now be treated as their true type, accessing the underlying value through constant.val is no longer supported
wp.sim.model.ground_plane is now a wp.array to support gradient, users should call builder.set_ground_plane() to create the ground
wp.sim capsule, cones, and cylinders are now aligned with the default USD up-axis

[0.7.2] - 2023-02-15

Reduce test time for vec/math types
Clean-up CUDA disabled build pipeline
Remove extension.gen.toml to make Kit packages Python version independent
Handle additional cases for array indexing inside Python

[0.7.1] - 2023-02-14

Disable some slow tests for Kit
Make unit tests run on first GPU only by default

[0.7.0] - 2023-02-13

Add support for arbitrary length / type vector and matrices e.g.: wp.vec(length=7, dtype=wp.float16), see wp.vec(), and wp.mat()
Add support for array.flatten(), array.reshape(), and array.view() with NumPy semantics
Add support for slicing wp.array types in Python
Add wp.from_ptr() helper to construct arrays from an existing allocation
Add support for break statements in ranged-for and while loops (backward pass support currently not implemented)
Add built-in mathematical constants, see wp.pi, wp.e, wp.log2e, etc.
Add built-in conversion between degrees and radians, see wp.degrees(), wp.radians()
Add security pop-up for Kernel Node
Improve error handling for kernel return values

[0.6.3] - 2023-01-31

Add DLPack utilities, see wp.from_dlpack(), wp.to_dlpack()
Add Jax utilities, see wp.from_jax(), wp.to_jax(), wp.device_from_jax(), wp.device_to_jax()
Fix for Linux Kit extensions OM-80132, OM-80133

[0.6.2] - 2023-01-19

Updated wp.from_torch() to support more data types
Updated wp.from_torch() to automatically determine the target Warp data type if not specified
Updated wp.from_torch() to support non-contiguous tensors with arbitrary strides
Add CUTLASS integration for dense GEMMs, see wp.matmul() and wp.batched_matmul()
Add QR and Eigen decompositions for mat33 types, see wp.qr3(), and wp.eig3()
Add default (zero) constructors for matrix types
Add a flag to suppress all output except errors and warnings (set wp.config.quiet = True)
Skip recompilation when Kernel Node attributes are edited
Allow optional attributes for Kernel Node
Allow disabling backward pass code-gen on a per-kernel basis, use @wp.kernel(enable_backward=False)
Replace Python imp package with importlib
Fix for quaternion slerp gradients (wp.quat_slerp())

[0.6.1] - 2022-12-05

Fix for non-CUDA builds
Fix strides computation in array_t constructor, fixes a bug with accessing mesh indices through mesh.indices[]
Disable backward pass code generation for kernel node (4-6x faster compilation)
Switch to linbuild for universal Linux binaries (affects TeamCity builds only)

[0.6.0] - 2022-11-28

Add support for CUDA streams, see wp.Stream, wp.get_stream(), wp.set_stream(), wp.synchronize_stream(), wp.ScopedStream
Add support for CUDA events, see wp.Event, wp.record_event(), wp.wait_event(), wp.wait_stream(), wp.Stream.record_event(), wp.Stream.wait_event(), wp.Stream.wait_stream()
Add support for PyTorch stream interop, see wp.stream_from_torch(), wp.stream_to_torch()
Add support for allocating host arrays in pinned memory for asynchronous data transfers, use wp.array(..., pinned=True) (default is non-pinned)
Add support for direct conversions between all scalar types, e.g.: x = wp.uint8(wp.float64(3.0))
Add per-module option to enable fast math, use wp.set_module_options({"fast_math": True}), fast math is now disabled by default
Add support for generating CUBIN kernels instead of PTX on systems with older drivers
Add user preference options for CUDA kernel output ("ptx" or "cubin", e.g.: wp.config.cuda_output = "ptx" or per-module wp.set_module_options({"cuda_output": "ptx"}))
Add kernel node for OmniGraph
Add wp.quat_slerp(), wp.quat_to_axis_angle(), wp.rotate_rodriquez() and adjoints for all remaining quaternion operations
Add support for unrolling for-loops when range is a wp.constant
Add support for arithmetic operators on built-in vector / matrix types outside of wp.kernel
Add support for multiple solution variables in wp.optim Adam optimization
Add nested attribute support for wp.struct attributes
Add missing adjoint implementations for spatial math types, and document all functions with missing adjoints
Add support for retrieving NanoVDB tiles and voxel size, see wp.Volume.get_tiles(), and wp.Volume.get_voxel_size()
Add support for store operations on integer NanoVDB volumes, see wp.volume_store_i()
Expose wp.Mesh points and indices as arrays inside kernels, see wp.mesh_get()
Optimizations for wp.array construction, 2-3x faster on average
Optimizations for URDF import
Fix various deployment issues by statically linking with all CUDA libs
Update warp.so/warp.dll to CUDA Toolkit 11.5

[0.5.1] - 2022-11-01

Fix for unit tests in Kit

[0.5.0] - 2022-10-31

Add smoothed particle hydrodynamics (SPH) example, see example_sph.py
Add support for accessing array.shape inside kernels, e.g.: width = arr.shape[0]
Add dependency tracking to hot-reload modules if dependencies were modified
Add lazy acquisition of CUDA kernel contexts (saves ~300 MB of GPU memory in MGPU environments)
Add BVH object, see wp.Bvh and bvh_query_ray(), bvh_query_aabb() functions
Add component index operations for spatial_vector, spatial_matrix types
Add wp.lerp() and wp.smoothstep() builtins
Add wp.optim module with implementation of the Adam optimizer for float and vector types
Add support for transient Python modules (fix for Houdini integration)
Add wp.length_sq(), wp.trace() for vector / matrix types respectively
Add missing adjoints for wp.quat_rpy(), wp.determinant()
Add wp.atomic_min(), wp.atomic_max() operators
Add vectorized version of warp.sim.model.add_cloth_mesh()
Add NVDB volume allocation API, see wp.Volume.allocate(), and wp.Volume.allocate_by_tiles()
Add NVDB volume write methods, see wp.volume_store_i(), wp.volume_store_f(), wp.volume_store_v()
Add MGPU documentation
Add example showing how to compute Jacobian of multiple environments in parallel, see example_jacobian_ik.py
Add wp.Tape.zero() support for wp.struct types
Make SampleBrowser an optional dependency for Kit extension
Make wp.Mesh object accept both 1d and 2d arrays of face vertex indices
Fix for reloading of class member kernel / function definitions using importlib.reload()
Fix for hashing of wp.constants() not invalidating kernels
Fix for reload when multiple .ptx versions are present
Improved error reporting during code-gen
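For reference, here is a plain-Python sketch of what wp.lerp() and wp.smoothstep() compute. It assumes the standard definitions: linear interpolation a*(1-t) + b*t, and the GLSL-style Hermite smoothstep with the input clamped to [0, 1]. Warp's exact formulas may differ in floating-point rounding.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a * (1.0 - t) + b * t


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite smoothstep, assuming the common GLSL-style definition."""
    # Normalize x into [0, 1] relative to the edges, then clamp.
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    # Cubic 3t^2 - 2t^3 has zero slope at both ends.
    return t * t * (3.0 - 2.0 * t)


print(lerp(2.0, 6.0, 0.25))       # 3.0
print(smoothstep(0.0, 1.0, 0.5))  # 0.5
```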

[0.4.3] - 2022-09-20

Update all samples to use GPU interop path by default
Fix for arrays > 2GB in length
Add support for per-vertex USD mesh colors with warp.render class

[0.4.2] - 2022-09-07

Register Warp samples to the sample browser in Kit
Add NDEBUG flag to release mode kernel builds
Fix for particle solver node when using a large number of particles
Fix for broken cameras in Warp sample scenes

[0.4.1] - 2022-08-30

Add geometry sampling methods, see wp.sample_unit_cube(), wp.sample_unit_disk(), etc
Add wp.lower_bound() for searching sorted arrays
Add an option for disabling code-gen of backward pass to improve compilation times, see wp.set_module_options({"enable_backward": False}), True by default
Fix for using Warp from Script Editor or when module does not have a __file__ attribute
Fix for hot reload of modules containing wp.func() definitions
Fix for debug flags not being set correctly on CUDA when wp.config.mode == "debug", this enables bounds checking on CUDA kernels in debug mode
Fix for code gen of functions that do not return a value
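The binary search below assumes wp.lower_bound() follows the usual C++ std::lower_bound convention: it returns the first index whose element is not less than the query, or the array length if no such element exists. This is a plain-Python sketch of that convention, not Warp's kernel-side code.

```python
def lower_bound(arr: list, value: float) -> int:
    """Index of the first element >= value in sorted arr (len(arr) if none)."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < value:
            lo = mid + 1  # answer lies strictly to the right of mid
        else:
            hi = mid      # arr[mid] >= value, so mid is a candidate
    return lo


print(lower_bound([1, 3, 3, 5, 8], 3))  # 1
print(lower_bound([1, 3, 3, 5, 8], 4))  # 3
```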

[0.4.0] - 2022-08-09

Fix for FP16 conversions on GPUs without hardware support
Fix for runtime = None errors when reloading the Warp module
Fix for PTX architecture version when running with older drivers, see wp.config.ptx_target_arch
Fix for USD imports from __init__.py, defer them to individual functions that need them
Fix for robustness issues with sign determination for wp.mesh_query_point()
Fix for wp.HashGrid memory leak when creating/destroying grids
Add CUDA version checks for toolkit and driver
Add support for cross-module @wp.struct references
Support running even if CUDA initialization failed, use wp.is_cuda_available() to check availability
Statically link with the CUDA runtime library to avoid deployment issues

Breaking Changes

Removed wp.runtime reference from the top-level module, as it should be considered private

[0.3.2] - 2022-07-19

Remove Torch import from __init__.py, defer import to wp.from_torch(), wp.to_torch()

[0.3.1] - 2022-07-12

Fix for marching cubes reallocation after initialization
Add support for closest-point tests between line segments, see wp.closest_point_edge_edge() builtin
Add support for per-triangle elasticity coefficients in simulation, see wp.sim.ModelBuilder.add_cloth_mesh()
Add support for specifying default device, see wp.set_device(), wp.get_device(), wp.ScopedDevice
Add support for multiple GPUs (e.g., "cuda:0", "cuda:1"), see wp.get_cuda_devices(), wp.get_cuda_device_count(), wp.get_cuda_device()
Add support for explicitly targeting the current CUDA context using device alias "cuda"
Add support for using arbitrary external CUDA contexts, see wp.map_cuda_device(), wp.unmap_cuda_device()
Add PyTorch device aliasing functions, see wp.device_from_torch(), wp.device_to_torch()

Breaking Changes

A CUDA device is used by default, if available (aligned with wp.get_preferred_device())
wp.ScopedCudaGuard is deprecated, use wp.ScopedDevice instead
wp.synchronize() now synchronizes all devices; for finer-grained control, use wp.synchronize_device()
Device alias "cuda" now refers to the current CUDA context, rather than a specific device like "cuda:0" or "cuda:1"

[0.3.0] - 2022-07-08

Add support for FP16 storage type, see wp.float16
Add support for per-dimension byte strides, see wp.array.strides
Add support for passing Python classes as kernel arguments, see @wp.struct decorator
Add additional bounds checks for builtin matrix types
Add additional floating point checks, see wp.config.verify_fp
Add interleaved user source with generated code to aid debugging
Add generalized GPU marching cubes implementation, see wp.MarchingCubes class
Add additional scalar*matrix vector operators
Add support for retrieving a single row from builtin types, e.g.: r = m33[i]
Add wp.log2() and wp.log10() builtins
Add support for quickly instancing wp.sim.ModelBuilder objects to improve env. creation performance for RL
Remove custom CUB version and improve compatibility with CUDA 11.7
Fix to preserve external user-gradients when calling wp.Tape.zero()
Fix to only allocate gradient of a Torch tensor if requires_grad=True
Fix for missing wp.mat22 constructor adjoint
Fix for ray-cast precision in edge case on GPU (watertightness issue)
Fix for kernel hot-reload when definition changes
Fix for NVCC warnings on Linux
Fix for generated function names when kernels are defined as class functions
Fix for reload of generated CPU kernel code on Linux
Fix for example scripts to output USD at 60 timecodes per-second (better Kit compatibility)
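Per-dimension byte strides (wp.array.strides) give the distance in bytes between consecutive elements along each axis, as in NumPy. The sketch below covers the densely packed, row-major case only. `contiguous_strides` and `byte_offset` are hypothetical helpers for illustration.

```python
def contiguous_strides(shape: tuple, itemsize: int) -> tuple:
    """Row-major (C-order) byte strides for a densely packed array."""
    strides = []
    step = itemsize
    # The last axis is innermost: its stride is one element.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index: tuple, strides: tuple) -> int:
    """Byte offset of an element: the sum of index[d] * strides[d]."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides((2, 3, 4), 4)  # e.g. 4-byte float32 elements
print(strides)                          # (48, 16, 4)
print(byte_offset((1, 2, 3), strides))  # 92
```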

[0.2.3] - 2022-06-13

Fix for incorrect 4d array bounds checking
Fix for wp.constant changes not updating module hash
Fix for stale CUDA kernel cache when CPU kernels launched first
Array gradients are now allocated along with the arrays and accessible as wp.array.grad, users should take care to always call wp.Tape.zero() to clear gradients between different invocations of wp.Tape.backward()
Added wp.array.fill_() to set all entries to a scalar value (4-byte values only currently)

Breaking Changes

Tape capture option has been removed, users can now capture tapes inside existing CUDA graphs (e.g.: inside Torch)
Scalar loss arrays should now explicitly set requires_grad=True at creation time

[0.2.2] - 2022-05-30

Fix for from import * inside Warp initialization
Fix for body space velocity when using deforming Mesh objects with scale
Fix for noise gradient discontinuities affecting wp.curlnoise()
Fix for wp.from_torch() to correctly preserve shape
Fix for URDF parser incorrectly passing density to scale parameter
Optimizations for startup time from 3s -> 0.3s
Add support for custom kernel cache location, Warp will now store generated binaries in the user's application directory
Add support for cross-module function references, e.g.: call another module's @wp.func functions
Add support for overloading @wp.func functions based on argument type
Add support for calling built-in functions directly from Python interpreter outside kernels (experimental)
Add support for auto-complete and docstring lookup for builtins in IDEs like VSCode, PyCharm, etc
Add support for doing partial array copies, see wp.copy() for details
Add support for accessing mesh data directly in kernels, see wp.mesh_get_point(), wp.mesh_get_index(), wp.mesh_eval_face_normal()
Change to only compile for targets where kernel is launched (e.g.: will not compile CPU unless explicitly requested)

Breaking Changes

Builtin methods such as wp.quat_identity() now call the Warp native implementation directly and will return a wp.quat object instead of NumPy array
NumPy implementations of many builtin methods have been moved to warp.utils and will be deprecated
Local @wp.func functions should not be namespaced when called, e.g.: previously wp.myfunc() would work even if myfunc() was not a builtin
Removed wp.rpy2quat(), please use wp.quat_rpy() instead

[0.2.1] - 2022-05-11

Fix for unit tests in Kit

[0.2.0] - 2022-05-02

Warp Core

Fix for unrolling loops with negative bounds
Fix for unresolved symbol hash_grid_build_device() not found when lib is compiled without CUDA support
Fix for failure to load nvrtc-builtins64_113.dll when user has a newer CUDA toolkit installed on their machine
Fix for conversion of Torch tensors to wp.arrays() with a vector dtype (incorrect row count)
Fix for warp.dll not found on some Windows installations
Fix for macOS builds on Clang 13.x
Fix for step-through debugging of kernels on Linux
Add argument type checking for user defined @wp.func functions
Add support for custom iterable types, supports ranges, hash grid, and mesh query objects
Add support for multi-dimensional arrays, for example use x = array[i,j,k] syntax to address a 3-dimensional array
Add support for multi-dimensional kernel launches, use launch(kernel, dim=(i,j,k), ...) and i,j,k = wp.tid() to obtain thread indices
Add support for bounds-checking array memory accesses in debug mode, use wp.config.mode = "debug" to enable
Add support for differentiating through dynamic and nested for-loops
Add support for evaluating MLP neural network layers inside kernels with custom activation functions, see wp.mlp()
Add additional NVDB sampling methods and adjoints, see wp.volume_sample_i(), wp.volume_sample_f(), and wp.volume_sample_vec()
Add support for loading zlib compressed NVDB volumes, see wp.Volume.load_from_nvdb()
Add support for triangle intersection testing, see wp.intersect_tri_tri()
Add support for NVTX profile zones in wp.ScopedTimer()
Add support for additional transform and quaternion math operations, see wp.inverse(), wp.quat_to_matrix(), wp.quat_from_matrix()
Add fast math (--fast-math) to kernel compilation by default
Add warp.torch import by default (if PyTorch is installed)

Warp Kit

Add Kit menu for browsing Warp documentation and example scenes under 'Window->Warp'
Fix for OgnParticleSolver.py example when collider is coming from Read Prim into Bundle node

Warp Sim

Fix for joint attachment forces
Fix for URDF importer and floating base support
Add examples showing how to use differentiable forward kinematics to solve inverse kinematics
Add examples for URDF cartpole and quadruped simulation

Breaking Changes

wp.volume_sample_world() is now replaced by wp.volume_sample_f/i/vec() which operate in index (local) space. Users should use wp.volume_world_to_index() to transform points from world space to index space before sampling.
wp.mlp() expects multi-dimensional arrays instead of one-dimensional arrays for inference, all other semantics remain the same as earlier versions of this API.
wp.array.length member has been removed, please use wp.array.shape to access array dimensions, or use wp.array.size to get total element count
Mark dense_gemm(), dense_chol(), etc. as experimental until we revisit them
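Because the new sampling functions operate in index space, world-space points must first be mapped through the grid's transform. The pure-Python sketch below assumes a hypothetical uniform voxel size and origin (real NanoVDB grids store a general affine map) and a small dense grid standing in for a volume; `world_to_index` and `sample_trilinear` are illustrative names, not Warp functions.

```python
import math


def world_to_index(p, voxel_size, origin):
    """Hypothetical uniform-scale transform: index = (world - origin) / voxel_size."""
    return [(pw - o) / voxel_size for pw, o in zip(p, origin)]


def sample_trilinear(grid, q):
    """Trilinearly interpolate grid[i][j][k] at index-space point q.

    Assumes q lies inside [0, n - 1) along every axis so all 8 corners exist.
    """
    i, j, k = (math.floor(c) for c in q)
    fx, fy, fz = q[0] - i, q[1] - j, q[2] - k
    result = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((fx if di else 1.0 - fx)
                     * (fy if dj else 1.0 - fy)
                     * (fz if dk else 1.0 - fz))
                result += w * grid[i + di][j + dj][k + dk]
    return result


# A 3x3x3 grid holding the linear field v = i + 2j + 3k at integer index coordinates
grid = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(3)] for i in range(3)]

q = world_to_index((1.25, 1.5, 1.75), voxel_size=0.5, origin=(1.0, 1.0, 1.0))
value = sample_trilinear(grid, q)
```

Sampling `grid` directly at the world coordinates would read the wrong voxels; converting first is the step the breaking change makes explicit.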

[0.1.25] - 2022-03-20

Add support for class methods to be Warp kernels
Add HashGrid reserve() so it can be used with CUDA graphs
Add support for CUDA graph capture of tape forward/backward passes
Add support for Python 3.8.x and 3.9.x
Add hyperbolic trigonometric functions, see wp.tanh(), wp.sinh(), wp.cosh()
Add support for floored division on integer types
Move tests into core library so they can be run in Kit environment
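This entry does not spell out rounding semantics, so the pure-Python sketch below only contrasts the two conventions that usually matter when porting integer arithmetic to kernels: floored division (Python's `//`, rounding toward negative infinity) and truncating division (what C and CUDA `/` do on integers, rounding toward zero). The helper names are hypothetical.

```python
def floor_div(a, b):
    """Floored division: quotient rounded toward negative infinity (Python's //)."""
    return a // b


def floor_mod(a, b):
    """Remainder paired with floored division; takes the sign of the divisor."""
    return a % b


def trunc_div(a, b):
    """Truncating division, as C/CUDA '/' does on integers: quotient rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def trunc_mod(a, b):
    """Remainder paired with truncating division (C '%'); takes the sign of the dividend."""
    return a - b * trunc_div(a, b)


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, floor_div(a, b), floor_mod(a, b), trunc_div(a, b), trunc_mod(a, b))
```

The two conventions agree whenever the operands have the same sign and differ by one in the quotient otherwise, which is why negative indices are the usual place this shows up.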

[0.1.24] - 2022-03-03

Warp Core

Add NanoVDB support, see wp.volume_sample*() methods
Add support for reading compile-time constants in kernels, see wp.constant()
Add support for cuda_array_interface protocol for zero-copy interop with PyTorch, see wp.torch.to_torch()
Add support for additional numeric types, i8, u8, i16, u16, etc.
Add better checks for device strings during allocation / launch
Add support for sampling random numbers with a normal distribution, see wp.randn()
Upgrade to CUDA 11.3
Update example scenes to Kit 103.1
Deduce array dtype from np.array when one is not provided
Fix for ranged for loops with negative step sizes
Fix for 3d and 4d spherical gradient distributions
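One common way to turn uniform random numbers into normally distributed ones is the Box-Muller transform. The sketch below shows that technique in pure Python; whether `wp.randn()` uses Box-Muller or another method is not stated in this changelog, and `randn_box_muller` is a hypothetical name.

```python
import math
import random


def randn_box_muller(rng):
    """Draw one standard-normal sample from two uniforms via the Box-Muller transform.

    The full transform yields two independent normals (cos and sin branches);
    this sketch keeps only the cosine branch for simplicity.
    """
    u1 = 1.0 - rng.random()  # rng.random() is in [0, 1), so u1 is in (0, 1] and log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(42)
samples = [randn_box_muller(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance should land close to 0 and 1 respectively.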

[0.1.23] - 2022-02-17

Warp Core

Fix for generated code folder being removed during Showroom installation
Fix for macOS support
Fix for dynamic for-loop code gen edge case
Add procedural noise primitives, see noise(), pnoise(), curlnoise()
Move simulation helpers out of test into warp.sim module
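As a rough idea of what gradient-noise primitives such as noise() and pnoise() compute, here is a minimal 1-D Perlin-style gradient noise with an optional period. It is a sketch under stated assumptions: Warp's versions are multi-dimensional and their gradient tables and hashing are not reproduced here, curlnoise() is not covered, and `make_gradients` and `gradient_noise_1d` are hypothetical helpers.

```python
import math
import random


def make_gradients(n=256, seed=0):
    """Random 1-D gradients (slopes) at integer lattice points."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def fade(t):
    """Perlin's quintic fade curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)


def gradient_noise_1d(x, grads, period=None):
    """1-D gradient noise; if period is given, the lattice wraps so noise(x + period) ~= noise(x)."""
    i0 = math.floor(x)
    t = x - i0
    i1 = i0 + 1
    if period is not None:
        i0 %= period
        i1 %= period
    # Contribution of each lattice endpoint: its gradient times the offset from that endpoint
    d0 = grads[i0 % len(grads)] * t
    d1 = grads[i1 % len(grads)] * (t - 1.0)
    # Blend the two contributions with the smooth fade curve
    return d0 + fade(t) * (d1 - d0)


g = make_gradients()
values = [gradient_noise_1d(k * 0.137, g) for k in range(200)]
```

Gradient noise is zero at every integer lattice point and varies smoothly between them; the period argument is what makes the periodic variant tile.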

[0.1.22] - 2022-02-14

Warp Core

Fix for .so reloading on Linux
Fix for while loop code-gen in some edge cases
Add rounding functions round(), rint(), trunc(), floor(), ceil()
Add support for printing strings and formatted strings from kernels
Add MSVC compiler version detection and require minimum
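The rounding functions differ mainly in how they treat halfway cases and negative values. The pure-Python sketch below assumes these builtins follow the C math library functions of the same names (an assumption based on the names, not something this changelog states); `round_half_away` and `rint_half_even` are hypothetical helpers, while `math.trunc`, `math.floor`, and `math.ceil` are standard Python.

```python
import math


def round_half_away(x):
    """C round(): nearest integer, halfway cases rounded away from zero."""
    a = abs(x)
    f = math.floor(a)
    r = f + 1 if a - f >= 0.5 else f
    return math.copysign(float(r), x)


def rint_half_even(x):
    """C rint() under the default rounding mode: halfway cases rounded to even."""
    return float(round(x))  # Python's built-in round() rounds half to even


for x in (-2.5, -1.5, -0.7, 0.5, 1.5, 2.5):
    print(x, round_half_away(x), rint_half_even(x), math.trunc(x), math.floor(x), math.ceil(x))
```

For example, 2.5 rounds to 3 with round() but to 2 with rint(), while trunc() and floor() only disagree for negative inputs.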

Warp Sim

Add support for universal and compound joint types

[0.1.21] - 2022-01-19

Warp Core

Fix for exception on shutdown in empty wp.array objects
Fix for hot reload of CPU kernels in Kit
Add hash grid primitive for point-based spatial queries, see hash_grid_query(), hash_grid_query_next()
Add new PRNG methods using PCG-based generators, see rand_init(), randf(), randi()
Add support for AABB mesh queries, see mesh_query_aabb(), mesh_query_aabb_next()
Add support for all Python range() loop variants
Add builtin vec2 type and additional math operators, pow(), tan(), atan(), atan2()
Remove dependency on CUDA driver library at build time
Remove unused NVRTC binary dependencies (50 MB smaller Linux distribution)
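A hash grid answers point-based spatial queries by bucketing points into cells and visiting only the cells that overlap the query region. The pure-Python sketch below shows that idea; unlike a GPU hash grid with a fixed-size table, it uses a Python dict keyed by integer cell coordinates, and `build_hash_grid` and `query_radius` are hypothetical names rather than Warp's hash_grid_query() API.

```python
import math
import random
from collections import defaultdict


def build_hash_grid(points, cell_size):
    """Bucket point indices by the integer cell containing each point."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell_size) for c in p)].append(idx)
    return grid


def query_radius(grid, points, cell_size, center, radius):
    """Return sorted indices of points within radius of center.

    Only cells overlapping the query sphere's bounding box are visited.
    """
    lo = [math.floor((c - radius) / cell_size) for c in center]
    hi = [math.floor((c + radius) / cell_size) for c in center]
    r2 = radius * radius
    found = []
    for cx in range(lo[0], hi[0] + 1):
        for cy in range(lo[1], hi[1] + 1):
            for cz in range(lo[2], hi[2] + 1):
                for idx in grid.get((cx, cy, cz), ()):
                    if sum((a - b) ** 2 for a, b in zip(points[idx], center)) <= r2:
                        found.append(idx)
    return sorted(found)


rng = random.Random(1)
pts = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(500)]
grid = build_hash_grid(pts, cell_size=1.0)
neighbors = query_radius(grid, pts, 1.0, (5.0, 5.0, 5.0), 1.5)
```

Choosing the cell size close to the typical query radius keeps the number of visited cells small.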

Warp Sim

Bundle import of multiple shapes for simulation nodes
New OgnParticleVolume node for sampling shapes -> particles
New OgnParticleSolver node for DEM style granular materials

[0.1.20] - 2021-11-02

Updates to the ripple solver for GTC (support for multiple colliders, buoyancy, etc)

[0.1.19] - 2021-10-15

Publish from 2021.3 to avoid omni.graph database incompatibilities

[0.1.18] - 2021-10-08

Enable Linux support (tested on 20.04)

[0.1.17] - 2021-09-30

Fix for 3x3 SVD adjoint
Fix for A6000 GPU (bump compute model to sm_52 minimum)
Fix for .dll unload on rebuild
Fix for possible array destruction warnings on shutdown
Rename spatial_transform -> transform
Documentation update

[0.1.16] - 2021-09-06

Fix for case where simple assignments (a = b) incorrectly generated reference rather than value copy
Handle passing zero-length (empty) arrays to kernels

[0.1.15] - 2021-09-03

Add additional math library functions (asin, etc)
Add builtin 3x3 SVD support
Add support for named constants (True, False, None)
Add support for if/else statements (differentiable)
Add custom memset kernel to avoid CPU overhead of cudaMemset()
Add rigid body joint model to warp.sim (based on Brax)
Add Linux, MacOS support in core library
Fix for incorrectly treating pure assignment as reference instead of value copy
Remove the need to transfer arrays to the CPU before NumPy conversion (now done implicitly)
Update the example OgnRipple wave equation solver to use bundles

[0.1.14] - 2021-08-09

Fix for out-of-bounds memory access in CUDA BVH
Better error checking after kernel launches (use warp.config.verify_cuda=True)
Fix for vec3 normalize adjoint code

[0.1.13] - 2021-07-29

Remove OgnShrinkWrap.py test node

[0.1.12] - 2021-07-29

Switch to Woop et al.'s watertight ray-tri intersection test
Disable --fast-math in CUDA compilation step for improved precision
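For reference, here is a pure-Python sketch of the watertight ray/triangle test described by Woop, Benthin and Wald (2013). It follows the published algorithm rather than Warp's source, omits the paper's double-precision fallback (Python floats are already double precision), and assumes a nonzero ray direction; `intersect_ray_tri_watertight` is a hypothetical name.

```python
def intersect_ray_tri_watertight(org, d, a, b, c):
    """Watertight ray/triangle test (Woop, Benthin, Wald 2013).

    Returns (t, wa, wb, wc) on a hit, where the hit point is wa*a + wb*b + wc*c,
    or None on a miss. Assumes d is not the zero vector.
    """
    # Choose the dominant axis of the ray direction as kz
    kz = max(range(3), key=lambda i: abs(d[i]))
    kx = (kz + 1) % 3
    ky = (kx + 1) % 3
    if d[kz] < 0.0:
        kx, ky = ky, kx  # swap to preserve the triangle's winding order

    # Shear constants that map the ray onto the +z axis
    sx = d[kx] / d[kz]
    sy = d[ky] / d[kz]
    sz = 1.0 / d[kz]

    # Vertices relative to the ray origin
    A = [a[i] - org[i] for i in range(3)]
    B = [b[i] - org[i] for i in range(3)]
    C = [c[i] - org[i] for i in range(3)]

    # Shear the x/y coordinates
    ax, ay = A[kx] - sx * A[kz], A[ky] - sy * A[kz]
    bx, by = B[kx] - sx * B[kz], B[ky] - sy * B[kz]
    cx, cy = C[kx] - sx * C[kz], C[ky] - sy * C[kz]

    # Scaled barycentric coordinates from 2-D edge functions
    u = cx * by - cy * bx
    v = ax * cy - ay * cx
    w = bx * ay - by * ax

    # Miss only if the signs disagree; zeros (points on an edge) count as inside
    if (u < 0.0 or v < 0.0 or w < 0.0) and (u > 0.0 or v > 0.0 or w > 0.0):
        return None
    det = u + v + w
    if det == 0.0:
        return None

    # Interpolate the scaled z coordinates to get the hit distance
    t = (u * sz * A[kz] + v * sz * B[kz] + w * sz * C[kz]) / det
    if t <= 0.0:
        return None
    return t, u / det, v / det, w / det


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_ray_tri_watertight((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
```

The sign test that treats zero edge functions as inside is what prevents rays through a shared edge from slipping between adjacent triangles.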

[0.1.11] - 2021-07-28

Fix for mesh_query_ray() returning incorrect t-value

[0.1.10] - 2021-07-28

Fix for OV extension fwatcher filters to avoid hot-reload loop due to OGN regeneration

[0.1.9] - 2021-07-21

Fix for loading sibling DLL paths
Better type checking for built-in function arguments
Add runtime docs; all builtins can now be listed using wp.print_builtins()

[0.1.8] - 2021-07-14

Fix for hot-reload of CUDA kernels
Add Tape object for replaying differentiable kernels
Add helpers for Torch interop (convert torch.Tensor to wp.Array)
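The idea behind a tape is to record operations during the forward pass and then replay their adjoints in reverse order to accumulate gradients. Warp's tape does this at the level of kernel launches; the pure-Python sketch below models the same reverse-mode idea at the level of scalar operations. The `Var` and `Tape` classes here are illustrative stand-ins, not Warp's interface.

```python
import math


class Var:
    """A scalar value with an accumulated adjoint (gradient)."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class Tape:
    """Records a backward closure for each forward op, then replays them in reverse."""

    def __init__(self):
        self._backward = []

    def mul(self, a, b):
        out = Var(a.value * b.value)

        def back():
            a.grad += b.value * out.grad
            b.grad += a.value * out.grad

        self._backward.append(back)
        return out

    def add(self, a, b):
        out = Var(a.value + b.value)

        def back():
            a.grad += out.grad
            b.grad += out.grad

        self._backward.append(back)
        return out

    def sin(self, a):
        out = Var(math.sin(a.value))

        def back():
            a.grad += math.cos(a.value) * out.grad

        self._backward.append(back)
        return out

    def backward(self, loss):
        # Seed d(loss)/d(loss) = 1, then propagate adjoints in reverse recording order
        loss.grad = 1.0
        for back in reversed(self._backward):
            back()


tape = Tape()
x, y = Var(2.0), Var(3.0)
f = tape.add(tape.mul(x, y), tape.sin(x))  # f = x*y + sin(x)
tape.backward(f)
```

After `backward`, `x.grad` holds y + cos(x) and `y.grad` holds x, matching the analytic derivatives of f.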

[0.1.7] - 2021-07-05

Switch to NVRTC for CUDA runtime
Allow running without host compiler
Disable asserts in kernel release mode (small perf. improvement)

[0.1.6] - 2021-06-14

Look for CUDA toolchain in target-deps

[0.1.5] - 2021-06-14

Rename OgLang -> Warp
Improve CUDA environment error checking
Clean-up some logging, add verbose mode (warp.config.verbose)

[0.1.4] - 2021-06-10

Add support for mesh raycast

[0.1.3] - 2021-06-09

Add support for unary negation operator
Add support for mutating variables during dynamic loops (non-differentiable)
Add support for in-place operators
Improve kernel cache start up times (avoids adjointing before cache check)
Update README.md with requirements / examples

[0.1.2] - 2021-06-03

Add support for querying mesh velocities
Add CUDA graph support, see warp.capture_begin(), warp.capture_end(), warp.capture_launch()
Add explicit initialization phase, warp.init()
Add variational Euler solver (sim)
Add contact caching, switch to nonlinear friction model (sim)
Fix for Linux/macOS support

[0.1.1] - 2021-05-18

Fix bug with conflicting CUDA contexts

[0.1.0] - 2021-05-17

Initial publish for alpha testing