<hr>
<h3>About CUB</h3>

CUB provides state-of-the-art, reusable software components for every layer
of the CUDA programming model:
- [<b><em>Device-wide primitives</em></b>](https://nvlabs.github.io/cub/group___device_module.html)
  - Sort, prefix scan, reduction, histogram, etc.
  - Compatible with CUDA dynamic parallelism
- [<b><em>Block-wide "collective" primitives</em></b>](https://nvlabs.github.io/cub/group___block_module.html)
  - I/O, sort, prefix scan, reduction, histogram, etc.
  - Compatible with arbitrary thread block sizes and types
- [<b><em>Warp-wide "collective" primitives</em></b>](https://nvlabs.github.io/cub/group___warp_module.html)
  - Warp-wide prefix scan, reduction, etc.
  - Safe and architecture-specific
- [<b><em>Thread and resource utilities</em></b>](https://nvlabs.github.io/cub/group___thread_module.html)
  - PTX intrinsics, device reflection, texture-caching iterators, caching memory allocators, etc.

CUB is included in the NVIDIA HPC SDK and the CUDA Toolkit.
We recommend the [CUB Project Website](https://nvlabs.github.io/cub) for further information and examples.
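As a taste of the device-wide layer, the sketch below uses `cub::DeviceReduce::Sum` to sum an array. The two-phase calling convention (a first call with a `NULL` workspace to query the required temporary storage size, then a second call to do the work) is common to all of CUB's device-wide algorithms; the specific array contents here are illustrative only.

```cuda
#include <cub/cub.cuh>

int main()
{
    // Illustrative input: seven integers to be summed on the device
    int num_items = 7;
    int h_in[7]   = {8, 6, 7, 5, 3, 0, 9};
    int *d_in, *d_out;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, num_items * sizeof(int), cudaMemcpyHostToDevice);

    // Phase 1: a NULL workspace pointer asks CUB how much temporary
    // storage the reduction needs; nothing is computed yet
    void   *d_temp_storage    = NULL;
    size_t  temp_storage_bytes = 0;
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Phase 2: run the reduction; d_out now holds the sum (38)
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    cudaFree(d_temp_storage);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```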
<br><hr>
<h3>A Simple Example</h3>
```C++
#include <cub/cub.cuh>

// Block-sorting CUDA kernel
__global__ void BlockSortKernel(int *d_in, int *d_out)
{
    using namespace cub;

    // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
    // owning 16 integer items each
    typedef BlockRadixSort<int, 128, 16>                     BlockRadixSort;
    typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>    BlockLoad;
    typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE>  BlockStore;

    // Allocate shared memory
    __shared__ union {
        typename BlockRadixSort::TempStorage  sort;
        typename BlockLoad::TempStorage       load;
        typename BlockStore::TempStorage      store;
    } temp_storage;

    int block_offset = blockIdx.x * (128 * 16);  // Offset for this block's segment

    // Obtain a segment of 2048 consecutive keys that are blocked across threads
    int thread_keys[16];
    BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
    __syncthreads();

    // Collectively sort the keys
    BlockRadixSort(temp_storage.sort).Sort(thread_keys);
    __syncthreads();

    // Store the sorted segment
    BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
}
```
Each thread block uses `cub::BlockRadixSort` to collectively sort
its own input segment. The class is specialized by the
data type being sorted, by the number of threads per block, by the number of
keys per thread, and implicitly by the targeted compilation architecture.

The `cub::BlockLoad` and `cub::BlockStore` classes are similarly specialized.
Furthermore, to provide coalesced accesses to device memory, these primitives are
configured to access memory using a striped access pattern (where consecutive threads
simultaneously access consecutive items) and then <em>transpose</em> the keys into
a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads.

Once specialized, these classes expose opaque `TempStorage` member types.
The thread block uses these storage types to statically allocate the union of
shared memory needed by the thread block. (Alternatively these storage types
could be aliased to global memory allocations.)
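Because each block of 128 threads sorts one tile of 128 × 16 = 2048 keys, a host-side launch must size the grid accordingly. The following is a hypothetical setup sketch (not from the CUB documentation), assuming `d_in` and `d_out` are already allocated and the element count is a multiple of the tile size:

```cuda
// Hypothetical host-side launch of BlockSortKernel: each block of
// 128 threads sorts one segment of 128 * 16 = 2048 keys, so the grid
// needs one block per tile. Assumes num_items is a multiple of 2048.
int num_items = 1 << 20;              // e.g. one million keys
int tile_size = 128 * 16;             // keys sorted per thread block
int grid_size = num_items / tile_size;
BlockSortKernel<<<grid_size, 128>>>(d_in, d_out);
```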
<br><hr>
<h3>Releases</h3>

CUB is distributed with the NVIDIA HPC SDK and the CUDA Toolkit in addition
to being available on GitHub.
See the [changelog](CHANGELOG.md) for details about specific releases.
| CUB Release               | Included In                             |
| ------------------------- | --------------------------------------- |
| 1.9.10-1                  | NVIDIA HPC SDK 20.7 & CUDA Toolkit 11.1 |
| 1.9.10                    | NVIDIA HPC SDK 20.5                     |
| 1.9.9                     | CUDA Toolkit 11.0                       |
| 1.9.8-1                   | NVIDIA HPC SDK 20.3                     |
| 1.9.8                     | CUDA Toolkit 11.0 Early Access          |
| 1.8.0                     |                                         |
| 1.7.5                     | Thrust 1.9.2                            |
| 1.7.4                     | Thrust 1.9.1-2                          |
| 1.7.3                     |                                         |
| 1.7.2                     |                                         |
| 1.7.1                     |                                         |
| 1.7.0                     | Thrust 1.9.0-5                          |
| 1.6.4                     |                                         |
| 1.6.3                     |                                         |
| 1.6.2 (previously 1.5.5)  |                                         |
| 1.6.1 (previously 1.5.4)  |                                         |
| 1.6.0 (previously 1.5.3)  |                                         |
| 1.5.2                     |                                         |
| 1.5.1                     |                                         |
| 1.5.0                     |                                         |
| 1.4.1                     |                                         |
| 1.4.0                     |                                         |
| 1.3.2                     |                                         |
| 1.3.1                     |                                         |
| 1.3.0                     |                                         |
| 1.2.3                     |                                         |
| 1.2.2                     |                                         |
| 1.2.0                     |                                         |
| 1.1.1                     |                                         |
| 1.0.2                     |                                         |
| 1.0.1                     |                                         |
| 0.9.4                     |                                         |
| 0.9.2                     |                                         |
| 0.9.1                     |                                         |
| 0.9.0                     |                                         |
<br><hr>
<h3>Development Process</h3>

CUB uses the [CMake build system](https://cmake.org/) to build unit tests,
examples, and header tests. To build CUB as a developer, follow this recipe:

```
# Clone CUB repo from GitHub:
git clone https://github.com/thrust/cub.git
cd cub

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..       # Command-line interface.
ccmake ..      # ncurses GUI (Linux only).
cmake-gui      # Graphical UI; set source/build directories in the app.

# Build:
cmake --build . -j <num jobs>   # Invokes make (or ninja, etc.)

# Run tests and examples:
ctest
```

By default, the C++14 standard is targeted, but this can be changed in CMake.
More information on configuring your CUB build and creating a pull request can be
found in [CONTRIBUTING.md](CONTRIBUTING.md).
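For projects that consume CUB rather than develop it, note that CUB is header-only, so the simplest integration is to put the repository root on the include path. The fragment below is a hypothetical consumer `CMakeLists.txt` (the target and file names are illustrative, not part of CUB's own build):

```cmake
# Hypothetical consumer CMakeLists.txt: CUB is header-only, so adding
# the checkout's root directory to the include path is sufficient.
cmake_minimum_required(VERSION 3.9)
project(cub_consumer LANGUAGES CXX CUDA)

add_executable(block_sort block_sort.cu)

# Assumes CUB was cloned into ./cub next to this file
target_include_directories(block_sort PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/cub)
```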
<br><hr>
<h3>Open Source License</h3>

CUB is available under the "New BSD" open-source license:

```
Copyright (c) 2010-2011, Duane Merrill.  All rights reserved.
Copyright (c) 2011-2018, NVIDIA CORPORATION.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.
   * Neither the name of the NVIDIA CORPORATION nor the
     names of its contributors may be used to endorse or promote products
     derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```