--- grand_parent: Extended API parent: Synchronization Primitives --- # `cuda::pipeline` Defined in header ``: ```cuda template class cuda::pipeline { public: pipeline() = delete; __host__ __device__ ~pipeline(); pipeline& operator=(pipeline const&) = delete; __host__ __device__ void producer_acquire(); __host__ __device__ void producer_commit(); __host__ __device__ void consumer_wait(); template __host__ __device__ bool consumer_wait_for( cuda::std::chrono::duration const& duration); template __host__ __device__ bool consumer_wait_until( cuda::std::chrono::time_point const& time_point); __host__ __device__ void consumer_release(); __host__ __device__ bool quit(); }; ``` The class template `cuda::pipeline` provides a coordination mechanism which can sequence [asynchronous operations], such as [`cuda::memcpy_async`], into stages. A thread interacts with a _pipeline stage_ using the following pattern: 1. Acquire the pipeline stage. 2. Commit some operations to the stage. 3. Wait for the previously committed operations to complete. 4. Release the pipeline stage. For [`cuda::thread_scope`]s other than `cuda::thread_scope_thread`, a [`cuda::pipeline_shared_state`] is required to coordinate the participating threads. _Pipelines_ can be either _unified_ or _partitioned_. In a _unified pipeline_, all the participating threads are both producers and consumers. In a _partitioned pipeline_, each participating thread is either a producer or a consumer. ## Template Parameters | `Scope` | The scope of threads participating in the _pipeline_. | ## Member Functions | (constructor) [deleted] | `cuda::pipeline` is not constructible. | | [(destructor)] | Destroys the `cuda::pipeline`. | | `operator=` [deleted] | `cuda::pipeline` is not assignable. | | [`producer_acquire`] | Blocks the current thread until the next _pipeline stage_ is available. | | [`producer_commit`] | Commits operations previously issued by the current thread to the current _pipeline stage_. | | [`consumer_wait`] | Blocks the current thread until all operations committed to the current _pipeline stage_ complete. | | [`consumer_wait_for`] | Blocks the current thread until all operations committed to the current _pipeline stage_ complete or after the specified timeout duration. | | [`consumer_wait_until`] | Blocks the current thread until all operations committed to the current _pipeline stage_ complete or until specified time point has been reached. | | [`consumer_release`] | Release the current _pipeline stage_. | | [`quit`] | Quits current thread's participation in the _pipeline_. | ## Notes A thread role cannot change during the lifetime of the pipeline object. ## Example ```cuda #include #include // Disables `pipeline_shared_state` initialization warning. #pragma diag_suppress static_var_with_dynamic_init template __device__ void compute(T* ptr); template __global__ void example_kernel(T* global0, T* global1, cuda::std::size_t subset_count) { extern __shared__ T s[]; auto group = cooperative_groups::this_thread_block(); T* shared[2] = { s, s + 2 * group.size() }; // Create a pipeline. constexpr auto scope = cuda::thread_scope_block; constexpr auto stages_count = 2; __shared__ cuda::pipeline_shared_state shared_state; auto pipeline = cuda::make_pipeline(group, &shared_state); // Prime the pipeline. pipeline.producer_acquire(); cuda::memcpy_async(group, shared[0], &global0[0], sizeof(T) * group.size(), pipeline); cuda::memcpy_async(group, shared[0] + group.size(), &global1[0], sizeof(T) * group.size(), pipeline); pipeline.producer_commit(); // Pipelined copy/compute. for (cuda::std::size_t subset = 1; subset < subset_count; ++subset) { pipeline.producer_acquire(); cuda::memcpy_async(group, shared[subset % 2], &global0[subset * group.size()], sizeof(T) * group.size(), pipeline); cuda::memcpy_async(group, shared[subset % 2] + group.size(), &global1[subset * group.size()], sizeof(T) * group.size(), pipeline); pipeline.producer_commit(); pipeline.consumer_wait(); compute(shared[(subset - 1) % 2]); pipeline.consumer_release(); } // Drain the pipeline. pipeline.consumer_wait(); compute(shared[(subset_count - 1) % 2]); pipeline.consumer_release(); } template void __global__ example_kernel(int*, int*, cuda::std::size_t); ``` [See it on Godbolt](https://godbolt.org/z/zc41bWvja){: .btn } [asynchronous operations]: ../asynchronous_operations.md [`cuda::memcpy_async`]: ../asynchronous_operations/memcpy_async.md [`cuda::thread_scope`]: ../thread_scopes.md [`cuda::pipeline_shared_state`]: ./pipeline_shared_state.md [(destructor)]: ./pipeline/destructor.md [`producer_acquire`]: ./pipeline/producer_acquire.md [`producer_commit`]: ./pipeline/producer_commit.md [`consumer_wait`]: ./pipeline/consumer_wait.md [`consumer_wait_for`]: ./pipeline/consumer_wait.md [`consumer_wait_until`]: ./pipeline/consumer_wait.md [`consumer_release`]: ./pipeline/consumer_release.md [`quit`]: ./pipeline/quit.md