---
grand_parent: Extended API
parent: Synchronization Primitives
---

# `cuda::pipeline`

Defined in header `<cuda/pipeline>`:

```cuda
template <cuda::thread_scope Scope>
class cuda::pipeline {
public:
  pipeline() = delete;

  __host__ __device__ ~pipeline();

  pipeline& operator=(pipeline const&) = delete;

  __host__ __device__ void producer_acquire();

  __host__ __device__ void producer_commit();

  __host__ __device__ void consumer_wait();

  template <typename Rep, typename Period>
  __host__ __device__ bool consumer_wait_for(
    cuda::std::chrono::duration<Rep, Period> const& duration);

  template <typename Clock, typename Duration>
  __host__ __device__
  bool consumer_wait_until(
    cuda::std::chrono::time_point<Clock, Duration> const& time_point);

  __host__ __device__ void consumer_release();

  __host__ __device__ bool quit();
};
```

The class template `cuda::pipeline` provides a coordination mechanism which
  can sequence [asynchronous operations], such as [`cuda::memcpy_async`], into
  stages.

A thread interacts with a _pipeline stage_ using the following pattern:
1. Acquire the pipeline stage.
2. Commit some operations to the stage.
3. Wait for the previously committed operations to complete.
4. Release the pipeline stage.
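
For a `cuda::thread_scope_thread` pipeline, this pattern can be sketched without any shared state (a minimal illustration, not part of the synopsis above; the kernel name is hypothetical):

```cuda
#include <cuda/pipeline>

// Minimal single-thread sketch of the acquire/commit/wait/release pattern.
__global__ void copy_one(int* dst, int const* src) {
  // A thread-scope pipeline requires no cuda::pipeline_shared_state.
  auto pipe = cuda::make_pipeline();

  pipe.producer_acquire();                          // 1. acquire the stage
  cuda::memcpy_async(dst, src, sizeof(int), pipe);  // 2. commit an operation
  pipe.producer_commit();
  pipe.consumer_wait();                             // 3. wait for completion
  pipe.consumer_release();                          // 4. release the stage
}
```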

For [`cuda::thread_scope`]s other than `cuda::thread_scope_thread`, a
  [`cuda::pipeline_shared_state`] is required to coordinate the participating
  threads.

_Pipelines_ can be either _unified_ or _partitioned_.
In a _unified pipeline_, all the participating threads are both producers and
  consumers.
In a _partitioned pipeline_, each participating thread is either a producer or
  a consumer.
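
A partitioned pipeline is created by passing a `cuda::pipeline_role` to `cuda::make_pipeline`. The sketch below splits threads by even/odd rank, which is an arbitrary, illustrative choice:

```cuda
#include <cuda/pipeline>
#include <cooperative_groups.h>

// Disables `pipeline_shared_state` initialization warning.
#pragma diag_suppress static_var_with_dynamic_init

__global__ void partitioned_example() {
  auto group = cooperative_groups::this_thread_block();
  __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, 2> state;

  // Each thread picks a fixed role for the pipeline's entire lifetime;
  // here, even-ranked threads produce and odd-ranked threads consume.
  auto role = (group.thread_rank() % 2 == 0) ? cuda::pipeline_role::producer
                                             : cuda::pipeline_role::consumer;
  auto pipe = cuda::make_pipeline(group, &state, role);
  // ... producers call producer_acquire/producer_commit,
  //     consumers call consumer_wait/consumer_release ...
}
```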

## Template Parameters

| Parameter | Description                                            |
|-----------|--------------------------------------------------------|
| `Scope`   | The scope of threads participating in the _pipeline_.  |

## Member Functions

| Member                  | Description                                                                                                                                       |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| (constructor) [deleted] | `cuda::pipeline` is not constructible.                                                                                                            |
| [(destructor)]          | Destroys the `cuda::pipeline`.                                                                                                                    |
| `operator=` [deleted]   | `cuda::pipeline` is not assignable.                                                                                                               |
| [`producer_acquire`]    | Blocks the current thread until the next _pipeline stage_ is available.                                                                           |
| [`producer_commit`]     | Commits operations previously issued by the current thread to the current _pipeline stage_.                                                       |
| [`consumer_wait`]       | Blocks the current thread until all operations committed to the current _pipeline stage_ complete.                                                |
| [`consumer_wait_for`]   | Blocks the current thread until all operations committed to the current _pipeline stage_ complete or after the specified timeout duration.        |
| [`consumer_wait_until`] | Blocks the current thread until all operations committed to the current _pipeline stage_ complete or until specified time point has been reached. |
| [`consumer_release`]    | Releases the current _pipeline stage_.                                                                                                            |
| [`quit`]                | Quits the current thread's participation in the _pipeline_.                                                                                       |
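
The timed waits return `false` on timeout, so a consumer can interleave other work and retry rather than block indefinitely. A hedged sketch (the function name is hypothetical):

```cuda
#include <cuda/pipeline>
#include <cuda/std/chrono>

// Sketch: poll a stage with a timeout instead of blocking indefinitely.
__device__ void consume_with_timeout(
    cuda::pipeline<cuda::thread_scope_thread>& pipe) {
  while (!pipe.consumer_wait_for(cuda::std::chrono::microseconds{10})) {
    // Timed out: the stage's operations have not completed yet;
    // other per-thread work could be done here before retrying.
  }
  pipe.consumer_release();
}
```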

## Notes

A thread's role cannot change during the lifetime of the pipeline object.

## Example

```cuda
#include <cuda/pipeline>
#include <cooperative_groups.h>

// Disables `pipeline_shared_state` initialization warning.
#pragma diag_suppress static_var_with_dynamic_init

template <typename T>
__device__ void compute(T* ptr);

template <typename T>
__global__ void example_kernel(T* global0, T* global1, cuda::std::size_t subset_count) {
  extern __shared__ T s[];
  auto group = cooperative_groups::this_thread_block();
  T* shared[2] = { s, s + 2 * group.size() };

  // Create a pipeline.
  constexpr auto scope = cuda::thread_scope_block;
  constexpr auto stages_count = 2;
  __shared__ cuda::pipeline_shared_state<scope, stages_count> shared_state;
  auto pipeline = cuda::make_pipeline(group, &shared_state);

  // Prime the pipeline.
  pipeline.producer_acquire();
  cuda::memcpy_async(group, shared[0],
                     &global0[0], sizeof(T) * group.size(), pipeline);
  cuda::memcpy_async(group, shared[0] + group.size(),
                     &global1[0], sizeof(T) * group.size(), pipeline);
  pipeline.producer_commit();

  // Pipelined copy/compute.
  for (cuda::std::size_t subset = 1; subset < subset_count; ++subset) {
    pipeline.producer_acquire();
    cuda::memcpy_async(group, shared[subset % 2],
                       &global0[subset * group.size()],
                       sizeof(T) * group.size(), pipeline);
    cuda::memcpy_async(group, shared[subset % 2] + group.size(),
                       &global1[subset * group.size()],
                       sizeof(T) * group.size(), pipeline);
    pipeline.producer_commit();
    pipeline.consumer_wait();
    compute(shared[(subset - 1) % 2]);
    pipeline.consumer_release();
  }

  // Drain the pipeline.
  pipeline.consumer_wait();
  compute(shared[(subset_count - 1) % 2]);
  pipeline.consumer_release();
}

template void __global__ example_kernel<int>(int*, int*, cuda::std::size_t);
```

[See it on Godbolt](https://godbolt.org/z/zc41bWvja){: .btn }


[asynchronous operations]: ../asynchronous_operations.md
[`cuda::memcpy_async`]: ../asynchronous_operations/memcpy_async.md

[`cuda::thread_scope`]: ../thread_scopes.md
[`cuda::pipeline_shared_state`]: ./pipeline_shared_state.md

[(destructor)]: ./pipeline/destructor.md
[`producer_acquire`]: ./pipeline/producer_acquire.md
[`producer_commit`]: ./pipeline/producer_commit.md
[`consumer_wait`]: ./pipeline/consumer_wait.md
[`consumer_wait_for`]: ./pipeline/consumer_wait.md
[`consumer_wait_until`]: ./pipeline/consumer_wait.md
[`consumer_release`]: ./pipeline/consumer_release.md
[`quit`]: ./pipeline/quit.md