thanks to nvidia ❤

8ae5fc5 over 2 years ago

4.54 kB

	---
	grand_parent: Extended API
	parent: Synchronization Primitives
	nav_order: 2
	---

	# `cuda::barrier`

	Defined in header `<cuda/barrier>`:

	```cuda
	template <cuda::thread_scope Scope,
	typename CompletionFunction = /* unspecified */>
	class cuda::barrier;
	```

	The class template `cuda::barrier` is an extended form of [`cuda::std::barrier`]
	that takes an additional [`cuda::thread_scope`] argument.
	It has the same interface and semantics as [`cuda::std::barrier`], with the
	following additional operations.

	## Barrier Operations

	\| [`cuda::barrier::init`] \| Initialize a `cuda::barrier`. `(friend function)` \|
	\| [`cuda::device::barrier_native_handle`] \| Get the native handle to a `cuda::barrier`. `(function template)` \|

	## NVCC `__shared__` Initialization Warnings

	When using libcu++ with NVCC, a `__shared__` `cuda::barrier` will lead to the
	following warning because `__shared__` variables are not initialized:

	```
	warning: dynamic initialization is not supported for a function-scope static
	__shared__ variable within a __device__/__global__ function
	```

	It can be silenced using `#pragma diag_suppress static_var_with_dynamic_init`.

	To properly initialize a `__shared__` `cuda::barrier`, use the
	[`cuda::barrier::init`] friend function.

	## Concurrency Restrictions

	An object of type `cuda::barrier` or `cuda::std::barrier` shall not be accessed
	concurrently by CPU and GPU threads unless:
	- it is in unified memory and the [`concurrentManagedAccess` property] is 1, or
	- it is in CPU memory and the [`hostNativeAtomicSupported` property] is 1.

	Note, for objects of scopes other than `cuda::thread_scope_system` this is a
	data-race, and thefore also prohibited regardless of memory characteristics.

	Under CUDA Compute Capability 8 (Ampere) or above, when an object of type
	`cuda::barrier<thread_scope_block>` is placed in `__shared__` memory, the
	member function `arrive` performs a reduction of the arrival count among
	[coalesced threads] followed by the arrival operation in one thread.
	Programs shall ensure that this transformation would not introduce errors, for
	example relative to the requirements of [thread.barrier.class paragraph 12]
	of ISO/IEC IS 14882 (the C++ Standard).

	Under CUDA Compute Capability 6 (Pascal) or prior, an object of type
	`cuda::barrier` or `cuda::std::barrier` may not be used.

	## Implementation-Defined Behavior

	For each [`cuda::thread_scope`] `S` and `CompletionFunction` `F`, the value of
	`cuda::barrier<S, F>::max()` is as follows:

	\| [`cuda::thread_scope`] `S` \| `CompletionFunction` `F` \| `barrier<S, F>::max()` \|
	\|--------------------------------\|--------------------------\|----------------------------------------------------------\|
	\| `cuda::thread_scope_block` \| Default or user-provided \| `(1 << 20) - 1` \|
	\| Not `cuda::thread_scope_block` \| Default \| `cuda::std::numeric_limits<cuda::std::int32_t>::max()` \|
	\| Not `cuda::thread_scope_block` \| User-provided \| `cuda::std::numeric_limits<cuda::std::ptrdiff_t>::max()` \|

	## Example

	```cuda
	#include <cuda/barrier>

	__global__ void example_kernel() {
	// This barrier is suitable for all threads in the system.
	cuda::barrier<cuda::thread_scope_system> a(10);

	// This barrier has the same type as the previous one (`a`).
	cuda::std::barrier<> b(10);

	// This barrier is suitable for all threads on the current processor (e.g. GPU).
	cuda::barrier<cuda::thread_scope_device> c(10);

	// This barrier is suitable for all threads in the same thread block.
	cuda::barrier<cuda::thread_scope_block> d(10);
	}
	```

	[See it on Godbolt](https://godbolt.org/z/ehdrY8Kae){: .btn }


	[`cuda::thread_scope`]: ../thread_scopes.md

	[`cuda::barrier::init`]: ./barrier/init.md
	[`cuda::device::barrier_native_handle`]: ./barrier/barrier_native_handle.md

	[`cuda::std::barrier`]: https://en.cppreference.com/w/cpp/thread/barrier

	[thread.barrier.class paragraph 12]: https://eel.is/c++draft/thread.barrier.class#12

	[coalesced threads]: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#coalesced-group-cg

	[`concurrentManagedAccess` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_116f9619ccc85e93bc456b8c69c80e78b
	[`hostNativeAtomicSupported` property]: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1ef82fd7d1d0413c7d6f33287e5b6306f