We use an int16 variable in shared memory as a semaphore, which any warp on the SM can read. The thing to be careful about is that every SMEM or GMEM instruction the semaphore protects must have finished before the semaphore changes state. For instance, you have to explicitly wait for SMEM stores to complete before flipping the semaphore.
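
To make the ordering concrete, here is a minimal sketch of such a handoff between two warps of one block. It is not the actual kernel: the names `handoff_kernel`, `run_handoff`, `tile`, and `ready` are hypothetical, and it assumes plain (synchronous) SMEM stores, for which `__syncwarp()` followed by `__threadfence_block()` is enough to order the data before the flag. Asynchronous copies would need their own completion wait on top of this.

```cuda
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;

// Warp 0 produces a tile in SMEM, warp 1 consumes it.
// `ready` plays the role of the 16-bit semaphore (hypothetical name).
__global__ void handoff_kernel(int *out) {
    __shared__ int tile[kWarpSize];
    __shared__ volatile short ready;

    const int warp = threadIdx.x / kWarpSize;
    const int lane = threadIdx.x % kWarpSize;

    // Initialise the semaphore before any warp looks at it.
    if (threadIdx.x == 0) ready = 0;
    __syncthreads();

    if (warp == 0) {
        // Producer: every lane stores its element to SMEM.
        tile[lane] = lane + 1;
        // Make sure all lanes of the warp have issued their stores.
        __syncwarp();
        if (lane == 0) {
            // Order the SMEM stores before the semaphore write.
            __threadfence_block();
            ready = 1;
        }
    } else {
        // Consumer: spin until the semaphore flips.
        while (ready == 0) { }
        // Order the semaphore read before the SMEM loads.
        __threadfence_block();
        out[lane] = 2 * tile[lane];
    }
}

// Host helper: launches one block of two warps and checks the result.
bool run_handoff() {
    int *d_out = nullptr;
    int h_out[kWarpSize] = {0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;

    handoff_kernel<<<1, 2 * kWarpSize>>>(d_out);

    // cudaMemcpy waits for the kernel and reports any launch/runtime error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < kWarpSize; ++i) {
        if (h_out[i] != 2 * (i + 1)) return false;
    }
    return true;
}
```

Calling `run_handoff()` from a host `main` exercises the handoff once. If the fence before `ready = 1` were dropped, nothing would guarantee that the consumer warp sees the producer's stores before it sees the flag, which is exactly the hazard described above.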