Spaces: Running on T4
The task is to write a CUDA kernel function for the GPU. The input is described below:
[input.txt]
The benchmark code generated for this task is:
[benchmark code]
Optimize the kernel function to minimize its execution time on the GPU.
The output should be the full contents of a .cu file containing ONE kernel function.
Do not modify the test part. Note that the test data contains exactly five input sets. The generated .cu file must ensure that the kernel function is called exactly once for each input set, for a total of five kernel invocations. Do not include any extra timing logic, profiling wrappers, or repeated kernel calls that could cause an input to trigger multiple kernel launches.