Optimize the kernel function to reduce its execution time on the GPU.
The output should be the complete contents of the .cu file, which must contain exactly ONE kernel function.
Do not modify the test part. Note that the test data contains exactly five input sets. The generated .cu file must call the kernel function exactly once for each input set, for a total of five kernel invocations. Do not include any extra timing logic, profiling wrappers, or repeated kernel calls that would cause an input to trigger more than one kernel launch.