- GPU: NVIDIA GeForce RTX 3080
- CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz x 40
- CUDA/NVCC: 11.7
- OS: Ubuntu 20.04
- Host Compiler: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
- Compiler Options: `-O3 -std=c++17`
Version | Time (us) | Speedup |
---|---|---|
reduce_cpu | 22,461 | 1 |
reduce_V1 | 2,127 | 10.6 |
reduce_V2 | 1,261 | 17.8 |
reduce_V3 | 1,203 | 18.7 |
reduce_V4 | 646 | 34.8 |
reduce_V5 | 431 | 52.1 |
reduce_V6 | 391 | 57.4 |
reduce_V7 | 250 | 89.9 |
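
The speedup column appears to be the `reduce_cpu` time divided by each version's time. The sketch below recomputes it from the times in the table. The names `Result`, `speedup`, `table_results`, and `print_speedups` are hypothetical helpers introduced for illustration and are not part of the benchmark code. The listed times are presumably rounded to whole microseconds, so a recomputed ratio may differ from the table in the last digit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the results table: version name and measured time in microseconds.
struct Result {
    std::string name;
    double time_us;
};

// Speedup relative to the baseline: baseline time divided by this version's time.
double speedup(double baseline_us, double time_us) {
    return baseline_us / time_us;
}

// Times copied from the table above; the first entry is the CPU baseline.
std::vector<Result> table_results() {
    return {
        {"reduce_cpu", 22461.0}, {"reduce_V1", 2127.0},
        {"reduce_V2", 1261.0},   {"reduce_V3", 1203.0},
        {"reduce_V4", 646.0},    {"reduce_V5", 431.0},
        {"reduce_V6", 391.0},    {"reduce_V7", 250.0},
    };
}

// Print every version with its speedup over the first (baseline) entry.
void print_speedups(const std::vector<Result>& results) {
    assert(!results.empty());
    const double baseline_us = results.front().time_us;
    for (const auto& r : results) {
        std::printf("%-10s %8.0f us  %5.1fx\n",
                    r.name.c_str(), r.time_us, speedup(baseline_us, r.time_us));
    }
}
```

Calling `print_speedups(table_results())` from `main` prints one line per version.
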
TODO
- Harris, Mark. "Optimizing parallel reduction in CUDA." Nvidia developer technology 2.4 (2007): 70.
- NVIDIA/cuda-samples: `Samples/2_Concepts_and_Techniques/reduction/reduction.cpp`