5-reduce-sum

Reduction

Test Environment

  • GPU: NVIDIA GeForce RTX 3080
  • CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz x 40
  • CUDA/NVCC: 11.7
  • OS: Ubuntu 20.04
  • Host Compiler: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    • Compiler Options: -O3 -std=c++17

Performance

Version  Time (us)  Speedup (vs. reduce_cpu)
reduce_cpu 22,461 1
reduce_V1 2,127 10.6
reduce_V2 1,261 17.8
reduce_V3 1,203 18.7
reduce_V4 646 34.8
reduce_V5 431 52.1
reduce_V6 391 57.4
reduce_V7 250 89.9
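
The speedup column can be rechecked from the timings. The sketch below assumes speedup is the reduce_cpu time divided by each version's time; the `Timing` struct and the `table_timings` and `speedup` helpers are hypothetical names introduced only for this illustration. Because the listed times are rounded to whole microseconds, the recomputed values agree with the table only to within about 0.1.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row of the timing table: version name and measured time in microseconds.
struct Timing {
    std::string version;
    double time_us;
};

// Timings copied from the performance table (hypothetical helper).
inline std::vector<Timing> table_timings() {
    return {
        {"reduce_cpu", 22461.0}, {"reduce_V1", 2127.0}, {"reduce_V2", 1261.0},
        {"reduce_V3", 1203.0},   {"reduce_V4", 646.0},  {"reduce_V5", 431.0},
        {"reduce_V6", 391.0},    {"reduce_V7", 250.0},
    };
}

// Assumed definition: speedup = baseline time / version time.
inline double speedup(double baseline_us, double time_us) {
    assert(baseline_us > 0.0 && time_us > 0.0);
    return baseline_us / time_us;
}
```

For example, `speedup(22461.0, 646.0)` is roughly 34.8, in line with the reduce_V4 row.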

Algorithm Description

TODO
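
Until the individual versions are documented, here is a generic illustration of what a sum reduction computes. It is not the implementation of reduce_V1 through reduce_V7, which this README does not describe. The sketch runs the stride-halving tree pattern commonly used in GPU reductions sequentially on the host; `tree_reduce_sum` is a hypothetical name, and zero-padding to a power of two is an assumption made to keep the loop simple.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a buffer with a stride-halving tree pattern, executed sequentially.
// Each pass of the outer loop halves the number of active elements, much like
// one synchronized step of a block-level GPU reduction.
inline float tree_reduce_sum(std::vector<float> data) {
    if (data.empty()) return 0.0f;

    // Pad with zeros to the next power of two so every step pairs up cleanly.
    std::size_t n = 1;
    while (n < data.size()) n *= 2;
    data.resize(n, 0.0f);
    assert(data.size() == n);

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {
            data[i] += data[i + stride];  // fold the upper half into the lower half
        }
    }
    return data[0];  // the total ends up in the first element
}
```

With n elements the outer loop runs about log2(n) times, which is why this pattern maps well to parallel hardware: the additions inside each pass are independent of one another.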

References