Optimizing Parallel Reduction in CUDA