SAM Prefix Scan v1.1
SAM is a fast prefix-scan template written in CUDA with built-in support for higher orders and tuple values, as described in the PLDI 2016 paper cited below.
Two files are needed to install SAM: SAMinstaller1.1.cu and sam_pre1.1.h. Demo code showing how to use SAM is available in testSAM1.1.cu. Note that SAM is protected by a license and that by downloading or installing SAM you agree to the terms and conditions set forth in that license.
To install SAM, place the SAMinstaller1.1.cu and sam_pre1.1.h files into the same directory and compile the installer:
nvcc -O3 -arch=sm_35 SAMinstaller1.1.cu -o SAMinstaller
Then run the resulting SAMinstaller executable (note that you may be asked to compile and run the installer a second time).
This should create the file sam.h, which contains the SAM code that has been tuned for your GPU.
The demo code, which shows how to use sam.h, can then be compiled:
nvcc -O3 -arch=sm_35 testSAM1.1.cu -o testSAM
For example, to time the computation of a prefix sum over 100,000,000 elements, enter:
To change the data type, the order, the tuple size, or the operator, please adjust the last few lines in the main function of the demo code accordingly.
The SAM code has been tested on several GPUs with int and long data types, orders 1 through 8, tuple sizes of 1 through 32, and sum, max, and xor operators.
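To make the terms "order" and "tuple size" concrete, the following is a minimal serial CPU sketch that can serve as a reference when checking results on small inputs. It is not part of SAM and does not use sam.h; the name reference_scan is hypothetical. It assumes an inclusive scan, that an order of k means applying the scan k times in succession, and that a tuple size of s means s interleaved, independent scans (element i is combined with element i - s).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial CPU reference for a higher-order, tuple-based inclusive scan.
// Hypothetical helper for illustration only; not SAM's API.
//   data       : input values (taken by value, returned scanned)
//   order      : number of times the scan is applied (assumed meaning of "order")
//   tuple_size : stride between elements of the same scan lane
//   op         : associative binary operator, e.g. sum, max, or xor
template <typename T, typename Op>
std::vector<T> reference_scan(std::vector<T> data, int order,
                              std::size_t tuple_size, Op op)
{
    assert(tuple_size >= 1);  // a stride of 0 would combine each element with itself
    for (int pass = 0; pass < order; ++pass) {
        // Walking left to right, data[i - tuple_size] already holds the
        // scanned value of the previous element in the same lane.
        for (std::size_t i = tuple_size; i < data.size(); ++i) {
            data[i] = op(data[i - tuple_size], data[i]);
        }
    }
    return data;
}
```

Under these assumptions, with the input 1 2 3 4 5 6 and the sum operator, order 1 with tuple size 1 yields 1 3 6 10 15 21, order 2 with tuple size 1 yields 1 4 10 20 35 56, and order 1 with tuple size 2 yields 1 2 4 6 9 12.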
The current code is optimized for Maxwell-based GPUs but also runs on Kepler-based GPUs. It requires at least compute capability 3.0. We recommend compiling with sm_35 even for Maxwell-based GPUs.
S. Maleki, A. Yang, and M. Burtscher. "Higher-Order and Tuple-Based Massively-Parallel Prefix Sums." ACM SIGPLAN Conference on Programming Language Design and Implementation. June 2016.
This work has been supported in part by the National Science Foundation under Grant No. 1406304 as well as by equipment donations from Nvidia Corporation.