We study the performance portability of OpenCL across diverse architectures, including an NVIDIA GPU, an Intel Ivy Bridge CPU, and an AMD Fusion APU. We present a detailed, assembly-level performance analysis of three exemplar OpenCL benchmarks: SGEMM, SpMV, and FFT. We also identify a number of tuning knobs that are critical to performance portability, including thread-data mapping, data layout, tiling size, data caching, and operation-specific factors. We further demonstrate that proper tuning could improve OpenCL's portable performance from the current 15% to a potential 67% of state-of-the-art performance on the Ivy Bridge CPU. Finally, we evaluate the current OpenCL programming model and propose a list of extensions that improve performance portability.
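To make the idea of a tuning knob concrete, here is a minimal sketch of a tiled matrix multiply in which the tile size is a plain parameter. It is an illustration only, written in Python rather than OpenCL, and it is not the kernel studied in the paper; the function name `tiled_matmul` and its `tile` argument are hypothetical. The point it shows is that the tile size changes the order in which data is visited (and therefore cache behaviour on real hardware) without changing the result, which is what lets the same code be retuned per device.

```python
def tiled_matmul(a, b, tile):
    """Compute C = A * B by walking the matrices in tile x tile blocks.

    `tile` is the tuning knob (hypothetical name): it alters the traversal
    order but not the numerical result.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops step over blocks; inner loops stay within one block.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    # Different tile sizes give the same product.
    for tile in (1, 2, 4):
        print(tile, tiled_matmul(a, b, tile))
```

In an autotuning setting, one would time a kernel like this for several candidate tile sizes on each target device and keep the fastest; the timing loop is omitted here because meaningful numbers depend on the hardware.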