Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication

Document Type


Publication Date



Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern based on the logarithmic complexity of the number of processors (i.e., O (log (p)) where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction of inter-process communication overhead for matrices with larger dimensions compared to another one dimensional parallel algorithm that takes O(p) run-time complexity for accumulating the results.