Parallel Implementations of RCM Algorithm for Bandwidth Reduction of Sparse Matrices

Thiago Nascimento Rodrigues, Maria Claudia Silva Boeres, Lucia Catabriga


The Reverse Cuthill-McKee (RCM) algorithm is a well-known heuristic
for reordering sparse matrices. It is typically used to speed up the computation of
sparse linear systems of equations. This paper describes two parallel approaches
for the RCM algorithm as well as an optimized version of each one based on some
proposed enhancements. The first one exploits a strategy for reducing lazy threads,
while the second one makes use of a static bucket array as the main data structure
and suppress some steps performed by the original algorithm. These related changes
led to outstanding reordering time results and significant bandwidth reductions.
The performance of two algorithms is compared with the respective implementation
made available by Boost library. The OpenMP framework is used for supporting
the parallelism and both versions of the algorithm are tested with large sparse and
structural symmetric matrices.


Parallel RCM; Bandwidth Reduction; Sparse Matrices

Full Text:



S. Aluru, Teaching parallel computing through parallel prefix, “The Interna-

tional Conference for High Performance Computing, Networking, Storage and Analysis“, 2012, Salt Lake City, USA.

G. Anderson and D. Ferro and R. Hilton, “Connecting with Computer Science - Second Edition”, Course Technology, Cengage Learning, Boston, 2011.

M. J. Atallah and S. Fox, “Algorithms and Theory of Computation Handbook”, CRC Press, Inc., Boca Raton, FL, USA, 1978.Optimized Parallel RCM Algorithms 17

P. R. Amestoy and T. A. Davis and I. S. Duff, An Approximate Minimum Degree Ordering Algorithm, SIAM J. Matrix Anal. Appl., 17, No. 4 (1996), 886–905.

L. Cardelli and X. Leroy, Abstract Types and The Dot Notation, Research report 56, Digital Equipment Corporation, Systems Research Center, March 10, 1990.

D. Chazan and W. Miranker, Chaotic Relaxation, Linear Algebra and its Applications, 2, No. 2 (1969), 199–222.

C. Chevalier and F. Pellegrini, PT-Scotch: A Tool for Efficient Parallel Graph Ordering, Parallel Comput., 34, No. 6-8 (2008), 318–331.

E. Cuthill and J. McKee, Reducing the Bandwidth of Sparse Symmetric Matrices, at “Proceedings of the 24th ACM National Conference“, pp. 157–172, ACM ’69, 1969.

T. A. Davis and Y. Hu, The University of Florida Sparse Matrix Collection,

ACM Trans. Math. Softw., 38, No. 1 (2011), pp. 1:1–1:25.


M. A. Hassaan and M. Burtscher and K. Pingali, Ordered vs. Unordered: A Comparison of Parallelism and Work-efficiency in Irregular Algorithms, at “Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming“, pp. 3–12, PPoPP ’11, 2011.

B. Lugon and L. Catabriga., Algoritmos de reordenamento de matrizes esparsas aplicados a precondicionadores ILU(p), at “Simpósio Brasileiro de Pesquisa Operacional“, pp. 2343-–2354, XLV SBPO, 2013.

A. George, Nested Dissection of a Regular Finite Element Mesh, SIAM Journal on Numerical Analysis, 10, No. 2 (1973), 345–363.

A. George and J. W. H. Liu, A Fast Implementation of the Minimum Degree Algorithm Using Quotient Graphs, ACM Trans. Math. Softw., 6, No. 3 (1980), 337–358.

K. I. Karantasis and A. Lenharth and D. Nguyen and M. Garzarán and K.

Pingali, Parallelization of Reordering Algorithms for Bandwidth and Wavefront

Reduction, at “Proceedings of the International Conference for High Perfor-

mance Computing, Networking, Storage and Analysis“, pp. 921–932, SC ’14,

G. Karypis and V. Kumar, A Parallel Algorithm for Multilevel Graph Par-

titioning and Sparse Matrix Ordering, J. Parallel Distrib. Comput., 48, No. 1

(1998), 71–95.

C. E. Leiserson and T. B. Schardl, A Work-efficient Parallel Breadth-first

Search Algorithm (or How to Cope with the Nondeterminism of Reducers),

at“Proceedings of the Twenty-second Annual ACM Symposium on Parallelism

in Algorithms and Architectures“, pp. 303–314, ACM, 2010.18

W. Lin, Improving Parallel Ordering of Sparse Matrices Using Genetic Algorithms, Appl. Intell., 23, No. 3 (2005), 257–265.

W. Liu and A. H. Sherman, Comparative Analysis of the Cuthill-McKee and the Reverse Cuthill-McKee Ordering Algorithms for Sparse Matrices, SIAM Journal on Numerical Analysis, 13, No. 2 (1976), 198–213.

U. Meyer and A. Negoescu and V. Weichert, New Bounds for Old Algo-

rithms: On the Average-Case Behavior of Classic Single-Source Shortest-Paths Approaches, at “Theory and Practice of Algorithms in (Computer) Systems: First International ICST Conference - Proceedings“, pp. 217–228, ICST, 2011.


C. H. Papadimitriou, The NP-Completeness of the bandwidth minimization problem, Computing, 16, No. 3 (1976), 263–270.

T. N. Rodrigues and M. C. S. Boeres and L. Catabriga, An Implementation of the Unordered Parallel RCM for Bandwidth Reduction of Large Sparse Matrices, at “Congresso Nascional de Matemática Aplicada e Computacional“, XXXVI CNMAC, 2016, (in press).

T. N. Rodrigues and M. C. S. Boeres and L. Catabriga, An Optimized Leveled Parallel RCM for Bandwidth Reduction of Sparse Symmetric Matrices, at “Simpósio Brasileiro de Pesquisa Operacional“, XLVIII SBPO, 2016, (in press).

B. S. W. Schröder, Algorithms for the Fixed Point Property, Theoretical Computer Sciense, 217, No. 2 (1999), 301–358.

S. W. Sloan, An algorithm for profile and wavefront reduction of sparse matrices, International Journal for Numerical Methods in Engineering, 23, No. 2 (1986), 239–251.


Article Metrics

Metrics Loading ...

Metrics powered by PLOS ALM


  • There are currently no refbacks.

Trends in Computational and Applied Mathematics

A publication of the Brazilian Society of Applied and Computational Mathematics (SBMAC)


Indexed in:




Desenvolvido por:

Logomarca da Lepidus Tecnologia