Research Article

Effects of Easy Hybrid Parallelization with CUDA for OpenMX

by Jae-hyeon Parq, Erik Sevre, Sang-mook Lee
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 98 - Number 13
Year of Publication: 2014
Authors: Jae-hyeon Parq, Erik Sevre, Sang-mook Lee
DOI: 10.5120/17244-7580

Jae-hyeon Parq, Erik Sevre, Sang-mook Lee. Effects of Easy Hybrid Parallelization with CUDA for OpenMX. International Journal of Computer Applications 98, 13 (July 2014), 20-27. DOI=10.5120/17244-7580

@article{10.5120/17244-7580,
  author = {Jae-hyeon Parq and Erik Sevre and Sang-mook Lee},
  title = {Effects of Easy Hybrid Parallelization with CUDA for OpenMX},
  journal = {International Journal of Computer Applications},
  issue_date = {July 2014},
  volume = {98},
  number = {13},
  month = {July},
  year = {2014},
  issn = {0975-8887},
  pages = {20-27},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume98/number13/17244-7580/},
  doi = {10.5120/17244-7580},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}
%0 Journal Article
%A Jae-hyeon Parq
%A Erik Sevre
%A Sang-mook Lee
%T Effects of Easy Hybrid Parallelization with CUDA for OpenMX
%J International Journal of Computer Applications
%@ 0975-8887
%V 98
%N 13
%P 20-27
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

An MPI-friendly density functional theory (DFT) source code was modified for hybrid parallelization including CUDA. The objective is to find out how simple conversions to hybrid parallelization with mid-range GPUs affect a DFT code not originally suited to CUDA. Several rules of hybrid parallelization for numerical-atomic-orbital (NAO) DFT codes were established. The test was performed on a magnetite material system with the OpenMX code, using a hardware system containing 2 Xeon E5606 CPUs and 2 Quadro 4000 GPUs. The 3-way hybrid routines obtained a speedup of 7.55, while the 2-way hybrid obtained a speedup of 10.94. GPUs with CUDA complement the efficiency of OpenMP and compensate for the CPUs' excessive competition within MPI.
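
As an illustration of the 3-way (MPI + OpenMP + CUDA) hybrid scheme described in the abstract, the following is a minimal sketch only; it is not the authors' OpenMX patch. The saxpy kernel, array sizes, and build line are illustrative assumptions standing in for the NAO DFT routines actually offloaded, and the GPU binding mirrors the two-GPU node used in the test.

/*
 * Minimal 3-way hybrid sketch: MPI across ranks, OpenMP within a rank,
 * CUDA for the offloaded loop. Illustrative only, not the OpenMX code.
 * Example build (assumed toolchain):
 *   nvcc -ccbin mpicxx -Xcompiler -fopenmp hybrid_sketch.cu -o hybrid_sketch
 */
#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)  /* elements per MPI rank (illustrative size) */

/* Illustrative CUDA kernel: y[i] = a*x[i] + y[i], a stand-in for a DFT inner loop. */
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Bind each MPI rank to one of the node's GPUs (two per node in the paper's setup). */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 0) cudaSetDevice(rank % ndev);

    float *x = (float *)malloc(N * sizeof(float));
    float *y = (float *)malloc(N * sizeof(float));

    /* OpenMP threads initialize the rank-local arrays on the CPU. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = (float)rank; }

    /* Offload this rank's heavy loop to the GPU. */
    float *dx, *dy;
    cudaMalloc(&dx, N * sizeof(float));
    cudaMalloc(&dy, N * sizeof(float));
    cudaMemcpy(dx, x, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, N * sizeof(float), cudaMemcpyHostToDevice);
    saxpy_kernel<<<(N + 255) / 256, 256>>>(N, 2.0f, dx, dy);
    cudaDeviceSynchronize();
    cudaMemcpy(y, dy, N * sizeof(float), cudaMemcpyDeviceToHost);

    /* OpenMP reduction of the rank-local result, then an MPI reduction across ranks. */
    double local = 0.0, global = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) local += y[i];
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads=%d gpus/node=%d sum=%.1f\n",
               nranks, omp_get_max_threads(), ndev, global);

    cudaFree(dx); cudaFree(dy); free(x); free(y);
    MPI_Finalize();
    return 0;
}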

Index Terms

Computer Science
Information Sciences

Keywords

MPI, CUDA, OpenMP, electronic structure, graphical processing unit, pseudo-atomic-orbital