Question about VASP 6.3.2 with NVHPC+mkl
Moderators: Global Moderator, Moderator
-
- Newbie
- Posts: 8
- Joined: Mon Oct 17, 2022 3:17 am
Question about VASP 6.3.2 with NVHPC+mkl
Can VASP be compiled with NVHPC+MKL but without OpenMPI?
I'm trying to compile VASP for GPU.
According to the makefile.include templates, it seems like OpenMPI must be used in combination with MKL.
Can I use NVHPC + MKL (from Intel-oneAPI-2022) together with MPICH, which is available on my system, instead?
These are the modules that I plan to use, if possible (a rough sketch of loading them follows the makefile snippet below):
- nvhpc/21.9
- cray-mpich/8.1.17
- intel-oneapi-mixed/2022.0.2 (for mkl)
If so, which lines in makefile.include do I have to change? Is it "-lmkl_blacs_openmpi_lp64" in the block below?
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /path/to/your/mkl/installation
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
INCS += -I$(MKLROOT)/include/fftw
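For reference, this is roughly how I'd load them before running make (names taken from the list above; the exact order or extra Cray environment modules my system needs may differ):
Code: Select all
# rough sketch of the module environment, assuming the module names listed above
module load nvhpc/21.9
module load cray-mpich/8.1.17
module load intel-oneapi-mixed/2022.0.2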
-
- Global Moderator
- Posts: 314
- Joined: Mon Sep 13, 2021 12:45 pm
Re: Question about VASP 6.3.2 with NVHPC+mkl
Dear siwakorn_sukharom,
I think that such a combination (NVHPC + Intel MKL + MPICH) should be possible. What appears to be the problem?
In the makefile.include you need to provide the paths for the libraries and the compilers (see the details here).
Regarding the use of MKL, it makes sense to use -lmkl_blacs_intelmpi_lp64 in this case, although I don't have experience using Cray, so you might want to check that with the technical support of the computer you are using.
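As a rough, untested sketch, the MKL block of the template would then only differ in the BLACS layer:
Code: Select all
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
# sketch only: same as the template, but with the Intel-MPI/MPICH BLACS layer
MKLROOT   ?= /path/to/your/mkl/installation
LLIBS_MKL  = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS      += -I$(MKLROOT)/include/fftw
Again, whether Cray MPICH works with the Intel-MPI BLACS layer is something to confirm with your technical support.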
-
- Newbie
- Posts: 8
- Joined: Mon Oct 17, 2022 3:17 am
Re: Question about VASP 6.3.2 with NVHPC+mkl
On my own attempt I got the error below with the following makefile.include.
After consulting with the system administrator, he suggested using nvblas for the GPU part, BLAS from MKL for the CPU part, and removing scaLAPACK (i.e. removing -DscaLAPACK). He told me that scaLAPACK from MKL doesn't work on AMD core architectures, as mentioned in https://www.intel.com/content/www/us/en ... tines.html
Will this lead to any problems when VASP doesn't use scaLAPACK? So far the vasp executables compiled this way can run a regular job fine without problems.
Code: Select all
Currently Loaded Modules:
1) craype-x86-rome 4) xpmem/2.4.4-2.3_2.12__gff0e1d9.shasta 7) cray-dsmml/0.2.2 10) intel-oneapi-mixed/2022.0.2
2) libfabric/1.15.0.0 5) nvhpc/21.9 8) cray-mpich/8.1.17
3) craype-network-ofi 6) craype/2.7.16 9) PrgEnv-nvhpc/8.3.3
Code: Select all
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-D_OPENACC \
-DUSENCCL -DUSENCCLP2P
CPP = ftn -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
# N.B.: you might need to change the cuda-version here
# to one that comes with your NVIDIA-HPC SDK
FC = ftn -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp
FCL = ftn -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = ftn
CC_LIB = cc -w
CFLAGS_LIB = -O
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = CC --no_warnings
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself , change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS += $(VASP_TARGET_CPU)
# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
# Software emulation of quadruple precsion (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
FCL += mkl
MKLROOT ?= /path/to/your/mkl/installation
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS += -I$(MKLROOT)/include/fftw
Code: Select all
ftn -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp -c++libs mkl -o vasp c2f_interface.o nccl2for.o simd.o base.o profiling.o string.o tutor.o version.o command_line.o vhdf5_base.o incar_reader.o reader_base.o openmp.o openacc_struct.o mpi.o mpi_shmem.o mathtools.o hamil_struct.o radial_struct.o pseudo_struct.o mgrid_struct.o wave_struct.o nl_struct.o mkpoints_struct.o poscar_struct.o afqmc_struct.o fock_glb.o chi_glb.o smart_allocate.o xml.o extpot_glb.o constant.o ml_ff_c2f_interface.o ml_ff_prec.o ml_ff_constant.o ml_ff_taglist.o ml_ff_struct.o ml_ff_mpi_help.o ml_ff_mpi_shmem.o vdwforcefield_glb.o jacobi.o main_mpi.o openacc.o scala.o asa.o lattice.o poscar.o ini.o mgrid.o ml_ff_error.o ml_ff_mpi.o ml_ff_helper.o ml_ff_logfile.o ml_ff_math.o ml_ff_iohandle.o ml_ff_memory.o ml_ff_abinitio.o ml_ff_ff.o ml_ff_mlff.o setex_struct.o xclib.o vdw_nl.o xclib_grad.o setex.o radial.o pseudo.o gridq.o ebs.o symlib.o mkpoints.o random.o wave.o wave_mpi.o wave_high.o bext.o spinsym.o symmetry.o lattlib.o nonl.o nonlr.o nonl_high.o dfast.o choleski2.o mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o constrmag.o cl_shift.o relativistic.o LDApU.o paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o diis.o rhfatm.o hyperfine.o fock_ace.o paw.o mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o solvation.o scpc.o pot.o tet.o dos.o elf.o hamil_rot.o chain.o dyna.o fileio.o vhdf5.o sphpro.o us.o core_rel.o aedens.o wavpre.o wavpre_noio.o broyden.o dynbr.o reader.o writer.o xml_writer.o brent.o stufak.o opergrid.o stepver.o fast_aug.o fock_multipole.o fock.o fock_dbl.o fock_frc.o mkpoints_change.o subrot_cluster.o sym_grad.o mymath.o npt_dynamics.o subdftd3.o subdftd4.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o nmr.o pead.o k-proj.o subrot.o subrot_scf.o paircorrection.o rpa_force.o ml_reader.o ml_interface.o force.o pwlhf.o gw_model.o optreal.o steep.o rmm-diis.o davidson.o david_inner.o root_find.o lcao_bare.o locproj.o electron_common.o electron.o rot.o electron_all.o shm.o pardens.o optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o hamil_lr.o rmm-diis_lr.o subrot_lr.o lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o linear_optics.o setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o gauss_quad.o m_unirnk.o minimax_ini.o minimax_dependence.o minimax_functions1D.o minimax_functions2D.o minimax_struct.o minimax_varpro.o minimax.o umco.o mlwf.o ratpol.o pade_fit.o screened_2e.o wave_cacher.o crpa.o chi_base.o wpot.o local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o bse_te.o bse.o time_propagation.o acfdt.o afqmc.o rpax.o chi.o acfdt_GG.o dmft.o GG_base.o greens_orbital.o lt_mp2.o rnd_orb_mp2.o greens_real_space.o chi_GG.o chi_super.o sydmat.o rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o wave_interpolate.o linear_response.o auger.o dmatrix.o phonon.o wannier_mats.o elphon.o core_con_mat.o embed.o extpot.o rpa_high.o fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o main.o -Llib -ldmy -Lparser -lparser -cudalib=cublas,cusolver,cufft,nccl -cuda -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/extras/qd/lib -lqdmod -lqd
/usr/bin/ld: warning: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/nvhpc.ld contains output sections; did you forget -T?
/usr/bin/ld: cannot find mkl: No such file or directory
pgacclnk: child process exit status 1: /usr/bin/ld
make[2]: *** [makefile:132: vasp] Error 2
make[2]: Leaving directory '/lustrefs/disk/home/siwakorn/VASP/vasp.6.3.2/build/std'
cp: cannot stat 'vasp': No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/lustrefs/disk/home/siwakorn/VASP/vasp.6.3.2/build/std'
make: *** [makefile:17: std] Error 2
The Intel page linked above states: "ScaLAPACK routines are provided only for Intel® 64 or Intel® Many Integrated Core architectures."
Code: Select all
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-D_OPENACC \
-DUSENCCL -DUSENCCLP2P
CPP = ftn -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
# N.B.: you might need to change the cuda-version here
# to one that comes with your NVIDIA-HPC SDK
FC = ftn -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp
FCL = ftn -acc -gpu=cc60,cc70,cc80,cuda11.0 -mp -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = ftn
CC_LIB = cc -w
CFLAGS_LIB = -O
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = CC --no_warnings
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself , change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS += $(VASP_TARGET_CPU)
# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
# If the above fails, then NVROOT needs to be set manually
#NVHPC ?= /opt/nvidia/hpc_sdk
#NVVERSION = 21.11
#NVROOT = $(NVHPC)/Linux_x86_64/$(NVVERSION)
## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN = -fast -Mwarperf
#SOURCE_IN := nonlr.o
# Software emulation of quadruple precsion (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd
# BLAS (mandatory)
BLAS = -L$(NVROOT)/math_libs/lib64 -lnvblas -L$(MKLROOT)/lib/intel64 -Mmkl -lmkl_core
# LAPACK (mandatory)
LAPACK =
# scaLAPACK (mandatory)
SCALAPACK =
LLIBS += $(SCALAPACK) $(LAPACK) $(BLAS)
# FFTW (mandatory)
FFTW_ROOT ?= $(FFTW_ROOT)
LLIBS += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS += -I$(FFTW_ROOT)/include
-
- Global Moderator
- Posts: 314
- Joined: Mon Sep 13, 2021 12:45 pm
Re: Question about VASP 6.3.2 with NVHPC+mkl
This error appears because of the line FCL += mkl, which is incorrect; it should be -qmkl (or -mkl if you are using Intel Parallel Studio's MKL).
Regarding the statement that scaLAPACK from MKL does not work on AMD architectures: I don't think that this is generally the case. But one can always compile scaLAPACK from scratch if the MKL version doesn't work.
Without -DscaLAPACK some of VASP's features will not work correctly, so I would not recommend switching off this flag.
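As a rough sketch, the offending line could be changed like this (keeping -DscaLAPACK in CPP_OPTIONS and the MKL scaLAPACK/BLACS libraries discussed earlier in the thread; the exact MKL flag depends on your compiler toolchain):
Code: Select all
# sketch only: the bare word "mkl" was handed to the linker as a file name,
# which is what produced "/usr/bin/ld: cannot find mkl"
# FCL += mkl
FCL += -qmkl    # or -mkl if you are using Intel Parallel Studio's MKL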