I am having difficulty compiling the OpenACC GPU port of VASP 6.4.0 with the Intel MKL and OpenMP parallelism. I have been following the official instructions.
I compiled successfully with OpenACC+MPI and can use 4 MPI processes with 4 GPUs. The problem arises with the OpenMP threading that should be enabled in order to use more CPU resources on the local node.
The documentation mentions that 'libiomp5.so' should be linked, and references the relevant Makefile.include; however, this is not what happens when I try it. The '-mp' option links the internal NVOMP library rather than the Intel OpenMP 'libiomp5.so'. The correct 'libmkl_intel_thread.so' library is linked in, but no multi-threading happens, which is consistent with 'libiomp5.so' being absent.
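For anyone who wants to reproduce the check, here is a minimal sketch of how the linked OpenMP runtimes can be listed from 'ldd' output. The helper name omp_runtimes and the binary path bin/vasp_std are assumptions for illustration only; adjust them to your build.

```shell
# Filter `ldd` output (read from stdin) down to the OpenMP runtimes
# and MKL threading layers that a binary is dynamically linked against.
# omp_runtimes is a hypothetical helper name, not part of VASP or NVHPC.
omp_runtimes() {
    grep -E 'libiomp5|libnvomp|libmkl_[a-z]+_thread'
}

# Usage (the binary path is an assumption; adjust to your build):
#   ldd bin/vasp_std | omp_runtimes
```

If only the NVOMP runtime and 'libmkl_intel_thread.so' show up, the binary matches the situation described above.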
If I link 'libiomp5.so' explicitly, the binary segfaults immediately when executed. I have seen some reports that the NVOMP and Intel OpenMP libraries are incompatible, so this also seems to be consistent behavior.
I am sure that there is some easy way to get the OpenMP threading working between the NVHPC toolchain and the Intel MKL, but I am stumped. Any advice on this is greatly appreciated! Below is my Makefile.include for reference, but it is nothing special.
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
-DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-D_OPENACC \
-DUSENCCL -DUSENCCLP2P
CPP = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
# N.B.: you might need to change the cuda-version here
# to one that comes with your NVIDIA-HPC SDK
FC = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp
FCL = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o minimax_dependence.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = nvfortran
CC_LIB = nvc -w
CFLAGS_LIB = -O
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = nvc++ --no_warnings
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS += $(VASP_TARGET_CPU)
# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN = -fast -Mwarperf
#SOURCE_IN := nonlr.o
# Software emulation of quadruple precision (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /opt/intel/oneapi/mkl/2021.2.0
LLIBS_MKL = -Mmkl -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
INCS += -I$(MKLROOT)/include/fftw
LLIBS += $(LLIBS_MKL)