VASP 6.4.0 with OpenACC+MPI+OpenMP+MKL


#1 Post by **seananderson** » Fri May 05, 2023 4:55 pm

Hi all,

I am having difficulty compiling the OpenACC GPU port of VASP 6.4.0 with the Intel MKL and OpenMP parallelism. I have been following the official instructions.

I compiled successfully with OpenACC+MPI and can use 4 MPI processes with 4 GPUs. The problem arises with OpenMP threading, which should be enabled in order to use more CPU resources on the local node.

The documentation says that 'libiomp5.so' should be linked and references the relevant Makefile.include; however, that is not what happens when I try it. The '-mp' option links NVIDIA's own OpenMP runtime (NVOMP), not the Intel OpenMP runtime 'libiomp5.so'. The expected 'libmkl_intel_thread.so' library is linked in, but no multi-threading happens, which seems consistent with the Intel OpenMP runtime being absent.

If I explicitly force 'libiomp5.so' to be linked, the binary segfaults immediately when executed. I have seen some reports that the NVOMP and Intel OpenMP runtimes are incompatible, so this also seems to be expected behavior.
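A quick way to confirm which OpenMP runtime actually ends up in the executable is to run `ldd` on it. The following is a minimal sketch written as a make target; the target name `check-omp` and the binary path `bin/vasp_std` are assumptions (they are not part of VASP's build system), and the check only works for a dynamically linked executable.

```make
# Hypothetical helper target (not part of VASP's makefile): list the OpenMP
# runtime(s) and MKL libraries the dynamic linker resolves for the binary.
# VASP_BIN is an assumed path; point it at your own build.
VASP_BIN ?= bin/vasp_std

.PHONY: check-omp
check-omp:
	@echo "OpenMP runtime(s) resolved for $(VASP_BIN):"
	@ldd $(VASP_BIN) | grep -Ei 'omp' || echo "  (none found)"
	@echo "MKL libraries resolved for $(VASP_BIN):"
	@ldd $(VASP_BIN) | grep -i 'mkl' || echo "  (none found)"
```

Run it with `make check-omp VASP_BIN=/path/to/vasp_std`. If two different OpenMP runtimes show up in the same listing, both are being loaded into one process, which is one plausible explanation for an immediate crash.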

I am sure there is some easy way to get OpenMP threading working between the NVHPC toolchain and Intel MKL, but I am stumped. Any advice is greatly appreciated! My Makefile.include is below for reference, but it is nothing special.

```make
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
FC          = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp
FCL         = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN   = -fast -Mwarperf
#SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd
INCS       += -I$(QD)/include/qd

# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT    ?= /opt/intel/oneapi/mkl/2021.2.0
LLIBS_MKL   = -Mmkl -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
INCS       += -I$(MKLROOT)/include/fftw

LLIBS      += $(LLIBS_MKL)
```

#2 Post by **merzuk.kaltak** » Mon May 08, 2023 8:04 am

Dear Sean Anderson,

Could you please tell us which compiler and MKL versions you used?
I also recommend updating to VASP 6.4.1.

#3 Post by **seananderson** » Tue Jul 18, 2023 2:56 pm

Dear Merzuk,

Sorry for the very delayed response, I was on paternity leave.

I am using the GCC 10.2.0 compiler, Intel oneAPI 2021.2 MKL, and the NVHPC 23.3 toolkit. I encountered the exact same issues with VASP 6.4.1, although the CPU-only build works perfectly.

Many thanks!

#4 Post by **seananderson** » Thu Jul 20, 2023 11:59 am

It seems that my problem was that I did not understand the MPI options required to parallelize across GPUs and OpenMP threads. I was able to get the desired behavior by looking at the options in 'testsuite/ompi+omp.conf', so I think everything is now working as expected.
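A hybrid launch of the kind described here (one MPI rank per GPU, OpenMP threads filling the remaining cores) might look like the sketch below. It uses Open MPI 4.x `mpirun` options and is not a copy of `testsuite/ompi+omp.conf`; the target name, the binary path, and the counts (4 GPUs, 32 cores per node) are assumptions to adapt to your node.

```make
# Hypothetical run target: one MPI rank per GPU, several OpenMP threads per rank.
# Assumed node: 4 GPUs and 32 cores, so NTHREADS = 32 / NRANKS = 8.
VASP_BIN ?= bin/vasp_std
NRANKS   ?= 4
NTHREADS ?= 8

.PHONY: run-hybrid
run-hybrid:
	# ppr:N:node places N ranks per node; PE=n binds n cores to each rank,
	# and -x exports the OpenMP settings to every rank.
	mpirun -np $(NRANKS) \
	  --map-by ppr:$(NRANKS):node:PE=$(NTHREADS) --bind-to core \
	  -x OMP_NUM_THREADS=$(NTHREADS) -x OMP_PLACES=cores -x OMP_PROC_BIND=close \
	  $(VASP_BIN)
```

Binding each rank to a contiguous block of `PE` cores keeps its OpenMP threads from migrating onto cores owned by another rank, which is the usual reason hybrid runs appear not to thread at all.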

However, I am curious about the use of 'libiomp' vs. others, as indicated in this particular part of the official documentation. In my builds, 'libiomp' is never linked automatically, and the binaries segfault if I force it to be linked. Will the threaded portions of the MKL run correctly without 'libiomp' (in particular, 'libmkl_intel_thread')? My concern is that the VASP code itself will be running with the OpenMP threads, but not the MKL portions.
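One way to check empirically whether the threaded MKL layer really uses more than one thread is MKL's verbose mode: with `MKL_VERBOSE=1`, MKL prints one line per call that includes an `NThr:<n>` field. The sketch below is an assumption-laden diagnostic, not an official VASP procedure: the target name and binary path are hypothetical, it should be run in a directory holding a small test input, and in a GPU build many BLAS calls go to cuBLAS, so only a few MKL calls may appear.

```make
# Hypothetical diagnostic target: run one rank with MKL verbose output and
# count how many MKL calls reported each thread count (NThr:<n>).
VASP_BIN ?= bin/vasp_std
NTHREADS ?= 8

.PHONY: check-mkl-threads
check-mkl-threads:
	mpirun -np 1 -x MKL_VERBOSE=1 -x OMP_NUM_THREADS=$(NTHREADS) $(VASP_BIN) 2>&1 \
	  | grep -o 'NThr:[0-9]*' | sort | uniq -c
```

If every reported value is `NThr:1` while `OMP_NUM_THREADS` is larger, the MKL portions are running single-threaded even though the VASP code itself is threaded.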

Another question: has Intel MPI been tested with this GPU+NVHPC configuration? I wonder if it has the same level of CUDA-awareness as Open MPI.
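For the Open MPI side of the comparison, whether the installed library was built with CUDA support can be queried with `ompi_info`, as described in the Open MPI FAQ. The make target wrapper and its name are hypothetical, only for consistency with the rest of the build setup; this check covers Open MPI only, not Intel MPI.

```make
# Hypothetical helper target: report whether this Open MPI build is CUDA-aware.
# Expected output ends in ":value:true" for a CUDA-aware build.
.PHONY: check-cuda-aware
check-cuda-aware:
	ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```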

Thanks!
