Possible Bugs in GPU-Vasp 6.4.1 using SOC and IVDW 20/21/202
Posted: Wed Oct 18, 2023 6:51 pm
I am getting different dispersion energies from CPU and GPU, for the same version of VASP (6.4.1) and the same system.
The statistics are as attached in the screenshot: I tested IVDW 10, 11, 12, 20, 21, and 202, with LSORBIT = .TRUE. (spin-orbit coupling).
For DFT-Dx corrections, the GPU and CPU runs are giving the same answers. However, for Tkatchenko-Scheffler and many-body corrections, the GPU results deviate from the CPU ones, or even crashed. My colleague tested version 6.4.0 on another GPU machine, and the number agreed with CPU (IVDW = 20). To the best of my observation, this issue does not appear if LSORBIT = .FALSE.
The four input files for my test system are attached here, as well as the makefile.include file that I used to compile the code on GPU. Could the VASP team, or anyone else who is interested, try running them and see if the same problem occurs? Thanks.
The statistics are as attached in the screenshot: I tested IVDW 10, 11, 12, 20, 21, and 202, with LSORBIT = .TRUE. (spin-orbit coupling).
For DFT-Dx corrections, the GPU and CPU runs are giving the same answers. However, for Tkatchenko-Scheffler and many-body corrections, the GPU results deviate from the CPU ones, or even crashed. My colleague tested version 6.4.0 on another GPU machine, and the number agreed with CPU (IVDW = 20). To the best of my observation, this issue does not appear if LSORBIT = .FALSE.
The four input files for my test system are attached here, as well as the makefile.include file that I used to compile the code on GPU. Could the VASP team, or anyone else who is interested, try running them and see if the same problem occurs? Thanks.
Code: Select all
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
-DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-D_OPENACC \
-DUSENCCL -DUSENCCLP2P
CPP = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
# N.B.: you might need to change the cuda-version here
# to one that comes with your NVIDIA-HPC SDK
FC = mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -mp
FCL = mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -mp -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o minimax_dependence.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = nvfortran
CC_LIB = nvc -w
CFLAGS_LIB = -O
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = nvc++ --no_warnings
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself , change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS += $(VASP_TARGET_CPU)
# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT = /home/libraries/nvhpc_2023_237_Linux_x86_64_cuda_12.2/hpc_sdk/Linux_x86_64/23.7
# If the above fails, then NVROOT needs to be set manually
#NVHPC ?= /opt/nvidia/hpc_sdk
#NVVERSION = 21.11
#NVROOT = $(NVHPC)/Linux_x86_64/$(NVVERSION)
## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
OFLAG_IN = -fast -Mwarperf
SOURCE_IN := nonlr.o
# Software emulation of quadruple precsion (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd
# BLAS (mandatory)
BLAS = -lblas
# LAPACK (mandatory)
LAPACK = -llapack
# scaLAPACK (mandatory)
SCALAPACK = -Mscalapack
LLIBS += $(SCALAPACK) $(LAPACK) $(BLAS)
# FFTW (mandatory)
FFTW_ROOT ?= /home/libraries/fftw/gpu/fftw_install
LLIBS += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS += -I$(FFTW_ROOT)/include
# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT ?= /home/libraries/hdf5/gpu/hdf5_install
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# For the VASP-2-Wannier90 interface (optional)
CPP_OPTIONS += -DVASP2WANNIER90
WANNIER90_ROOT ?= /home/ibraries/wannier90-3.1.0/gpu
LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier