VASP 6.4.1 Dev Bug error

Posted: Thu Dec 07, 2023 11:48 pm
by nathan_keilbart
Hello everyone,

I recently compiled VASP 6.4.1 on Lassen at LLNL, an IBM Power9 system with Nvidia V100 GPUs. I used the NVHPC compilers, which gave very few issues when following the include file settings pasted below for reference. For the most part the build performs well and gives the same values as a version we run on an Intel CPU server. Recently, while running simulations on a Pu2O3 surface slab, I encountered the following bug report.

 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: rot.F  at line: 803                                  |
|                                                                             |
|     EDWAV: internal error, the gradient is not orthogonal 5 3 -1.79e-4      |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------
Since I do not get this error with my Intel build, I imagine it comes from either the choice of compiler or the GPU code path. I have pasted the POSCAR, INCAR, and KPOINTS below as well. Please let me know what other information I can provide to assist with debugging.

makefile.include
# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxPGI\" \
-DMPI -DMPI_BLOCK=8000 -DMPI_INPLACE -Duse_collective \
-DscaLAPACK \
-DLAPACK36 \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf \
-Dqd_emulate \
-D_OPENACC \
-D_OPENMP \
-DUSENCCL \
-DUSENCCLP2P \
#-DPROFILING
# -DnoQuadPrecision \

# Define location of libraries
OLCF_NETLIB_LAPACK_ROOT ?= /usr/tcetmp/packages/lapack/lapack-3.9.0-gcc-7.3.1
OLCF_NETLIB_SCALAPACK_ROOT ?= /usr/tcetmp/packages/lapack/lapack-3.9.0-gcc-7.3.1
OLCF_ESSL_ROOT ?= /usr/tcetmp/packages/essl/essl-6.3.0.2
OLCF_CUDA_ROOT ?= /usr/tce/packages/cuda/cuda-11.7.0
NCCL ?= /usr/tce/packages/nvhpc/nvhpc-22.5/Linux_ppc64le/22.5/comm_libs/11.7/nccl/lib
QD ?= /usr/tce/packages/nvhpc/nvhpc-22.5/Linux_ppc64le/22.5/compilers/extras/qd
# Tuned version of FFTW may be available at /usr/tcetmp/packages/fftw/fftw-3.3.9-gcc-4.9.3
OLCF_FFTW_ROOT=/usr/tcetmp/packages/fftw/fftw-3.3.9-gcc-4.9.3

# Add library paths to rpath so they don't need to be added to LD_LIBRARY_PATH at runtime
# To select the library at runtime, omit them here and then add to LD_LIBRARY_PATH
LLIBS += -lm
LLIBS += -Wl,-rpath,${OLCF_NETLIB_LAPACK_ROOT}/lib:${OLCF_ESSL_ROOT}/lib64:${NCCL}:${QD}/lib:${OLCF_FFTW_ROOT}/lib

CPP = nvfortran -g -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)

FC = mpif90 -g -acc -mp=gpu -ta=tesla:cc70 -Mcuda
FCL = mpif90 -g -acc -mp=gpu -ta=tesla:cc70 -pgc++libs -Mcuda -lgfortran

FREE = -Mfree -Mx,231,0x1

FFLAGS = -Mnoupcase -Mbackslash -Mlarge_arrays

OFLAG = -fast

DEBUG = -Mfree -O0 -traceback

LIBDIR =
BLAS =
ESSLLIB = -lesslsmpcuda # -lessl, -lesslsmp, or -lesslsmpcuda
LAPACK = -L$(OLCF_ESSL_ROOT)/lib64 ${ESSLLIB} -L$(OLCF_NETLIB_LAPACK_ROOT)/lib -llapack
BLACS =
SCALAPACK = -L$(OLCF_NETLIB_SCALAPACK_ROOT)/lib -lscalapack

CUDA = -Mcudalib=cublas -Mcudalib=cufft -Mcudalib=cusolver -Mcuda

LLIBS += $(SCALAPACK) $(LAPACK) $(BLAS) $(BLAS)
LLIBS += -L$(OLCF_CUDA_ROOT)/lib64 -lcuda -lcusolver -lcufft

LLIBS += -L$(NCCL) -lnccl

# Software emulation of quadruple precision
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd

# Use the FFTs from fftw

FFTW ?= $(OLCF_FFTW_ROOT)
LLIBS += -L$(FFTW)/lib64 -lfftw3 -lfftw3_omp
INCS += -I$(FFTW)/include

OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o
SOURCE_O2 := pead.o

# Workaround a bug in PGI compiler up to and including version 18.10
OFLAG_IN = -fast
SOURCE_IN := xcspin.o

# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = nvfortran
CC_LIB = nvc
CFLAGS_LIB = -g -O
FFLAGS_LIB = -g -O1 -Mfixed
FREE_LIB = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS = nvc++ --no_warnings

# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin


POSCAR
# Compound: O84Pu48. Old comment: O84Pu48
1.0000000000
11.0461949100 0.0000000000 0.0000000000
0.0000000000 11.0461949100 0.0000000000
0.0000000000 0.0000000000 36.9992383282
O Pu
84 48
Direct
0.8590822800 0.3722145500 0.4311722116
0.3590822800 0.6277854500 0.4311722116
0.1409177200 0.1277854500 0.4311722116
0.6409177200 0.8722145500 0.4311722116
0.1019951800 0.1409177200 0.3504976906
0.8980048200 0.8590822800 0.2741964990
0.8980048200 0.3590822800 0.3504976906
0.1019951800 0.6409177200 0.2741964990
0.1019951800 0.8590822800 0.4234724761
0.8980048200 0.6409177200 0.4234724761
0.3722145500 0.8980048200 0.4195518112
0.6277854500 0.8980048200 0.2702758341
0.3722145500 0.1019951800 0.3544183555
0.8722145500 0.1019951800 0.4195518112
0.8722145500 0.8980048200 0.3544183555
0.1277854500 0.1019951800 0.2702758341
0.6409177200 0.1277854500 0.3427979551
0.3590822800 0.8722145500 0.2818962345
0.8590822800 0.1277854500 0.2818962345
0.1409177200 0.8722145500 0.3427979551
0.3590822800 0.3722145500 0.3427979551
0.6409177200 0.6277854500 0.2818962345
0.1409177200 0.3722145500 0.2818962345
0.8590822800 0.6277854500 0.3427979551
0.3980048200 0.3590822800 0.4234724761
0.6019951800 0.1409177200 0.4234724761
0.6019951800 0.3590822800 0.2741964990
0.3980048200 0.6409177200 0.3504976906
0.3980048200 0.1409177200 0.2741964990
0.6019951800 0.8590822800 0.3504976906
0.1277854500 0.6019951800 0.3544183555
0.8722145500 0.3980048200 0.2702758341
0.1277854500 0.3980048200 0.4195518112
0.3722145500 0.6019951800 0.2702758341
0.6277854500 0.3980048200 0.3544183555
0.6277854500 0.6019951800 0.4195518112
0.1409177200 0.6277854500 0.4920739323
0.8590822800 0.3722145500 0.7297241659
0.3590822800 0.6277854500 0.7297241659
0.6409177200 0.3722145500 0.4920739323
0.8590822800 0.8722145500 0.4920739323
0.1409177200 0.1277854500 0.7297241659
0.6409177200 0.8722145500 0.7297241659
0.3590822800 0.1277854500 0.4920739323
0.1019951800 0.1409177200 0.6490496449
0.8980048200 0.8590822800 0.5727484533
0.8980048200 0.3590822800 0.6490496449
0.1019951800 0.6409177200 0.5727484533
0.1019951800 0.8590822800 0.7220244304
0.8980048200 0.1409177200 0.4997736678
0.8980048200 0.6409177200 0.7220244304
0.1019951800 0.3590822800 0.4997736678
0.6277854500 0.1019951800 0.5036943326
0.3722145500 0.8980048200 0.7181037655
0.6277854500 0.8980048200 0.5688277884
0.3722145500 0.1019951800 0.6529703098
0.8722145500 0.1019951800 0.7181037655
0.1277854500 0.8980048200 0.5036943326
0.8722145500 0.8980048200 0.6529703098
0.1277854500 0.1019951800 0.5688277884
0.6409177200 0.1277854500 0.6413499094
0.3590822800 0.8722145500 0.5804481888
0.8590822800 0.1277854500 0.5804481888
0.1409177200 0.8722145500 0.6413499094
0.3590822800 0.3722145500 0.6413499094
0.6409177200 0.6277854500 0.5804481888
0.1409177200 0.3722145500 0.5804481888
0.8590822800 0.6277854500 0.6413499094
0.6019951800 0.6409177200 0.4997736678
0.3980048200 0.3590822800 0.7220244304
0.3980048200 0.8590822800 0.4997736678
0.6019951800 0.1409177200 0.7220244304
0.6019951800 0.3590822800 0.5727484533
0.3980048200 0.6409177200 0.6490496449
0.3980048200 0.1409177200 0.5727484533
0.6019951800 0.8590822800 0.6490496449
0.1277854500 0.6019951800 0.6529703098
0.8722145500 0.3980048200 0.5688277884
0.1277854500 0.3980048200 0.7181037655
0.8722145500 0.6019951800 0.5036943326
0.3722145500 0.6019951800 0.5688277884
0.6277854500 0.3980048200 0.6529703098
0.3722145500 0.3980048200 0.5036943326
0.6277854500 0.6019951800 0.7181037655
0.0000000000 0.7500000000 0.3787740985
0.0000000000 0.2500000000 0.3951960682
0.7224973000 0.0000000000 0.3869850834
0.2224973000 0.0000000000 0.3869850834
0.7500000000 0.7775027000 0.3123470948
0.2500000000 0.2224973000 0.3123470948
0.2500000000 0.7224973000 0.3123470948
0.7500000000 0.2775027000 0.3123470948
0.5000000000 0.7500000000 0.3951960682
0.5000000000 0.2500000000 0.3787740985
0.7775027000 0.5000000000 0.3869850834
0.2775027000 0.5000000000 0.3869850834
0.0000000000 0.0000000000 0.3123470948
0.0000000000 0.5000000000 0.3123470948
0.5000000000 0.0000000000 0.3123470948
0.5000000000 0.5000000000 0.3123470948
0.2500000000 0.2775027000 0.4616230719
0.7500000000 0.7224973000 0.4616230719
0.7500000000 0.2224973000 0.4616230719
0.2500000000 0.7775027000 0.4616230719
0.0000000000 0.2500000000 0.5444720453
0.0000000000 0.7500000000 0.6773260528
0.0000000000 0.7500000000 0.5280500757
0.0000000000 0.2500000000 0.6937480225
0.2775027000 0.0000000000 0.5362610605
0.7224973000 0.0000000000 0.6855370377
0.2224973000 0.0000000000 0.6855370377
0.7775027000 0.0000000000 0.5362610605
0.7500000000 0.7775027000 0.6108990491
0.2500000000 0.2224973000 0.6108990491
0.2500000000 0.7224973000 0.6108990491
0.7500000000 0.2775027000 0.6108990491
0.5000000000 0.7500000000 0.6937480225
0.5000000000 0.2500000000 0.5280500757
0.5000000000 0.2500000000 0.6773260528
0.5000000000 0.7500000000 0.5444720453
0.7775027000 0.5000000000 0.6855370377
0.2224973000 0.5000000000 0.5362610605
0.7224973000 0.5000000000 0.5362610605
0.2775027000 0.5000000000 0.6855370377
0.0000000000 0.0000000000 0.4616230719
0.5000000000 0.0000000000 0.4616230719
0.0000000000 0.5000000000 0.4616230719
0.5000000000 0.5000000000 0.4616230719
0.0000000000 0.0000000000 0.6108990491
0.0000000000 0.5000000000 0.6108990491
0.5000000000 0.0000000000 0.6108990491
0.5000000000 0.5000000000 0.6108990491

INCAR
ALGO = Conjugate
EDIFF = 0.0001
EDIFFG = -0.01
ENCUT = 500
GGA = PS
IBRION = 2
ISIF = 2
ISMEAR = 0
ISYM = 0
KPAR = 8
LORBIT = 11
NCORE = 14
NELM = 500
NSW = 100
PREC = ACCURATE
SIGMA = 0.1

KPOINTS
# No comment
0
Gamma
3 3 1
0.000000000 0.000000000 0.000000000
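
For anyone reproducing this, one quick sanity check on the POSCAR above is that the declared atom counts (84 O and 48 Pu, 132 in total) match the number of coordinate lines. The following is a minimal Python sketch, assuming the VASP 5+ layout with a species-name line before the counts. The function name check_poscar_counts and the tiny example cell are made up for illustration and are not part of VASP.

```python
def check_poscar_counts(text):
    """Return (declared_total, coordinate_lines) for a POSCAR given as a string.

    Assumed layout (VASP 5+ style): comment, scale, three lattice vectors,
    species names, species counts, optional "Selective dynamics",
    "Direct"/"Cartesian", then one coordinate line per atom.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    declared = sum(int(tok) for tok in lines[6].split())

    start = 7
    if lines[start].lower().startswith("s"):  # optional selective-dynamics line
        start += 1
    start += 1  # skip the "Direct"/"Cartesian" line

    coords = 0
    for ln in lines[start:]:
        parts = ln.split()
        if len(parts) < 3:
            break  # blank line or end of the coordinate block
        try:
            [float(p) for p in parts[:3]]
        except ValueError:
            break  # non-numeric line, so the coordinate block has ended
        coords += 1
    return declared, coords


# Hypothetical three-atom cell, used only to show the call.
example = """\
# tiny hypothetical cell
1.0
4.0 0.0 0.0
0.0 4.0 0.0
0.0 0.0 4.0
O Pu
2 1
Direct
0.0 0.0 0.0
0.5 0.5 0.0
0.5 0.5 0.5
"""
print(check_poscar_counts(example))  # (3, 3)
```

Run against the POSCAR posted above, the two numbers should both come out as 132 if the paste is complete.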

Re: VASP 6.4.1 Dev Bug error

Posted: Mon Dec 11, 2023 8:21 am
by jonathan_lahnsteiner2
Dear nathan_keilbart,

This crash is related to the EDWAV algorithm. Another user has reported a similar issue:
https://www.vasp.at/forum/viewtopic.php?t=19276
It is not yet clear where the problem originates.
Could you please supply us with the OUTCAR file of your calculation?
With it, we will be able to investigate the problem further.

All the best,
Jonathan
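
One way to collect the requested OUTCAR together with the inputs already posted is to pack them into a single archive for upload. The following is a minimal Python sketch using only the standard library. The helper name bundle_run and the archive name in the usage comment are hypothetical, and POTCAR is left out on the assumption that licensed pseudopotential files should not be posted.

```python
import tarfile
from pathlib import Path

# Standard VASP file names; extend the tuple if more files are requested.
# POTCAR is deliberately not included (assumption: it should not be shared).
DEFAULT_FILES = ("INCAR", "POSCAR", "KPOINTS", "OUTCAR")


def bundle_run(run_dir, archive_path, names=DEFAULT_FILES):
    """Pack the files that exist in run_dir into a gzip tarball.

    Returns (packed, missing): the names that were added to the archive and
    the names that were not found in run_dir.
    """
    run_dir = Path(run_dir)
    packed, missing = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            path = run_dir / name
            if path.is_file():
                tar.add(path, arcname=name)  # store without the directory prefix
                packed.append(name)
            else:
                missing.append(name)
    return packed, missing


# Example call from inside the calculation directory (archive name is arbitrary):
# packed, missing = bundle_run(".", "vasp_bug_report.tar.gz")
# print("packed:", packed, "missing:", missing)
```

Returning the list of missing names makes it easy to notice before uploading that, for example, the OUTCAR was not written to the expected directory.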