VASP 6.3.0 compiles but fails some validation tests
Posted: Fri Feb 11, 2022 12:01 am
by chunsheng_wang
I am trying to compile VASP 6.3.0_0 on an HPC cluster, and although I can get the compilation to succeed, I am encountering issues (segfaults) in the validation tests.
The cluster nodes have dual Intel Ivy Bridge E5-2680v2 chips and 128 GB of RAM. I compiled with Intel Parallel Studio XE 2020 Update 1 Cluster Edition, using the included MKL for BLAS, LAPACK, FFTW, and ScaLAPACK, and the included Intel MPI libraries.
The validation tests
NiOsLDAU=2_x
NiOsLDAU=2_x_RPR
NiOsLDAU=2_y
NiOsLDAU=2_y_RPR
NiOsLDAU=2_z
NiOsLDAU=2_z_RPR
SiC8_GW0R
Tl_x
Tl_x_RPR
Tl_y
Tl_y_RPR
Tl_z
Tl_z_RPR
are failing, I believe with segfaults.
For running the tests, I am using make test with
nthrds=4
nranks=2
mpi_flags="-np $nranks -ppn $nranks"
omp_flags="-genv OMP_NUM_THREADS=$nthrds -genv OMP_STACKSIZE=512m"
export VASP_TESTSUITE_EXE_STD="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_STD}"
export VASP_TESTSUITE_EXE_GAM="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_GAM}"
export VASP_TESTSUITE_EXE_NCL="mpirun ${mpi_flags} ${omp_flags} ${GLUEVASP_NCL}"
as suggested by impi+omp.conf, where GLUEVASP_STD/GAM/NCL point to the vasp_std/gam/ncl executables in the build directory.
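Before launching make test, it can help to confirm that each GLUEVASP_* variable really points at an executable file. The following is only a minimal shell sketch under that assumption; the helper name check_vasp_exe is hypothetical and not part of the VASP test suite.

```shell
# Hypothetical helper: report whether a path is an executable regular file.
check_vasp_exe() {
    if [ -x "$1" ] && [ ! -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1"
        return 1
    fi
}

# Usage, assuming the GLUEVASP_* variables from the post are set:
# for exe in "$GLUEVASP_STD" "$GLUEVASP_GAM" "$GLUEVASP_NCL"; do
#     check_vasp_exe "$exe" || exit 1
# done
```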
Attached are tarballs with makefile.include, testsuite.log, and the test/* directories for the failed tests (except for SiC8_GW0R, which was too large to attach).
Any assistance you can offer with this would be appreciated.
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Fri Feb 11, 2022 1:50 pm
by ferenc_karsai
I've checked these calculations with all of our compilers, and we also run the test suite continuously. I see no problems in our calculations, so most likely your toolchain has a problem.
Very often scaLAPACK and shared memory for MPI are sources of problems. I didn't see shared memory enabled in your build, so we can rule that out. But you did use scaLAPACK, so please try compiling without it and see if the problem persists. To do that, remove "-DscaLAPACK" from "CPP_OPTIONS" in your makefile.include.
Please also compile with "-traceback -debug -g". This may give useful information, since it prints the line where the code crashes.
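The scaLAPACK suggestion above can be tried without touching the original file. This is a rough sketch, assuming the define appears literally as -DscaLAPACK in makefile.include; the function name without_scalapack and the output file name are made up for illustration.

```shell
# Print a copy of a makefile.include with the -DscaLAPACK define removed.
# Assumption: the define is spelled exactly "-DscaLAPACK", as in this thread.
# A line that held only the define becomes a bare continuation ("\"),
# which make still accepts.
without_scalapack() {
    sed -e 's/[[:space:]]*-DscaLAPACK//' "$1"
}

# Usage (writes a new file rather than editing in place):
# without_scalapack makefile.include > makefile.include.noscalapack
```

Writing to a separate file keeps the scaLAPACK-enabled configuration around for a later side-by-side comparison.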
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Tue Feb 15, 2022 2:54 pm
by hszhao.cn@gmail.com
ferenc_karsai wrote:
Please also compile with "-traceback -debug -g". This may give useful information, since it prints the line where the code crashes.
What do you mean by saying the following?
Code:
compile with "-traceback -debug -g"
I checked the GNU Make options and can only find the following options that resemble the ones you mentioned:
Code:
--debug[=FLAGS]
Print debugging information in addition to normal processing. If the FLAGS are omitted, then the behavior is the
same as if -d was specified. FLAGS may be a for all debugging output (same as using -d), b for basic debugging, v
for more verbose basic debugging, i for showing implicit rules, j for details on invocation of commands, and m for
debugging while remaking makefiles. Use n to disable all previous debugging flags.
--trace
Information about the disposition of each target is printed (why the target is being rebuilt and what commands are
run to rebuild it).
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Tue Feb 15, 2022 4:17 pm
by ferenc_karsai
These options are for the Intel compiler. I wrote them because I saw that you had compiled with Intel before.
For GNU use the following:
-fbacktrace -g -debug
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 2:19 am
by hszhao.cn@gmail.com
Thank you for the clarification. For others' reference, here are some further explanations of this issue for the Intel compiler.
To understand the precise meaning of "-traceback -debug -g", see the following built-in help of ifort:
Code:
$ ifort --help |grep -A3 traceback$
-[no]traceback
specify whether the compiler generates PC correlation data used to
display a symbolic traceback rather than a hexadecimal traceback at
runtime failure
$ ifort --help |grep -A5 -- '-debug \['
-debug [keyword]
Control the emission of debug information.
Valid [keyword] values:
none
Disables debug generation.
$ ifort --help |grep -A6 -- '-g\[level\]'
-g[level]
Produce symbolic debug information.
Valid [level] values:
0 - Disable generation of symbolic debug information.
1 - Emit minimal debug information for performing stack traces.
2 - Emit complete debug information. (default for -g)
3 - Emit extra information which may be useful for some tools.
So, "-traceback -debug -g" should mean the following directives:
- Tell the compiler to generate PC correlation data used to display a symbolic traceback rather than a hexadecimal traceback at runtime failure.
- Control the emission of debug information. (The "none" keyword shown above, which disables debug generation, is only the first entry in the truncated help output; it is not what a bare "-debug" selects.)
- Emit complete debug information.
So, basically, your suggestion is to add the above options to the DEBUG variable in makefile.include.
Also see some suggestions here: https://www.nas.nasa.gov/hecc/support/k ... ns_92.html
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 8:31 am
by ferenc_karsai
DEBUG is not used automatically; it is better to append the flags to FFLAGS.
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 9:46 am
by hszhao.cn@gmail.com
ferenc_karsai wrote:
DEBUG is not automatically used, better append it to FFLAGS.
Thanks for your advice. I have now added the line FFLAGS += -traceback -debug -g to makefile.include, after the line that sets the initial value of FFLAGS.
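For anyone who rebuilds repeatedly, the FFLAGS change can be applied idempotently. This is a sketch only: the flag line is the one that appears in the makefile.include posted later in this thread, while the function name add_debug_flags is hypothetical, and the file is assumed to end with a newline so that appending at the end places the line after the initial FFLAGS assignment.

```shell
# Append the Intel debug flags to FFLAGS unless the exact line is already there.
# Assumption: $1 is a makefile.include that ends with a newline.
add_debug_flags() {
    line='FFLAGS += -traceback -debug -g'
    if ! grep -qxF -- "$line" "$1"; then
        printf '%s\n' "$line" >> "$1"
    fi
}

# Usage:
# add_debug_flags makefile.include
```

Running it twice leaves a single copy of the line, so it is safe to call from a build script.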
ferenc_karsai wrote:
Very often Scalapack and shared memory for MPI are sources of problems. In your compiling I didn't see shared memory so we can rule that out. But you used Scalapack. So please try to compile without Scalapack and see if the problem persists. For that please remove "-DscaLAPACK" from the "CPP_OPTIONS" in your makefile.include.
I'm still a little confused by your description above. More specifically, do you mean the following changes to the makefile configuration?
1. If I use a makefile based on makefile.include.intel, "-DscaLAPACK" should be kept.
2. If I use a makefile based on makefile.include.intel_omp or makefile.include.intel_ompi_mkl_omp, "-DscaLAPACK" should be removed.
Am I right? Any further hints would be highly appreciated.
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 11:21 am
by ferenc_karsai
No. What I meant is that, to narrow down the error, you should compile without "-DscaLAPACK". That can be done with any compiler. If the code works without scaLAPACK but not with it, then we know the error is in your scaLAPACK setup.
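The comparison described here amounts to a set difference between two lists of failed tests. A small sketch, assuming you have written the failed test names by hand into two text files, one name per line; it does not parse testsuite.log, and the function and file names are hypothetical.

```shell
# List tests that fail in the first build but not in the second.
# Assumption: each input file holds one failed test name per line.
fails_only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -23 "$a" "$b"    # keep lines unique to the first file
    rm -f "$a" "$b"
}

# Usage:
# fails_only_in_first failed_with_scalapack.txt failed_without_scalapack.txt
```

Tests printed by this call fail only with scaLAPACK enabled and point at the scaLAPACK setup. Tests that appear in both lists point elsewhere.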
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 11:23 am
by hszhao.cn@gmail.com
ferenc_karsai wrote: ↑Fri Feb 11, 2022 1:50 pm
I've checked this calculations with all of our compilers. We also continuously test the testsuite. I see no problems in our calculations, so most likely your toolchain has a problem.
Could you please share the full content of your makefile.include?
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Wed Feb 16, 2022 2:08 pm
by chunsheng_wang
Ferenc and VASP people: FYI, there are two people having this issue in this thread. I created the thread, and someone else (not directly working with me) has also posted. Due to the time needed for compile/test cycles and other commitments, I am only now following up on my initial post.
I have rebuilt VASP and rerun with the debugging flags and scaLAPACK disabled. I have also disabled hdf5 and wannier90 just to turn off as much extraneous stuff as possible.
Tests HEG_333_LW, SiC8_GW0R, and SiC_ACFDTR_T complain about the lack of scaLAPACK and are listed as failed, but I am assuming that is normal (as we turned off scaLAPACK).
Tests Tl_x, Tl_x_RPR, Tl_y, Tl_y_RPR, Tl_z, and Tl_z_RPR are segfaulting.
I have attached the makefile.include, testsuite.log, and test/Tl_* directories in the attached tarball vasptest.tar.gz.
(I had a little trouble with the requested debug flags the first time around, so I put them all over the place in the current makefile.include just to make sure they took effect)
At this point, I believe the entire toolchain is within the Intel Parallel Studio Suite compiler + MKL (version 2020.1)
Any assistance you can provide regarding/resolving these issues with the validation tests will be appreciated. Thank you in advance.
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Thu Feb 17, 2022 10:21 am
by hszhao.cn@gmail.com
I think the culprit is presumably related to the -xHOST setting in your makefile.include.
All the failed tests you mentioned pass on my machine (Ubuntu 20.04.3 LTS with dual Intel Xeon E5-2699 v4 CPUs). See below for more details on the toolchain, makefile.include, and testsuite.log.
1. The toolchain consists of recent versions of the Intel oneAPI Base and HPC toolkits:
Code:
$ module purge
$ module load mpi/2021.4.0 mkl compiler
$ module list
Currently Loaded Modules:
1) mpi/2021.4.0 3) compiler-rt/2022.0.2 5) oclfpga/2022.0.2
2) tbb/2021.5.1 4) mkl/2022.0.2 6) compiler/2022.0.2
2. The content of makefile.include is as follows:
Code:
$ egrep -v '^(#|$)' makefile.include.intel
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf
CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC = mpiifort
FCL = mpiifort
FREE = -free -names lowercase
FFLAGS = -assume byterecl -w
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
CXX_PARS = icpc
LLIBS = -lstdc++
FFLAGS += -march=core-avx2
FFLAGS += -traceback -debug -g
LLIBS += -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl
FCL += -qmkl=parallel
INCS =-I$(MKLROOT)/include/fftw
Instead of using -xHOST, the option -march=core-avx2 is used (see the FFLAGS line above), based on the suggestion here.
Side remark: Based on my testing, the following Intel MPI Library version doesn't work: mpi/2021.5.0, i.e., mpi/2021.5.1.
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Fri Feb 18, 2022 7:53 pm
by chunsheng_wang
@hszhao.cn: Thank you. The -xHOST flag was indeed the issue. After replacing it with the appropriate -march flag (our cluster is a bit too old to support AVX2), the tests all pass. I am surprised that this was the culprit: I thought -xHOST just instructed the compiler to produce code optimized for the processor used for compilation, and I compiled on a system with the same processor as the one the tests ran on. Still, the suggested modification worked. Thank you again for all your assistance.
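Whether a node supports AVX2 can be checked directly before picking a -march value. This is a minimal Linux-only sketch that reads the flags line of /proc/cpuinfo; the function name has_cpu_flag is hypothetical, and the check only tells you which instruction sets exist, not why -xHOST misbehaved in this build.

```shell
# Return success if CPU flag $1 is listed in a cpuinfo-style file.
# $2 defaults to /proc/cpuinfo (Linux only).
has_cpu_flag() {
    grep -m 1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qxF -- "$1"
}

# Usage on the node that will run VASP:
# if has_cpu_flag avx2; then echo "AVX2 available"; else echo "no AVX2"; fi
```

The exact-match lookup matters here: a plain substring search for avx would also match avx2.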
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Fri Feb 18, 2022 10:28 pm
by ferenc_karsai
Hszhao, thank you very much for helping us find the problem in your compilations.
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Sat Feb 19, 2022 2:57 pm
by hszhao.cn@gmail.com
Some tricks for setting the value of -march.
1. Obtain the arch name as follows:
Code:
$ gcc -march=native -Q --help=target|grep -- '^[ ]*-march='
-march= broadwell
Then, based on the official Intel documentation here, the following should be used:
2. If your arch/processor name is not listed in the official Intel documentation here, just use the following trick, as commented here:
I’ve confirmed that both of the above two settings can solve the problem discussed here.
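The lookup in step 1 can be wrapped so that only the architecture name is printed. A sketch, assuming the gcc output has the two-column form shown above; the function name parse_march is hypothetical, and other GCC versions may format this output differently.

```shell
# Read "gcc -march=native -Q --help=target" output on stdin and print the
# value of the first "-march=" line.
parse_march() {
    awk '/^[ \t]*-march=/ { print $2; exit }'
}

# Usage (requires gcc on the target node):
# gcc -march=native -Q --help=target | parse_march
```

The printed name still has to be checked against the values your compiler accepts for -march.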
Regards,
HZ
Re: VASP 6.3.0 compiles but fails some validation tests
Posted: Mon Feb 21, 2022 2:35 pm
by hszhao.cn@gmail.com
Using the following environment (Ubuntu 20.04.3 LTS installed on a machine with dual Intel Xeon E5-2699 v4 CPUs), I recompiled vasp.6.3.0 with the -xHost option and then successfully validated all selected tests in the fast category on the same machine. The following components of the Intel oneAPI Base and HPC toolkits were used:
Code:
$ module load compiler mkl mpi/2021.4.0
$ module list
Currently Loaded Modules:
1) lmod 3) compiler-rt/2022.0.2 5) compiler/2022.0.2 7) mpi/2021.4.0
2) tbb/2021.5.1 4) oclfpga/2022.0.2 6) mkl/2022.0.2
Attached are the related makefile.include and testsuite.log files. So, I conclude that if you compile and run VASP on exactly the same CPU architecture, -xHost should work; otherwise, use the appropriate -march compiler option for cross-compilation. You can see related discussions here.
Regards,
HZ