We’ve successfully compiled VASP 6.1.1 with Intel 2020 (compilers, MKL, MPI) and CUDA 11.
The CPU version passes all tests except SiC_TDHSE.
The GPU version crashes. The job has full access to both GPU devices.
This is the output of the first test from the test suite:
bulk_GaAs_ACFDT_RPR step DFT
entering run_vasp
Using device 0 (rank 1, local rank 1, local size 4) : Tesla P100-PCIE-16GB
Using device 1 (rank 3, local rank 3, local size 4) : Tesla P100-PCIE-16GB
Using device 1 (rank 2, local rank 2, local size 4) : Tesla P100-PCIE-16GB
Using device 0 (rank 0, local rank 0, local size 4) : Tesla P100-PCIE-16GB
running on 4 total cores
distrk: each k-point on 2 cores, 2 groups
distr: one band on 1 cores, 2 groups
using from now: INCAR
[…]
POSCAR found : 2 types and 2 ions
[…]
LDA part: xc-table for Pade appr. of Perdew
CUDA Error in cuda_mem.cu, line 44: all CUDA-capable devices are busy or unavailable
Failed to register pinned memory!
[pa2:30823:0:30823] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 30823) ====
0 0x000000000004cb95 ucs_debug_print_backtrace() ???:0
1 0x00000000019bf1fb __cuda_error() /dev/shm/vasp.6.1.1/build/gpu/CUDA/cuda_globals.h:59
2 0x00000000019bf1fb nvpinnedmalloc_C() /dev/shm/vasp.6.1.1/build/gpu/CUDA/cuda_mem.cu:43
3 0x00000000005edcb2 wave_mp_gen_layout_() ???:0
4 0x0000000001821cc6 MAIN__() ???:0
5 0x000000000040cfd2 main() ???:0
6 0x0000000000022555 __libc_start_main() ???:0
7 0x000000000040cee9 _start() ???:0
=================================
creating 32 CUDA streams...
creating 32 CUDA streams...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_gpu 00000000019FBB1A Unknown Unknown Unknown
libpthread-2.17.s 00002B1369625630 Unknown Unknown Unknown
vasp_gpu 00000000019BF1FB Unknown Unknown Unknown
vasp_gpu 00000000005EDCB2 Unknown Unknown Unknown
vasp_gpu 0000000001821CC6 Unknown Unknown Unknown
vasp_gpu 000000000040CFD2 Unknown Unknown Unknown
libc-2.17.so 00002B1369B56555 __libc_start_main Unknown Unknown
vasp_gpu 000000000040CEE9 Unknown Unknown Unknown
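
To make the rank-to-GPU assignment in the log above easier to read, here is a minimal Python sketch that tallies the "Using device" lines. The regular expression, the function name `ranks_per_device`, and the `LOG` constant are illustrative assumptions and not part of VASP; the four log lines are copied from the output above.

```python
import re

# Matches the "Using device N (rank R, ..." lines printed at startup in the log above.
# The pattern is an assumption based only on the four lines shown in this post.
DEVICE_LINE = re.compile(r"Using device (\d+) \(rank (\d+),")


def ranks_per_device(log_text):
    """Return {device_id: sorted list of MPI ranks that selected that device}."""
    mapping = {}
    for match in DEVICE_LINE.finditer(log_text):
        device, rank = int(match.group(1)), int(match.group(2))
        mapping.setdefault(device, []).append(rank)
    return {device: sorted(ranks) for device, ranks in mapping.items()}


# Device-selection lines copied verbatim from the log in this post.
LOG = """\
Using device 0 (rank 1, local rank 1, local size 4) : Tesla P100-PCIE-16GB
Using device 1 (rank 3, local rank 3, local size 4) : Tesla P100-PCIE-16GB
Using device 1 (rank 2, local rank 2, local size 4) : Tesla P100-PCIE-16GB
Using device 0 (rank 0, local rank 0, local size 4) : Tesla P100-PCIE-16GB
"""

if __name__ == "__main__":
    for device, ranks in sorted(ranks_per_device(LOG).items()):
        print(f"device {device}: ranks {ranks}")
```

For this log the tally shows two MPI ranks on each of the two P100 cards. Whether that sharing is connected to the "all CUDA-capable devices are busy or unavailable" error is unconfirmed; it may be worth checking whether the GPUs are set to an exclusive compute mode, which would be one possible explanation.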