Problem of running VASP 4.6 on dual socket-six core nodes
Posted: Thu Feb 24, 2011 9:51 pm
Dear VASP administrator and users,
I compiled VASP 4.6.31 on our new machine, which has dual-socket six-core nodes (12 cores per node), using OpenMPI, the Intel ifort 12.0 compiler, and MKL 10.0.23. The compilation finished fine. However, when I run the code in parallel, the calculation proceeds for a few minutes and then hangs with no further progress. I looked into OUTCAR and found that the program reads the input and then seems to stop at the 3D FFT transformation, as in the following excerpt copied from the OUTCAR:
......
......
k-point 1 : 0.0000 0.0000 0.0000 plane waves: 3511
k-point 2 : 0.2000 0.0000 0.0000 plane waves: 3569
k-point 3 : 0.4000 0.0000 0.0000 plane waves: 3598
k-point 4 : 0.4000 0.2000 0.0000 plane waves: 3609
k-point 5 : -.4000 0.2000 0.0000 plane waves: 3653
maximum and minimum number of plane-waves per node : 3653 3511
maximum number of plane-waves: 3653
maximal index in each direction:
IXMAX= 5 IYMAX= 4 IZMAX= 39
IXMIN= -5 IYMIN= -5 IZMIN=-39
WARNING: wrap around error must be expected set NGX to 22
NGY is ok and might be reduce to 20
NGZ is ok and might be reduce to 158
parallel 3dFFT wavefunction:
minimum data exchange during FFTs selected (reduces bandwidth)
(NO MORE OUTPUT after this in the OUTCAR)
I asked our administrator to do some tests, and he found that the program (with a simple test) "runs fine on up to 16 procs; on 18+ procs it deadlocks. Then it also runs fine at 32 procs. I looked at where it deadlocks, and some processes are at an MPI_Allreduce statement while others are at MPI_Barrier. The deadlock is in a routine which divides the domain for the 3D FFT transform." So it seems that the number of processors used should be a power of 2, which means some processors would simply sit unused while the job is running.
So my question is: how can we circumvent this problem? Is there anything we can adjust in the Makefile or in the INCAR file?
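In the meantime, here is the small sketch I put in our job script to round the requested rank count down to a power of 2 before launching mpirun (the helper name is mine, and it assumes that power-of-2 rank counts really do avoid the hang, which is only my guess from the tests above):

```python
def safe_ranks(ncores):
    """Largest power of 2 not exceeding ncores.

    Assumption (mine, based on our tests): power-of-2 rank counts
    avoid the 3D FFT deadlock, so we waste a few cores rather than hang.
    """
    p = 1
    while p * 2 <= ncores:
        p *= 2
    return p

# e.g. two of our 12-core nodes:
print(safe_ranks(24))  # -> 16
```

This obviously leaves cores idle (16 of 24 in the example), so I would still prefer a real fix.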
Thanks in advance.