VASP 5.2 crashes with NPAR values other than 4?
Posted: Fri Jul 09, 2010 10:20 am
Hi, first post here...
I've recently compiled VASP 5.2 successfully on our new Linux cluster using the (much appreciated) help found in this forum
(em64t, MKL, openMPI, everything updated to the latest version).
I seem to be facing a strange problem, though. When I run the medium benchmark with NPAR=4 I get decent results, except for a crash when testing on two nodes only (16 CPUs). Whenever I change NPAR (e.g. NPAR=1 or 2), every run crashes or hangs, mostly with the following output (this particular binary was compiled with -traceback):
Code:
forrtl: severe (71): integer divide by zero
Image              PC                   Routine            Line        Source
libmpi.so.0        00002B99EF5724A4  Unknown               Unknown  Unknown
libmpi.so.0        00002B99EF57290D  Unknown               Unknown  Unknown
libmpi.so.0        00002B99EF547794  Unknown               Unknown  Unknown
libmpi_f77.so.0    00002B99EF2E1259  Unknown               Unknown  Unknown
vasp_trace         0000000000472209  mpimy_mp_m_divide         224  mpi.f90
vasp_trace         000000000047E72C  main_mpi_mp_init_         167  main_mpi.f90
vasp_trace         0000000000438E2D  MAIN__                    370  main.f90
vasp_trace         000000000043876C  Unknown               Unknown  Unknown
libc.so.6          00002B99F0544586  Unknown               Unknown  Unknown
vasp_trace         0000000000438669  Unknown               Unknown  Unknown
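The failing frame is M_divide in mpi.f90 (line 224), called from the MPI setup in main_mpi.f90, so some divisor there apparently ends up as zero. Just to show what I mean, here is a toy program (only an illustration, not VASP code; the file name is made up) that should die with the same severe (71) message when built with ifort -traceback and given a zero divisor:
Code:
! npar_toy.f90 -- toy example only, NOT code from VASP's mpi.f90.
! Reads a rank count and an NPAR-like value, then does the kind of integer
! division that produces "forrtl: severe (71): integer divide by zero"
! at run time when the divisor is zero.
program npar_toy
  implicit none
  integer :: nranks, npar, per_group

  ! read the values at run time so the compiler cannot fold the division away
  read (*,*) nranks, npar

  per_group = nranks / npar        ! traps here if npar == 0
  print *, nranks, 'ranks split into', npar, 'groups of', per_group
end program npar_toy
Built with ifort -traceback and fed "16 0", it should abort with that same message and a traceback pointing at the division.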
My programming skills are mediocre, and my grasp of MPI is appalling at best, but even so I can't help thinking that something is wrong with our openMPI configuration, or that VASP somehow receives NPAR=0 (which would make this a bug report). Other HPC software runs fine.
What would be your take on the matter?
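To rule the MPI installation in or out, I was thinking of a stand-alone test along these lines; it only mimics the sort of communicator splitting I assume VASP does for NPAR (it is not the real M_divide), but it should at least show whether plain openMPI handles the split across two nodes:
Code:
! comm_split_test.f90 -- NOT VASP code; a stand-alone mock-up of splitting
! the world communicator into NPAR groups, to test the openMPI installation.
! Build with:  mpif90 -traceback comm_split_test.f90 -o comm_split_test
program comm_split_test
  implicit none
  include 'mpif.h'
  integer :: ierr, nranks, myrank, npar, color, subcomm, subranks

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nranks, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  npar  = 2                          ! one of the values that crashes VASP here
  color = myrank / (nranks / npar)   ! group index; note this traps if npar
                                     ! or nranks/npar is zero
  call MPI_COMM_SPLIT(MPI_COMM_WORLD, color, myrank, subcomm, ierr)
  call MPI_COMM_SIZE(subcomm, subranks, ierr)
  if (myrank == 0) print *, 'split', nranks, 'ranks into groups of', subranks

  call MPI_COMM_FREE(subcomm, ierr)
  call MPI_FINALIZE(ierr)
end program comm_split_test
Run over both nodes with mpirun -np 16 ./comm_split_test, it should report groups of 8 if the openMPI side is healthy.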
I believe it is customary to attach a Makefile, so here are the interesting parts of mine. Thank you!
Code:
.SUFFIXES: .inc .f .f90 .F
# all CPP processed fortran files have the extension .f90
SUFFIX=.f90
CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
FFLAGS = -I/usr/local/intel/Compiler/11.1/072/mkl/include/fftw -FR -lowercase -assume byterecl -ftz -heap-arrays
OFLAG=-O3 -xSSE4.2
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)
NEWMKLPATH=/usr/local/intel/Compiler/11.1/072/mkl/lib/em64t
BLAS= -L$(NEWMKLPATH) -lmkl_intel_lp64 -lmkl_sequential  -lmkl_core -lpthread
LAPACK= -L$(NEWMKLPATH) -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential  -lmkl_core -lpthread
# options for linking, nothing is required (usually)
LINK    =
FC=mpif90 -traceback
FCL=$(FC)
CPP     = $(CPP_) -DMPI  -DHOST=\"LinuxIFCmkl\" -DIFC \
     -Dkind8 -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc  -DNGZhalf \
     -DMPI_BLOCK=50000 -DRPROMU_DGEMV -DRACCMU_DGEMV -DscaLAPACK \
     -Duse_allreduce -Duse_collective
SCA= $(NEWMKLPATH)/libmkl_scalapack_lp64.a $(NEWMKLPATH)/libmkl_blacs_openmpi_lp64.a
LIB     = -L../vasp.5.lib -ldmy  \
      ../vasp.5.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS)
FFT3D   = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o $(HOME)/fftw3xfnew/libfftw3xf_intel.a