parallel compile problems on Opteron quadcore cluster
Posted: Mon Jan 12, 2009 5:14 am
I’m having trouble compiling VASP for parallel operation on our new InfiniBand-networked cluster. Each node has 8 cores (two quad-core Opterons), and the system runs the Rocks OS (Red Hat based).
Here are the compile steps I’m using:
1. Compile the ATLAS libraries from source using the GNU compilers (gcc, gfortran)
2. Extract the VASP source from its tarball
3. Compile the VASP libraries using the GNU compilers (all serial, of course)
4. Compile the VASP executable using the mvapich2 InfiniBand-aware compiler wrappers (themselves built with the GNU compilers on our machine)
We have successfully used an identical install process for VASP on our old cluster, but with older compiler versions. The above compile sequence results in parallel jobs that show the following error in the OUTCAR, but only occasionally (roughly 50% of the time):
Code: Select all
----------------------------------------- Iteration 3( 17)---------------------------------
POTLOK: VPU time 0.77: CPU time 0.77
SETDIJ: VPU time 0.08: CPU time 0.08
Error EDDDAV: Call to ZHEGV failed. Returncode = 64 464
The error above is observed on all attempted combinations of processes and nodes (e.g. 4 processes each on 4 nodes, 8 processes each on 2 nodes, etc.). Failure does not seem to be tied to specific nodes; for example, a job will complete successfully on node 28, and a different job will later fail on that same node. Failure is also not immediate - the job above got through almost three complete ionic iterations before failing.
Small systems (2 atoms) complete successfully more often than do large systems (~30 atoms).
When I compile the executable for serial operation, using the serial compiler (gfortran) instead of MPI, everything runs smoothly (albeit slowly) on one processor.
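Since the serial build is clean, I suspect the MPI layer or the way the math libraries get driven in parallel. For reference, this is the sort of minimal MPI smoke test (an illustrative sketch, not VASP code) that can be built with the same mpif90 wrapper to check the stack by itself:
Code: Select all
! Minimal MPI smoke test (illustrative sketch, not VASP code).
! Build and run with the same wrapper and launcher used for VASP, e.g.:
!   mpif90 -o mpitest mpitest.f90
!   mpirun -np 16 ./mpitest
program mpitest
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, nprocs
  real(8) :: local, total

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! collective reduction, the kind of operation VASP performs constantly
  local = real(rank+1, 8)
  call MPI_ALLREDUCE(local, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  ! with np ranks the sum should be np*(np+1)/2
  if (rank == 0) write(*,*) 'ranks =', nprocs, '  sum =', total
  call MPI_FINALIZE(ierr)
end program mpitest
If something this small misbehaves across nodes, the problem is presumably below VASP.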
We’ve also tried using the following MPI compilers:
-openmpi, built with gcc compilers (fortran compiler = mpif90)
-mvapich-1.0.0, built with intel compilers (fortran compiler = mpif90)
with similar results (Error EDDDAV, as shown above).
Additionally, jobs that do complete often show the following warning in the standard output file:
Code: Select all
WARNING: Sub-Space-Matrix is not hermitian in DAV 11 -1017601.45907357
I've seen reference to errors like this here, but in the context of a serial compile. Adding the lines suggested in the link did not resolve the error.
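Since ZHEGV is where the parallel runs die, I'd also like to rule out the ATLAS/LAPACK build itself. Below is a minimal standalone ZHEGV check (a sketch; the 2x2 matrices are made up for illustration, and the link line just mirrors the BLAS/LAPACK settings from the Makefile below):
Code: Select all
! Standalone ZHEGV check (illustrative sketch). A is a simple Hermitian
! matrix, B is the identity (trivially positive definite), so ZHEGV
! should return info = 0 with eigenvalues 0.0 and 2.0.
! Link against the same libraries VASP uses, e.g.:
!   gfortran -o zhegvtest zhegvtest.f90 ../vasp.4.lib/lapack_atlas.o \
!       -L/usr/local/src/ATLAS/lib -llapack -lcblas -lf77blas -latlas
program zhegvtest
  implicit none
  integer, parameter :: n = 2, lwork = 2*n - 1
  complex(8) :: a(n,n), b(n,n), work(lwork)
  real(8)    :: w(n), rwork(3*n-2)
  integer    :: info

  a = reshape( (/ (1.d0,0.d0), (1.d0,0.d0),   &
                  (1.d0,0.d0), (1.d0,0.d0) /), (/ n, n /) )
  b = reshape( (/ (1.d0,0.d0), (0.d0,0.d0),   &
                  (0.d0,0.d0), (1.d0,0.d0) /), (/ n, n /) )

  ! ZHEGV(itype, jobz, uplo, n, a, lda, b, ldb, w, work, lwork, rwork, info)
  call ZHEGV(1, 'V', 'U', n, a, n, b, n, w, work, lwork, rwork, info)

  write(*,*) 'ZHEGV info =', info    ! nonzero info indicates failure
  write(*,*) 'eigenvalues:', w
end program zhegvtest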
Any advice or suggestions are greatly appreciated!
Many thanks,
Nate at University of Wisconsin-Madison
Makefile is below:
Code: Select all
.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
#comments section, removed for brevity
#-----------------------------------------------------------------------
# all CPP processed fortran files have the extension .f90
SUFFIX=.f90
#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
#FC=/opt/intel/fce/9.0/bin/ifort
# fortran linker
#FCL=$(FC)
#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
#  CPP_   =  /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
#  CPP_   =  /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
#  SUSE X.X, maybe some Red Hat distributions:
CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# IFC                 work around some IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# for Atlas  -DRPROMU_DGEMV is recommended
#-----------------------------------------------------------------------
CPP     = $(CPP_)  -DHOST=\"LinuxIFC_ath\" \
          -Dkind8 -DNGXhalf -DCACHE_SIZE=5000 -DPGF90 -Davoidalloc \
          -DRPROMU_DGEMV
#-----------------------------------------------------------------------
# general fortran flags  (there must be a trailing blank on this line)
#-----------------------------------------------------------------------
#FFLAGS =
#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK  SSE1 optimization,  but also generate code executable on all mach.
#       xK improves performance somewhat on XP, and a is required in order
#       to run the code on older Athlons as well
# -xW   SSE2 optimization
# -axW  SSE2 optimization,  but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------
OFLAG= -O1
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -O0
INLINE = $(OFLAG)
#-----------------------------------------------------------------------
# the following lines specify the position of BLAS  and LAPACK
# on Athlon, VASP works fastest with the Atlas library
# so that's what I recommend
#-----------------------------------------------------------------------
# Atlas based libraries
ATLASHOME= /usr/local/src/ATLAS/lib
##/root/downloads/ATLAS/ATLAS/lib/Linux_HAMMER64SSE2
BLAS=   -L$(ATLASHOME)  -lf77blas -latlas
# use the mkl Intel libraries for p4 (www.intel.com)
# mkl.5.1
# set -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4  -lpthread
# mkl.5.2 also requires the -lguide library
# set -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lguide -lpthread
# even faster Kazushige Goto's BLAS
#Â http://www.cs.utexas.edu/users/kgoto/signup_first.html
#BLAS=  /opt/libs/libgoto/libgoto_p4_512-r0.6.so
# LAPACK, simplest use vasp.4.lib/lapack_double
#LAPACK= ../vasp.4.lib/lapack_double.o
# use atlas optimized part of lapack
LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas
# use the mkl Intel lapack
#LAPACK= -lmkl_lapack
#-----------------------------------------------------------------------
#LIB  = -L../vasp.4.lib -ldmy \
#     ../vasp.4.lib/linpack_double.o $(LAPACK) \
#      $(BLAS)
# options for linking (for compiler version 6.X, 7.1) nothing is required
#LINK    =
# compiler version 7.0 generates some vector statements which are located
# in the svml library, add the LIBPATH and the library (just in case)
#LINK    =  -L/opt/intel/compiler70/ia32/lib/ -lsvml
#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.3.0.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend using it
#-----------------------------------------------------------------------
#FFT3D   = fft3dfurth.o fft3dlib.o
#FFT3D   = fftw3d.o fft3dlib.o   /opt/libs/fftw-3.0.1/lib/libfftw3.a
#=======================================================================
# MPI section, uncomment the following lines
#
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that already contain an
# underscore (i.e. MPI_SEND becomes mpi_send__).  The pgf90/ifc
# compilers however append only one underscore.
# Precompiled mpi version will also not work !!!
#
# We found that mpich.1.2.1 and lam-6.5.X to lam-7.0.4 are stable
# mpich.1.2.1 was configured with
#  ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000"  \
# -f90="pgf90 -Mx,119,0x200000" \
# --without-romio --without-mpe -opt=-O \
#
# lam was configured with the line
#  ./configure  -prefix /opt/libs/lam-7.0.4 --with-cflags=-O -with-fc=ifc \
# --with-f77flags=-O --without-romio
#
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above,  you can use the following line
#-----------------------------------------------------------------------
FC=/share/apps/mvapich2/ohioState/1.2p1-gnu/bin/mpif90
#FC=/usr/mpi/gcc/openmpi-1.2.5/bin/mpif90
#FC= /usr/mpi/pgi/mvapich-1.0.0/bin/mpif90
#FC= /opt/intel/impi/3.1/bin64/mpiifort -static_mpi
FCL=$(FC)
#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
# 1000 or 2000 are the optimal CACHE_SIZE for the parallel version
# and IFC on Athlon XP (gK)
#-----------------------------------------------------------------------
CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxIFC_ath\" -DIFC \
     -Dkind8 -DNGZhalf -DCACHE_SIZE=2000 -DPGF90 -Davoidalloc \
     -DRPROMU_DGEMV
#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------
#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK
#SCA= $(SCA_)/libscalapack.a  \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a
SCA=
#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------
LIB     = -L../vasp.4.lib -ldmy  \
      ../vasp.4.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS)
##-static
# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D   = fftmpi.o fftmpi_map.o fft3dlib.o
# fftw.3.0.1 is slightly faster and should be used if available
#FFT3D   = fftmpiw.o fftmpi_map.o fft3dlib.o   /opt/libs/fftw-3.0.1/lib/libfftw3.a
#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC=   symmetry.o symlib.o   lattlib.o  random.o
SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o      setex.o     radial.o  \
         pseudo.o   mgrid.o    mkpoints.o wave.o      wave_mpi.o  $(BASIC) \
         nonl.o     nonlr.o    dfast.o    choleski2.o    \
         mix.o      charge.o   xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         metagga.o  constrmag.o pot.o      cl_shift.o force.o    dos.o      elf.o      \
         tet.o      hamil.o    steep.o    \
         chain.o    dyna.o     relativistic.o LDApU.o sphpro.o  paw.o   us.o \
         ebs.o      wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         dipol.o    xclib.o    chgloc.o   subrot.o   optreal.o   davidson.o \
         edtest.o   electron.o shm.o      pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o \
         elpol.o    setlocalpp.o
INC=
vasp: $(SOURCE) $(FFT3D) $(INC) main.o
        rm -f vasp
        $(FCL) -o vasp $(LINK) main.o  $(SOURCE)   $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
        $(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
        $(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
        $(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
        $(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
        $(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)
clean:
        -rm -f *.g *.f *.o *.L *.mod ; touch *.F
main.o: main$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)
makeparam.o: makeparam$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)
makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F
$(OBJ_HIGH):
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
        $(CPP)
        $(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)
fft3dlib_f77.o: fft3dlib_f77.F
        $(CPP)
        $(F77) $(FFLAGS_F77) -c $*$(SUFFIX)
.F.o:
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
        $(CPP)
$(SUFFIX).o:
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
# special rules
#-----------------------------------------------------------------------
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay off on PII, since fft3d uses double prec)
# all other options do not affect the code performance since -O1 is used
#-----------------------------------------------------------------------
fft3dlib.o : fft3dlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -unroll0 -c $*$(SUFFIX)
lattlib.o: lattlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
radial.o : radial.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
symlib.o : symlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
symmetry.o : symmetry.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
dynbr.o : dynbr.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
us.o : us.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
broyden.o : broyden.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
wave.o : wave.F
        $(CPP)
        $(FC)  -lowercase -O0 -c $*$(SUFFIX)
LDApU.o : LDApU.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)