Successful installation of VASP 5.2.12 on Xeon, parallel and serial

Message

kvas · #1 Post by **kvas** » Fri Aug 12, 2011 5:38 pm

Hello, everybody!
I successfully installed VASP 5.2.12 on my DELL desktop, 2 Intel Xeon Quad cores, 8 CPUs in total. I spend a lot of time for it, I'd like to share the experience and I wish you to spend less.

System: Ubuntu 11.04, 64 bit
C compiler: gcc (you can also use icc, i installed Intel C++ Composer XE)
Fortran compiler: Intel Fortran Composer XE 2011
GotoBLAS2-1.13 (http://www.tacc.utexas.edu/tacc-project ... /downloads)
FFTW 3.2.2 (http://www.fftw.org/), you can use newer if available
Openmpi 1.4.3 (http://www.open-mpi.org/), you can use newer if available

Since I use ubuntu I almost always use sudo command like
sudo make TARGET=CORE2 USE_THREAD=0
I don't write sudo thing further in this post since it's feature of ubuntu.

Parallel version

1) installation of GotoBLAS2
Basically it's suggested to type just make but this does not always work. I advise to build like
make TARGET=CORE2 USE_THREAD=0
be careful with TARGET flag, don't use default or at list check, you could have problems because of that. Second one has to specify USE_THREAD=0 to have not threaded version, it's faster when used with OpenMPI.
You can also specify the compiler:
make CC=gcc (or icc) FC=ifort TARGET=CORE2 USE_THREAD=0
I had problems to compile with icc, so I used gcc + ifort

2) Installation of FFTW
To compile read supported readme files.
./configure --prefix=/folder/for/fftw
make
make install
I didn't specify any CC=gcc for ./configure, so I used default for me gcc compiler.

3) Installation of OpenMPI
To compile OpenMPI read supported readme files.
./configure CC=icc CXX=icc F77=ifort F90=ifort --prefix=/folder/for/openmpi
make all install
Again I had problems with icc, so I compiled it with gcc, again I used gcc + ifort

4) VASP lib
I used makefile.linux_ifc_P4 file the only change you need, it's
FC=ifort

5) Compilation of VASP
Again I started from makefile.linux_ifc_P4 file.
Things I changed:
1) FFLAGS = -FR -names lowercase -assume byterecl
2) If you use GotoBLAS
BLAS= /home/alex/VASP/GotoBLAS2/libgoto2.so
or you can use MKL, but MKL is slow for parallel version for my case
BLAS=-L/opt/intel/composerxe-2011.4.191/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
3) LAPACK
LAPACK= ../vasp.5.lib/lapack_double.o
4) fortran compiler/linker for mpi
FC=/home/alex/VASP/openmpi/bin/mpif77
5) FFTW for mpi
FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /home/alex/VASP/fft/lib/libfftw3.a
INCS = -I/home/alex/VASP/fft/include
6) I tried to set -O3 optimization flag in line
OFLAG=-O2 -ip -ftz
but I didn't find any advantage and I left it as it is.
7) Probably I forgot something but anyway you can see the makefile.

Makefile itself:

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for Pentium/Athlon/Opteron
# bases systems
# we recommend this makefile for both Intel as well as AMD systems
# for AMD based systems appropriate BLAS and fftw libraries are
# however mandatory (whereas they are optional for Intel platforms)
#
# The makefile was tested only under Linux on Intel and AMD platforms
# the following compiler versions have been tested:
# - ifc.7.1 works stable somewhat slow but reliably
# - ifc.8.1 fails to compile the code properly
# - ifc.9.1 recommended (both for 32 and 64 bit)
# - ifc.10.1 partially recommended (both for 32 and 64 bit)
# tested build 20080312 Package ID: l_fc_p_10.1.015
# the gamma only mpi version can not be compiles
# using ifc.10.1
#
# it might be required to change some of library pathes, since
# LINUX installation vary a lot
# Hence check ***ALL*** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hang up when calling
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0)
# 2) more desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 2a) Intels own optimised BLAS (PIII, P4, PD, PC2, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent, if you use Intel CPU's
#
# 2b) probably fastest SSE2 (4 GFlops on P4, 2.53 GHz, 16 GFlops PD,
# around 30 GFlops on Quad core)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# http://www.tacc.utexas.edu/resources/software/
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
#FC=ifc
# fortran linker
#FCL=$(FC)

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGXhalf \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
#-----------------------------------------------------------------------

FFLAGS = -FR -names lowercase -assume byterecl

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

# ifc.9.1, ifc.10.1 recommended
OFLAG=-O2 -ip -ftz

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# VASP works fastest with the libgoto library
# so that's what we recommend
#-----------------------------------------------------------------------

# mkl.10.0
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl100/lib/em64t -lmkl -lpthread

# even faster for VASP Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# parallel goto version requires sometimes -libverbs
BLAS= /home/alex/VASP/GotoBLAS2-1.13_bsd/GotoBLAS2/libgoto2.so

# LAPACK, simplest use vasp.5.lib/lapack_double
LAPACK= ../vasp.5.lib/lapack_double.o

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking, nothing is required (usually)
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.5.2 can use fftw.3.1.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a

#=======================================================================
# MPI section, uncomment the following lines until
# general rules and compile lines
# presently we recommend OPENMPI, since it seems to offer better
# performance than lam or mpich
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi
#-----------------------------------------------------------------------

FC=/home/alex/VASP/openmpi/bin/mpif77
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply leave that section commented out
#-----------------------------------------------------------------------

#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

#LIB = -L../vasp.5.lib -ldmy \
# ../vasp.5.lib/linpack_double.o $(LAPACK) \
# $(SCA) $(BLAS)

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /home/alex/VASP/fft/lib/libfftw3.a

INCS = -I/home/alex/VASP/fft/include

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o pead.o mlwf.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f90 *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
# in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay of on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

fftw3d.o : fftw3d.F
$(CPP)
$(FC) -FR -names lowercase -O1 $(INCS) -c $*$(SUFFIX)

wave_high.o : wave_high.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

wave_mpi.o : wave_mpi.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

asa.o : asa.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

About serial version, read the next post.

[ Edited ]

kvas · #2 Post by **kvas** » Fri Aug 12, 2011 5:52 pm

Warning!
Be careful about TARGET when installing GotoBLAS. First I used TARGET=NEHALEM since I have Nehalem based Intel Xeon processor but I had problems in VASP, sometimes minimization in vasp didn't work properly, so finally I switched to TARGET=CORE2. Probably it's a bug of GotoBLAS compilation, probably I did something wrong.. Just be careful with it.
[ Edited Mon Aug 22 2011, 06:33PM ]

kvas · #3 Post by **kvas** » Fri Aug 12, 2011 5:59 pm

[ Edited Mon Aug 22 2011, 03:38PM ]

kvas · #4 Post by **kvas** » Fri Aug 12, 2011 6:11 pm

It's also possible to compile and use serial version of VASP. You will have some parallelization provided by OS but still parallel version with mpi is faster.

For serial version you can use either MKL or GotoBLAS2 multi-thread (!) compiled (unlike not-threaded version you use with OpenMPI). See the makefile.

There is the makefile for serial version:

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for Pentium/Athlon/Opteron
# bases systems
# we recommend this makefile for both Intel as well as AMD systems
# for AMD based systems appropriate BLAS and fftw libraries are
# however mandatory (whereas they are optional for Intel platforms)
#
# The makefile was tested only under Linux on Intel and AMD platforms
# the following compiler versions have been tested:
# - ifc.7.1 works stable somewhat slow but reliably
# - ifc.8.1 fails to compile the code properly
# - ifc.9.1 recommended (both for 32 and 64 bit)
# - ifc.10.1 partially recommended (both for 32 and 64 bit)
# tested build 20080312 Package ID: l_fc_p_10.1.015
# the gamma only mpi version can not be compiles
# using ifc.10.1
#
# it might be required to change some of library pathes, since
# LINUX installation vary a lot
# Hence check ***ALL*** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hang up when calling
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0)
# 2) more desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 2a) Intels own optimised BLAS (PIII, P4, PD, PC2, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent, if you use Intel CPU's
#
# 2b) probably fastest SSE2 (4 GFlops on P4, 2.53 GHz, 16 GFlops PD,
# around 30 GFlops on Quad core)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# http://www.tacc.utexas.edu/resources/software/
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=ifort
# fortran linker
FCL=$(FC)

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-DCACHE_SIZE=16000 -DPGF90 -Davoidalloc -DNGXhalf \
-DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
#-----------------------------------------------------------------------

FFLAGS = -FR -names lowercase -assume byterecl

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

# ifc.9.1, ifc.10.1 recommended
OFLAG = -O2 -ip -ftz

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# VASP works fastest with the libgoto library
# so that's what we recommend
#-----------------------------------------------------------------------

# mkl.10.0
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
BLAS = -L/opt/intel/composerxe-2011.4.191/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

# even faster for VASP Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# parallel goto version requires sometimes -libverbs
#BLAS= /home/alex/VASP/GotoBLAS2/libgoto2.so

# LAPACK, simplest use vasp.5.lib/lapack_double
LAPACK = ../vasp.5.lib/lapack_double.o

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) $(BLAS)

# options for linking, nothing is required (usually)
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.5.2 can use fftw.3.1.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
FFT3D = fftw3d.o fft3dlib.o /home/alex/VASP/fft/lib/libfftw3.a

INCS = -I/home/alex/VASP/fft/include

#=======================================================================
# MPI section, uncomment the following lines until
# general rules and compile lines
# presently we recommend OPENMPI, since it seems to offer better
# performance than lam or mpich
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi
#-----------------------------------------------------------------------

#FC=mpif77
#FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------

#CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
# -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
# -DMPI_BLOCK=8000
## -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply leave that section commented out
#-----------------------------------------------------------------------

#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

#LIB = -L../vasp.5.lib -ldmy \
# ../vasp.5.lib/linpack_double.o $(LAPACK) \
# $(SCA) $(BLAS)

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o pead.o mlwf.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f90 *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

[ Edited Fri Aug 12 2011, 08:47PM ]

kvas · #5 Post by **kvas** » Fri Aug 12, 2011 6:25 pm

I'd like to show you the difference between parallel and serial version. I use "bench.Hg" benchmark.
For serial version top command will give you something like
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12020 alex 20 0 682m 74m 9304 R 767 0.6 3:02.01 vasp_serial

I use all 8 CPUs, this parallelization is provided by OS.
Timing for "bench.Hg":
Total CPU time used (sec): 420.450
User time (sec): 409.220
System time (sec): 11.230
Elapsed time (sec): 66.763

That means serial VASP needs 67 seconds to complete the benchmark.

For parallel version, run on 8 CPUs
mpirun -np 8 /path/to/vasp/vasp
top gives

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12090 alex 20 0 322m 60m 12m R 100 0.5 0:12.66 vasp
12092 alex 20 0 322m 60m 12m R 100 0.5 0:13.05 vasp
12094 alex 20 0 322m 59m 11m R 99 0.5 0:12.79 vasp
12089 alex 20 0 322m 60m 12m R 98 0.5 0:12.85 vasp
12091 alex 20 0 322m 60m 12m R 98 0.5 0:12.78 vasp
12093 alex 20 0 322m 59m 12m R 97 0.5 0:12.63 vasp
12088 alex 20 0 322m 60m 12m R 96 0.5 0:12.42 vasp
12095 alex 20 0 322m 60m 12m R 96 0.5 0:12.73 vasp

Timing for "bench.Hg" for parallel version:
Total CPU time used (sec): 18.470
User time (sec): 17.350
System time (sec): 1.120
Elapsed time (sec): 19.991

only 20 seconds, that means 3 times faster!!!

[ Edited Fri Aug 12 2011, 06:29PM ]

Marian Gall · #6 Post by **Marian Gall** » Sat Aug 25, 2012 9:26 pm

Hello,

i tried to compile VASP.5.2.12 with your instructions.
I got parallel and serial version of VASP binary without any a compilation's problems. But if i tried test of running it always crashed after a few seconds with error message:
Segmentation fault
Last line of OUTCAR was always a same:
----------------------------------------- Iteration 1( 1) ---------------------------------------

POTLOK: cpu time 2.16: real time 2.16
SETDIJ: cpu time 0.27: real time 0.27

I have no idea where must be problem. Can you help me?
Info about my system:
Intel(R) Xeon(R) CPU X5660, 2 cpu with 6 cores and 24 threads together.
MemTotal: 16455788 kB
Debian GNU/Linux 6.0

Marian Gall · #7 Post by **Marian Gall** » Sat Aug 25, 2012 9:30 pm

And this is my makefile for parallel version:

.SUFFIXES: .inc .f .f90 .F
SUFFIX=.f90

FC=/export/home/gall/openmpi_ifc/bin/mpif77
FCL=$(FC)

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGXhalf \

FFLAGS = -FR -names lowercase -assume byterecl

OFLAG = -O2 -ip -ftz

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

BLAS= /export/home/gall/programs/VASP/5.2/GotoBLAS2/libgoto2.so
LAPACK= ../vasp.5.lib/lapack_double.o

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

LINK =

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000

SCA=

FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /export/home/gall/programs/fftw/lib/libfftw3.a

INCS = -I/export/home/gall/programs/fftw/include

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o pead.o mlwf.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F

base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
# special rules
#-----------------------------------------------------------------------

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

fftw3d.o : fftw3d.F
$(CPP)
$(FC) -FR -names lowercase -O1 $(INCS) -c $*$(SUFFIX)

wave_high.o : wave_high.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

wave_mpi.o : wave_mpi.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

asa.o : asa.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
$(CPP)
$(FC) -FR -names lowercase -O1 -c $*$(SUFFIX)

LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -names lowercase -O2 -c $*$(SUFFIX)

Marian Gall · #8 Post by **Marian Gall** » Mon Aug 27, 2012 2:38 pm

My problem with Segmentation fault is probably solved (i'm still testing it). I needed use following command line before run VASP executable:

ulimit -s unlimited

You can use it directly in command line, in VASP's running script or in your .bashrc file. Thank so much for your instalation instructions.