My Community

Posted: **Tue Jun 30, 2009 3:12 pm**

Hi,

We compiled VASP 5.2 with Intel Fortran Compiler 11.0 (83) and openmpi 1.3.2 for parallelization. We tried both acml and mkl on an AMD Opteron cluster (see Makefile below). Regardless of the math library used, we observe the following:

For the test we used an example job which runs fine with vasp4.6.
The job finishes successfully when Algo=VeryFast and 8 cpus are used.
The same job crashes if Algo=Fast is used.
The same job crashes if 4 or 1 cpus are used instead of 8, regardless of whether Fast or VeryFast is chosen.

We found the code line where VASP hangs. The error occurs upon calling HAMILTMU(...) in either davidson.F (Fast) or rmm-diis.F (VeryFast) for the first time.

While we can't check whether the job works with 8 cpus on Intel nodes, we can reproduce the crash when calling HAMILTMU on one (four) Intel 2 Quad cpu(s) (using the same compiler, openmpi, and the Intel MKL).

VASP also crashes here when compiled in the serial version.

After all these tests, I hope this behavior is worth reporting here. Does anybody know a way out?

Thank you for your help!
Ferdinand

--------------------------------------------------
Makefile: (only FC, BLAS, LAPACK were adjusted)
--------------------------------------------------
SUFFIX=.f90

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

FFLAGS = -FR -lowercase -assume byterecl

OFLAG=-O3
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

BLAS=/home/ferdl/Software/Libs/acml-4-0-1-ifort-64bit/ifort64/lib/libacml.so
LAPACK=/home/ferdl/Software/Libs/acml-4-0-1-ifort-64bit/ifort64/lib/libacml.so

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

FC=/home/ferdl/Software/MPI/openmpi-1.3.2-Intel_FC11/bin/mpif90

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" \
-Dkind8 -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000

FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o xclib.o xclib_grad.o \
radial.o pseudo.o mgrid.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o egrad.o pawsym.o pawfock.o pawlhf.o paw.o \
mkpoints_full.o charge.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o force.o \
pead.o subrot.o subrot_scf.o pwlhf.o gw_model.o optreal.o davidson.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse.o acfdt.o chi.o sydmat.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
	-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F

base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

fft3dlib.o : fft3dlib.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave_mpi.o : wave_mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

asa.o : asa.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

Posted: **Wed Jul 01, 2009 4:34 pm**

I reported a similar problem (June 6); no solution has been provided yet. I also tried many combinations of compiler, compiler options, libraries etc. on my cluster (Altix SE 1300); none worked.

I suspect your sample problem is LARGE (in my case 512 atoms, total size 3 GB). If so, you can trigger the crash by changing ENCUT or some other parameters (real space projection) that increase the size of the fields passed to HAMILTMU (though I don't think the full fields are passed ...). This dependency is also the reason why it doesn't crash with 8 procs (smaller fields), but does so with 4 procs.

By the way, when referring to vasp.4.6, do you refer to 4.6.36 (with rewritten interfaces to F77-code) or 4.6.31 (with "old" interfaces)?

Overall, it's puzzling: using Intel10 on a (slightly) different platform I have no such problem, while the error persists on my Altix, no matter whether ifort 9, 10 or 11 is used.

Posted: **Thu Jul 02, 2009 1:56 pm**

Hi,

I have read your report (admittedly, after posting mine). I can confirm that using ifort9.0 doesn't solve the problem.

The system contains 44 atoms and a lot of vacuum. I can confirm that a very small system (Au bulk) finishes without problems. Reducing ENCUT from 273.894 eV to 73.894 eV prevented the crash at this point, as you suspected.
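The effect of ENCUT on array sizes can be estimated from the standard plane-wave counting argument. As a rough sketch (assuming a fixed cell volume Ω and ignoring how the FFT grid is rounded), the number of plane waves below the cutoff scales as ENCUT^(3/2):

```latex
N_{\mathrm{pw}} \approx \frac{\Omega}{6\pi^{2}} \left( \frac{2 m E_{\mathrm{cut}}}{\hbar^{2}} \right)^{3/2}
\quad\Longrightarrow\quad
\frac{N_{\mathrm{pw}}(73.894\ \mathrm{eV})}{N_{\mathrm{pw}}(273.894\ \mathrm{eV})}
= \left( \frac{73.894}{273.894} \right)^{3/2} \approx 0.14
```

So the reduced cutoff leaves roughly one seventh of the plane-wave coefficients per band, which fits the observation that smaller arrays avoid the crash.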

Concerning vasp 4.6, I was talking about 4.6.28.

Posted: **Mon Jul 06, 2009 6:24 pm**

The question remains whether this is a problem of the compiler (ifort 11), of the OS (Suse 10), of the hardware (Intel E5420@2.50GHz), or of all three.
Since it works on other platforms, there is no fundamental fault in the code ...

Any suggestions?

Posted: **Sun Jul 12, 2009 12:57 am**

Two steps to consider:
i. Increase the stacksize: "ulimit -s unlimited" in your .bashrc or "limit stacksize unlimited" in your .cshrc
ii. Think about compiling the code using the compiler option "-mcmodel=medium" [check the PG user guide]
(Also check the amount of memory vasp allocates on the root node. I'm up to 1.9 GB in some cases, haven't seen more in a running application yet.)

i and ii help some (!) of my problems.
I'm currently stuck with a problem when compiling the serial version in routine asa.F
" internal ERROR: SETYLM, insufficient INDMAX workspace"
checking the sizes:
UBOUND(YLM,1) 924
INDMAX 822
But how the heck did this occur?

Posted: **Fri Jul 17, 2009 5:29 am**

Unfortunately, this didn't help. I tried "-mcmodel=medium (-i-dynamic [-i8])" without success.

Also, I tried to edit paw.F as suggested in "Vasp 5.2.2 Keeps crashing free(): invalid next size (fast)". To compile successfully, I had to insert the line

"ALLOCATE(CTMP(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)),CSO(LMDIM,LMDIM,WDES%NCDIJ),CHF(LMDIM,LMDIM,WDES%NCDIJ))"

instead of

"ALLOCATE(CTMP(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)),CSO(LMDIM,LMDIM,WDES%NCDIJ),CHF(CTMP(LMDIM,LMDIM,MAX(2,WDES%NCDIJ)))" .

It didn't help either.

Posted: **Fri Jul 17, 2009 7:55 am**

This may be of little or no relevance, but I've successfully compiled Vasp5.2.2 on several clusters using ifort 11.0, openmpi 1.3.2 and mkl 10.0. In all cases I first encountered seg. faults at runtime (for large systems) that were connected to the stacksize. Many of the sysadmins recommended putting "ulimit -s unlimited" in my .bashrc file. This did not work, however. It later turned out that the processes spawned by the specific queuing system in fact inherit the system default stacksize, independently of what I've specified in my .bashrc. At one cluster I got a workaround from the sysadmins that explicitly sets the stacksize limit, and vasp ran just fine after that.
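One way to check what the batch system actually hands to a job is to print the limit from inside a process started by the queue. A minimal sketch, assuming a POSIX system; `report_stack_limit` and `format_limit` are illustrative names, not part of VASP or openmpi:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Format one limit value; RLIM_INFINITY is printed as "unlimited". */
static void format_limit(rlim_t v, char *buf, size_t n)
{
    if (v == RLIM_INFINITY)
        snprintf(buf, n, "unlimited");
    else
        snprintf(buf, n, "%llu kB", (unsigned long long)v / 1024ULL);
}

/* Write the soft and hard stack limits of the calling process to `out`.
 * Returns 0 on success, -1 if getrlimit() fails. */
int report_stack_limit(FILE *out)
{
    struct rlimit rl;
    char soft[32], hard[32];

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    format_limit(rl.rlim_cur, soft, sizeof soft);
    format_limit(rl.rlim_max, hard, sizeof hard);
    fprintf(out, "stack limit: soft=%s hard=%s\n", soft, hard);
    return 0;
}
```

Calling `report_stack_limit(stdout)` from a small `main` and launching that binary through the same job script and `mpirun` as VASP would show whether the `.bashrc` setting reached the compute nodes.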

At the other clusters it was recommended to put all arrays on the heap instead of the stack, so after compiling with the "-heap-arrays" option to ifort, vasp ran just fine. I also used this option at the first cluster, where I had my workaround, and compared the performance of the two approaches. I only found a very small increase in computational demand when using the "-heap-arrays" option, which I can definitely live with.
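The stack-versus-heap distinction can be illustrated outside Fortran. The sketch below is a C analogy only (the function names are made up): a variable-length array plays the role of a Fortran automatic array and is bounded by the stack limit, while the `malloc` version is bounded only by available memory, which is roughly the behaviour "-heap-arrays" asks ifort for.

```c
#include <stdlib.h>

/* Stack version: `work` is an automatic (variable-length) array, so its
 * n * sizeof(double) bytes must fit within the stack limit. Requires n > 0. */
double sum_squares_stack(size_t n)
{
    double work[n];            /* lives on the stack */
    double s = 0.0;
    for (size_t i = 0; i < n; i++) work[i] = (double)i;
    for (size_t i = 0; i < n; i++) s += work[i] * work[i];
    return s;
}

/* Heap version: same computation, but the work array comes from malloc(),
 * so the stack limit does not bound its size.
 * Returns -1.0 if the allocation fails. */
double sum_squares_heap(size_t n)
{
    double *work = malloc(n * sizeof *work);
    double s = 0.0;
    if (work == NULL) return -1.0;
    for (size_t i = 0; i < n; i++) work[i] = (double)i;
    for (size_t i = 0; i < n; i++) s += work[i] * work[i];
    free(work);
    return s;
}
```

With a typical default stack limit of a few MB, `sum_squares_stack` is likely to segfault for an `n` of a few million, whereas `sum_squares_heap` keeps working as long as memory is available.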

I've also experienced problems using the O3 optimization together with ifort 11 at some clusters, which manifested themselves in strange convergence behaviour and wrong energies. I haven't figured out which file is overoptimized yet (if any), but using O1 instead seems to eliminate the issues.

Cheers,
/Dan

[ Edited Fri Jul 17 2009, 11:27AM ]

Posted: **Fri Jul 17, 2009 4:27 pm**

[quote="forsdan"]This may be of little or no relevance, but I've successfully compiled Vasp5.2.2 on several clusters using ifort 11.0, openmpi 1.3.2 and mkl 10.0. In all cases I first encountered seg. faults at runtime (for large systems) that were connected to the stacksize. Many of the sysadmins recommended putting "ulimit -s unlimited" in my .bashrc file. This did not work, however. It later turned out that the processes spawned by the specific queuing system in fact inherit the system default stacksize, independently of what I've specified in my .bashrc. At one cluster I got a workaround from the sysadmins that explicitly sets the stacksize limit, and vasp ran just fine after that.
[/quote]

It would be better to change the stacksize limit from within vasp itself.

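cat > limit.c
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

/* Called from Fortran as "CALL stacksize()"; the trailing underscore is
   assumed to match ifort's default symbol naming on Linux. */
void stacksize_(void)
{
    int res;
    struct rlimit rlim;

    /* Report the limits before the change (RLIM_INFINITY shows up as -1 on Linux). */
    getrlimit(RLIMIT_STACK, &rlim);
    printf("Before: cur=%lld,hard=%lld\n", (long long)rlim.rlim_cur, (long long)rlim.rlim_max);

    /* Request an unlimited soft and hard stack limit. */
    rlim.rlim_cur = RLIM_INFINITY;
    rlim.rlim_max = RLIM_INFINITY;
    res = setrlimit(RLIMIT_STACK, &rlim);

    /* Report again; res=-1 means the request was rejected and nothing changed. */
    getrlimit(RLIMIT_STACK, &rlim);
    printf("After: res=%d,cur=%lld,hard=%lld\n", res, (long long)rlim.rlim_cur, (long long)rlim.rlim_max);
}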

In main.F, add this call at the beginning of the program:
CALL stacksize()

In the makefile, add limit.o at the end of the SOURCE variable, and add the rule

limit.o: limit.c
	cc -c -Wall -O2 limit.c

Warning: the makefile does not seem to regenerate main.f90 from main.F automatically, so check that main.f90 actually contains the new call.
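On systems where the hard stack limit is finite, an unprivileged process cannot raise it, so a request for RLIM_INFINITY on both limits is rejected as a whole. A more cautious variant, sketched here under that assumption, only lifts the soft limit up to whatever the hard limit allows. The names `raise_stack_soft_limit` and `stacksize_soft_` are hypothetical and not part of the code posted above:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical variant of stacksize_(): raise the soft stack limit to the
 * hard limit without touching the hard limit itself.
 * Returns 0 on success (or if nothing needed changing), -1 on error. */
int raise_stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == rl.rlim_max)
        return 0;                   /* already at the ceiling */

    rl.rlim_cur = rl.rlim_max;      /* may be RLIM_INFINITY */
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

/* Fortran-callable wrapper, following the trailing-underscore naming used
 * by stacksize_() in the post above. */
void stacksize_soft_(void)
{
    (void)raise_stack_soft_limit();
}
```

If the hard limit itself is too small for the job, this variant cannot help either; in that case the limit has to be raised by the sysadmins or the arrays moved to the heap.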

Posted: **Fri Jul 17, 2009 4:59 pm**

The workaround I obtained was actually constructed in a similar way. I just wanted to point out that the seg. fault issues in my case could also easily be solved by putting the arrays on the heap. The only disadvantage I've been told of is a possible slight decrease in computational performance; otherwise there shouldn't be any problems. But I guess your approach is much better, since it actually deals with the problem rather than just circumventing it.

Best regards,
/Dan
[ Edited Fri Jul 17 2009, 07:00PM ]

Posted: **Tue Jul 21, 2009 5:04 am**

I tried the (simpler) way and used "-heap-arrays". VASP works fine now!

Thank you a lot, forum!

VASP 5.2 crashes conditionally upon calling HAMILTMU(...)