VASP 4.6 Parallel Hangs at Run Time
Posted: Fri Dec 15, 2006 3:20 pm
I have successfully compiled VASP for serial and parallel use. I can run serial jobs without any problem, but parallel jobs launch, then immediately hang. I compiled the parallel version with the following components:
RedHat Linux ELAS 4.0, update 4 (64-bit)
VASP 4.6
Portland pgf90 6.0-8 64-bit
mpich2-1.0.4p1
fftw-3.1.2
GotoBLAS-1.09
I have a job that runs fine using the serial version. When I launch the same job using the parallel version, vasp starts and then hangs: the processes stay alive but use essentially no memory or CPU. There are no error messages in the output file or on the screen. In fact, there are no messages whatsoever, which makes this very hard to debug.
Here is the command that I use to start the MPICH2 mpd daemons:
mpdboot -n 6 -f ../mpd.hosts
Here is the command that I use to launch parallel vasp:
mpiexec -machinefile freenodes -n 2 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp < /home/jess/SakuraVASP/POSCAR >/home/jess/SakuraVASP/jess_output
Here is the state of the MPICH2 and vasp programs:
jess 6851 0.0 0.3 87144 7692 ? S 10:14 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6863 0.0 0.3 86156 6848 pts/3 S 10:15 0:00 python2.3 /opt/mpich2/bin/mpiexec -machinefile freenodes -n 2 /home/j
jess 6864 0.0 0.3 87148 7700 ? S 10:15 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6865 0.0 0.3 87148 7700 ? S 10:15 0:00 python2.3 /opt/mpich2/bin/mpd.py --ncpus=1 -e -d
jess 6866 0.0 0.0 20504 1016 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp
jess 6867 0.0 0.0 20504 1092 ? S 10:15 0:00 /home/jess/NewVaspSRC/vasp.4.6-parallel/3d-debug/vasp.4.6/vasp
The output file is empty, even after I kill the job:
-rw-r--r-- 1 jess users 0 Dec 15 10:15 jess_output
I have tried recompiling the parallel vasp with debugging options, but I still don't get any messages. Here are the debugging settings that I added to vasp's Makefile:
FFLAGS = -Mfree -tp k8-64 -i8 -C -g
# Under the MPI section
CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 \
-DRPROMU_DGEMV -DRACCMU_DGEMV -Ddebug
Does anyone know of additional debugging/verbose options that I can set so vasp will display any type of message?