I can't run an MPI job: rank 2 in job 1 abinitio01_57362 caused collective abort of all ranks
Posted: Sat Aug 29, 2009 12:25 am
My hardware:
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
I have 4 computers, 3 nodes
Fedora 11 x64
Software:
Intel Fortran compiler 11.1 intel64
Intel C compiler 11.1 intel64
Intel MPI (3.2.1.009 intel64) and MPICH2 (1.0.7, 1.1.1p1); I have tried all of these versions
VASP 4.6
VASP compiles successfully.
I can run a job on a single CPU.
When I use MPI, I get the following error messages:
[b60507@abinitio01 5]$ mpdrun -machinefile ~/machine -n 12 vasp_parallel
running on 12 nodes
distr: one band on 12 nodes, 1 groups
vasp.4.6.31 08Feb07 complex
POSCAR found : 1 types and 6 ions
LDA part: xc-table for Ceperly-Alder, standard interpolation
found WAVECAR, reading the header
nup: number of bands has changed, file: 0 present: 16
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... 2
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
rank 9 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 9: return code 174
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_parallel 000000000066F5B9 Unknown Unknown Unknown
vasp_parallel 0000000000651552 Unknown Unknown Unknown
vasp_parallel 0000000000651358 Unknown Unknown Unknown
vasp_parallel 000000000041D8D1 Unknown Unknown Unknown
vasp_parallel 000000000040790C Unknown Unknown Unknown
libc.so.6 000000376981EA2D Unknown Unknown Unknown
vasp_parallel 0000000000407809 Unknown Unknown Unknown
rank 7 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 7: killed by signal 9
rank 2 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
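To make the end of the log easier to read, here is a minimal sketch (not part of my setup, just a helper) that pulls out the "exit status of rank" lines and shows which ranks returned an error code themselves and which were killed. The function name `summarize_exit_statuses` and the `sample` string are made up for illustration; the sample lines are copied from the output above.

```python
import re

# Matches lines such as "exit status of rank 9: return code 174"
# and "exit status of rank 7: killed by signal 9".
STATUS_RE = re.compile(
    r"exit status of rank (\d+): (return code|killed by signal) (\d+)"
)


def summarize_exit_statuses(log_text):
    """Return {rank: (kind, value)} for every 'exit status of rank' line."""
    statuses = {}
    for match in STATUS_RE.finditer(log_text):
        rank, kind, value = match.groups()
        statuses[int(rank)] = (kind, int(value))
    return statuses


# Hypothetical input: the last lines of the mpdrun output, pasted as a string.
sample = """\
rank 9 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 9: return code 174
rank 7 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 7: killed by signal 9
rank 2 in job 1 abinitio01_57362 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
"""

for rank, (kind, value) in sorted(summarize_exit_statuses(sample).items()):
    print(f"rank {rank}: {kind} {value}")
```

Read this way, rank 9 exits with return code 174, the same number as the forrtl "severe (174)" segmentation fault, while ranks 7 and 2 are reported as killed by signal 9, presumably as part of the collective abort rather than from a separate crash.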