MPI problems on BG/L
Posted: Thu Nov 01, 2007 3:39 am
Same problem as outlined in my earlier post, but here is a little more information in case it means something to somebody.
The NaNs all seem to appear in write statements that follow a sum over the different nodes made by a call to M_sum_d in mpi.F.
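To illustrate why one bad value on one node would be enough to give NaN in every printed sum, here is a minimal Python sketch. It is not the code in mpi.F: a plain loop stands in for the reduction, and the names `sum_over_nodes` and `nodes_with_nan` and the sample values are hypothetical.

```python
import math

def sum_over_nodes(contributions):
    """Stand-in for a global sum: add up one value per node."""
    total = 0.0
    for value in contributions:
        total += value
    return total

def nodes_with_nan(contributions):
    """Return the indices of the nodes whose contribution is NaN."""
    return [rank for rank, value in enumerate(contributions) if math.isnan(value)]

# Hypothetical per-node values; node 2 contributes a NaN.
per_node = [1.0, 2.5, float("nan"), 4.0]

total = sum_over_nodes(per_node)
print(math.isnan(total))         # True: one NaN poisons the whole sum
print(nodes_with_nan(per_node))  # [2]
```

If that is what is happening here, checking the local values on each node just before the sum might narrow down where the NaN first appears.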
An additional symptom: if I try to get direct information out of mpi.F by, for example, setting NPAR=3 and using 32 processors (which should print a message about 32 not being divisible by 3), the program does indeed crash, but with "killed by exit on node 31" rather than the error message defined in mpi.F. This is all by the way, but if anyone has had similar symptoms, I would love to hear about it.
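For reference, the check I expected to trigger amounts to something like the following Python sketch. The function name `check_npar` and the message text are hypothetical, not the actual wording or logic in mpi.F; only the divisibility test itself is taken from the description above.

```python
def check_npar(n_procs, npar):
    """Return an error message if n_procs is not divisible by npar, else None."""
    if npar <= 0:
        raise ValueError("npar must be a positive integer")
    if n_procs % npar != 0:
        # Hypothetical message text, for illustration only.
        return f"number of processors {n_procs} is not divisible by NPAR={npar}"
    return None

print(check_npar(32, 3))  # a message, since 32 = 3 * 10 + 2
print(check_npar(32, 4))  # None, since 32 = 4 * 8
```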