VASP-6.4.3 NCCL

Problems running VASP: crashes, internal errors, "wrong" results.


Moderators: Global Moderator, Moderator

vladimir.ladygin
Newbie
Posts: 3
Joined: Wed Jan 20, 2021 1:02 pm

VASP-6.4.3 NCCL

#1 Post by vladimir.ladygin » Tue Feb 25, 2025 1:15 am

Dear VASP Developers,

I ran into an internal error (which VASP itself reports as a bug) when using a one-core-per-GPU NCCL setup on NERSC Perlmutter. The job is just an ordinary relaxation calculation.

"""internal error in: mpi.F at line: 903

M_init_nccl: Error in ncclCommInitRank

If you are not a developer, you should not encounter this problem.
Please submit a bug report.

"""

Kind Regards,
Vladimir


ferenc_karsai
Global Moderator
Posts: 530
Joined: Mon Nov 04, 2019 12:44 pm

Re: VASP-6.4.3 NCCL

#2 Post by ferenc_karsai » Tue Feb 25, 2025 9:48 am

Thanks for the report, I will try to reproduce the error on our machines.


ferenc_karsai
Global Moderator
Posts: 530
Joined: Mon Nov 04, 2019 12:44 pm

Re: VASP-6.4.3 NCCL

#3 Post by ferenc_karsai » Wed Feb 26, 2025 12:30 pm

I talked to a colleague, and he has seen a similar bug reported by another user on the forum.

Here is the original post:
https://www.vasp.at/forum/viewtopic.php?t=19822

For the moment, the workaround is to not use NCCL: compile without -DUSENCCL in makefile.include.
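
If editing makefile.include by hand is inconvenient, the flag removal could be scripted roughly as below. This is only a sketch: the function name strip_nccl_flag and the sample CPP_OPTIONS line are made up for illustration and are not taken from any actual VASP template. It removes -DUSENCCL only as a whole token, so any other flag that merely starts with the same characters is left untouched.

```python
import re


def strip_nccl_flag(makefile_text: str, flag: str = "-DUSENCCL") -> str:
    """Return the makefile text with every standalone occurrence of `flag` removed.

    Hypothetical helper for illustration; not part of VASP or its build system.
    """
    # Match the flag only when it is a whole whitespace-delimited token,
    # and swallow the spaces/tabs directly in front of it.
    pattern = re.compile(r"[ \t]*(?<!\S)" + re.escape(flag) + r"(?!\S)")
    return pattern.sub("", makefile_text)


# Hypothetical line; the real makefile.include will look different.
sample = "CPP_OPTIONS += -DMPI -D_OPENACC -DUSENCCL\n"
print(strip_nccl_flag(sample), end="")
```

To apply it to a real file, read makefile.include into a string, pass it through the function, and write the result back (keeping a backup copy). VASP then needs a clean rebuild so that the change takes effect.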

