VASP on multiple nodes each with multiple GPUs

ghasemi
Newbie
Posts: 5
Joined: Thu Feb 02, 2023 11:27 am

VASP on multiple nodes each with multiple GPUs

#1 Post by ghasemi » Fri Sep 22, 2023 1:02 pm

Hi,

Does anyone have a SLURM job script to run VASP with GPUs on multiple nodes?

With a job script that has no particular settings for CUDA, NVHPC, OpenACC, or NCCL, I get good scaling for VASP from 1 up to 8 GPUs within one node. Running on two nodes (i.e. 16 GPUs), however, is slower than running on one node (8 GPUs). Yet the VASP GPU benchmarks on the NVIDIA page show scaling up to many nodes for a system of about 700 atoms. My system has about 500 atoms, so I would expect a speedup up to at least a few nodes.
The HPC cluster has InfiniBand.

I have also compared running on two GPUs in two ways: (i) both GPUs on one node, and (ii) two nodes with one GPU each. The latter is about 20% slower.

Do I need any particular settings to run on more than one node?

Thank you in advance!
Alireza

alexey.tal
Global Moderator
Posts: 314
Joined: Mon Sep 13, 2021 12:45 pm

Re: VASP on multiple nodes each with multiple GPUs

#2 Post by alexey.tal » Mon Nov 20, 2023 3:21 pm

Dear Alireza,
ghasemi wrote: Does anyone have a SLURM job script to run VASP with GPUs on multiple nodes?
When setting up a SLURM job to run your calculation on GPUs, it is important to set the number of tasks per node equal to the number of GPUs per node. This way you can benefit from the asynchronous communication enabled by the NCCL library.
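
For reference, here is a minimal sketch of such a job script. It assumes 2 nodes with 8 GPUs each, an NVHPC toolchain provided by a module, and an executable named vasp_std; the module name, GPU counts, and resource limits are placeholders that you will need to adapt to your cluster.

Code: Select all

#!/bin/bash
#SBATCH --job-name=vasp-gpu
#SBATCH --nodes=2                # number of nodes
#SBATCH --ntasks-per-node=8      # one MPI rank per GPU on each node
#SBATCH --gres=gpu:8             # GPUs per node; match your hardware
#SBATCH --cpus-per-task=4        # CPU cores per rank (for OpenMP threads)
#SBATCH --time=04:00:00

# Hypothetical module name; load whatever provides your NVHPC/OpenACC build environment
module load nvhpc

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export NCCL_DEBUG=INFO           # optional: print which transport NCCL selects

# srun starts nodes * ntasks-per-node = 16 MPI ranks in total, one per GPU
srun vasp_std

With NCCL_DEBUG=INFO set for a short test run, you can check in the job output that NCCL initializes and selects the InfiniBand transport rather than falling back to sockets, which would explain poor multi-node scaling.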

To better understand what types of calculations and tests you have done, I need more information. Could you please provide the input and output files for your calculations (see the forum guidelines)? It would also be helpful if you could attach your makefile.include so that I can see which toolchain and libraries you are using.
