VASP on multiple nodes each with multiple GPUs
Posted: Fri Sep 22, 2023 1:02 pm
Hi,
Does anyone have a SLURM job script to run VASP with GPUs on multiple nodes?
My script has no particular settings related to CUDA, NVHPC, OpenACC, or NCCL.
With it, VASP scales well from 1 to 8 GPUs within a single node, but running on two nodes (i.e. 16 GPUs) is slower than running on one node (8 GPUs). However, the NVIDIA website shows VASP GPU benchmarks that scale to many nodes for a system of about 700 atoms.
My system has about 500 atoms, so I would expect a speedup up to at least a few nodes.
The HPC cluster has InfiniBand.
I have also compared two ways of running on two GPUs: (i) both GPUs on one node, and (ii) two nodes with one GPU each. The latter is about 20% slower.
Do I need a particular setting to run on more than one node?
Thank you in advance!
Alireza