VASP 6.4.3 - multi-node GPU performance

leszek_nowakowski
Newbie
Posts: 5
Joined: Fri Mar 15, 2024 10:35 am

VASP 6.4.3 - multi-node GPU performance

#1 Post by leszek_nowakowski » Mon Feb 17, 2025 2:06 pm

Hello,
There have been earlier reports about the performance of multi-node GPU VASP calculations (https://p.vasp.at/forum/viewtopic.php?t=19178, https://ww.vasp.at/forum/viewtopic.php?p=20145), but they did not actually resolve the issue of VASP performance on multiple nodes.
My experiments confirm that the performance drop is very significant: calculations are even 10-20 times slower than on a single node!

When run on one node, VASP works perfectly and each process is bound to one GPU:

Code: Select all

nvidia-smi

Code: Select all

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   1264221      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std      37926MiB |
|    1   N/A  N/A   1264222      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std      38084MiB |
|    2   N/A  N/A   1264223      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std      38084MiB |
|    3   N/A  N/A   1264224      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std      37876MiB |
+-----------------------------------------------------------------------------------------+

But when run on two nodes, it looks as if all 4 processes are bound to GPU 0, and 3 of them are additionally bound to the consecutive GPUs (this is on one node; the same situation occurs on the other node):

Code: Select all

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    976211      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std       9070MiB |
|    0   N/A  N/A    976212      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std        556MiB |
|    0   N/A  N/A    976213      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std        556MiB |
|    0   N/A  N/A    976214      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std        556MiB |
|    1   N/A  N/A    976212      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std       8966MiB |
|    2   N/A  N/A    976213      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std       8974MiB |
|    3   N/A  N/A    976214      C   ...ftware/VASP/vasp.6.4.3/bin/vasp_std       8846MiB |
+-----------------------------------------------------------------------------------------+

Additionally, even when run on one node, VASP spawns some additional threads (despite OMP_NUM_THREADS=1):

Code: Select all

htop

Code: Select all

CPU NLWP     PID USER       PRI  NI  VIRT   RES   SHR S  CPU%▽MEM%   TIME+  Command
  0   13 1265538 plglnowako  20   0 45.9G 3975M  582M R  99.8  0.5  1:15.57 /net/home/plgrid/plglnowakowski/software/VASP/vasp.6.4.3/bin/vasp_std
 72   13 1265539 plglnowako  20   0 46.4G 3968M  576M R  99.8  0.5  1:14.28 /net/home/plgrid/plglnowakowski/software/VASP/vasp.6.4.3/bin/vasp_std
216   13 1265541 plglnowako  20   0 45.9G 3733M  576M R  96.7  0.4  1:14.35 /net/home/plgrid/plglnowakowski/software/VASP/vasp.6.4.3/bin/vasp_std
144   13 1265540 plglnowako  20   0 46.4G 3987M  576M R  96.0  0.5  1:14.17 /net/home/plgrid/plglnowakowski/software/VASP/vasp.6.4.3/bin/vasp_std

Is this the result of some specific task-GPU binding done by Open MPI, or perhaps of wrong SLURM parameters? (A sketch of the binding I am aiming for is shown below.) Can you reproduce such a situation? What performance do you see for VASP in your own multi-node GPU calculations?
Cluster: 4 NVIDIA GH200 GPUs per node
Toolchains and libraries: NVHPC/24.5, CUDA 12.4.0, Open MPI 5.0.3
VASP version: 6.4.3
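
For reference, this is the kind of explicit one-rank-per-GPU binding I am aiming for. It is only a sketch: the path, node layout, and core counts are placeholders, and the --gpus-per-task / --gpu-bind flags are taken from the Slurm documentation.

Code: Select all

#!/bin/bash
#SBATCH --nodes=2                 # two GH200 nodes
#SBATCH --ntasks-per-node=4       # one MPI rank per GPU
#SBATCH --gpus-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=72        # placeholder: CPU cores reserved per rank

export OMP_NUM_THREADS=1

# bind each rank to the GPU closest to its CPU cores
srun --gpu-bind=closest /path/to/vasp.6.4.3/bin/vasp_std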

Relevant files are attached.

Best Regards,
Leszek


manuel_engel1
Global Moderator
Posts: 168
Joined: Mon May 08, 2023 4:08 pm

Re: VASP 6.4.3 - multi-node GPU performance

#2 Post by manuel_engel1 » Tue Feb 18, 2025 4:04 pm

Hello Leszek,

Thanks for reaching out and for providing such a detailed report. Even so, it is quite hard to tell what is really going on. Such issues can sometimes depend on the hardware or on dependencies such as the MPI library. Unfortunately, we cannot easily assess multi-node GPU performance on our side.

First, I would like you to test setting KPAR=2, or even up to the total number of GPUs. This should greatly increase performance, as it decreases the communication between processes (however, it also increases the memory demand, so it depends a bit on your hardware capabilities). A minimal example is sketched below.
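
As a sketch only, assuming 2 nodes with 4 GPUs each (8 MPI ranks in total) and at least 8 k-points in your calculation:

Code: Select all

# INCAR fragment (sketch): one k-point group per GPU/MPI rank
KPAR  = 8
NCORE = 1   ! the OpenACC GPU port runs with NCORE = 1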

Regarding the additional threads: you need one MPI rank per GPU, so I would expect to see 4 processes running on each node. Could you confirm that the extra entries are indeed not just these 4 MPI ranks? That would be expected behavior (see below for a quick way to check). It is entirely possible that the additional entries on one node have to do with communication; they seem to share the same process IDs (PIDs). It is not entirely clear to me, however.
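
For example, something along these lines on one of the compute nodes would list the thread count (NLWP) for each vasp_std process; this is just a standard ps invocation, nothing VASP-specific:

Code: Select all

ps -o pid,nlwp,psr,comm -C vasp_std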

Let me know if setting KPAR changes anything.

Kind regards

Manuel
VASP developer
