vasp not detecting GPU
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
vasp not detecting GPU
Hi,
I'm trying to set up VASP on an HPC facility with multiple nodes, each having 8 GPUs (Tesla M60). With the NVHPC modules available there (CUDA 11.7), the only version I have been able to compile without errors is 6.3.0.
Running VASP using a Slurm script (attached in a zip file) initially produces a series of UCX warnings (visible in the first lines of the attached slurm-301066.out file). After that, the code seems to run uneventfully, albeit very slowly. In addition, no message about the presence of a GPU appears in the first few output lines. This leads me to believe that the GPUs are not detected by VASP.
The same set of VASP input files (INCAR, POSCAR...) finishes fine and reasonably fast on a local machine equipped with a single RTX 4090 GPU and running VASP 6.4.3.
I have also included a makefile.include, made from the template makefile.include.nvhpc_acc. I have also tried the template makefile.include.nvhpc_omp_acc, with the same results.
Regards.
Gvido
-
- Global Moderator
- Posts: 109
- Joined: Tue Oct 17, 2023 10:17 am
Re: vasp not detecting GPU
Dear Gvido,
you seem to have forgotten to attach your attachments...
Which version of the NVIDIA HPC SDK are you using? You can always download the newest version directly from NVIDIA, which might help you get the most recent VASP version (6.4.3) running.
Cheers, Michael
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
Re: vasp not detecting GPU
Ah, sorry, fingers faster than the brain. Attachment included.
The HPC facility uses NVHPC/22.7-CUDA-11.7.0. Upgrading to the latest version is beyond my permissions (everything runs in modules). I'll talk to their system administrator.
Regards.
Gvido
-
- Global Moderator
- Posts: 109
- Joined: Tue Oct 17, 2023 10:17 am
Re: vasp not detecting GPU
Dear Gvido,
The good news is that HPC SDK 22.7 (>21.2) and CUDA 11.7 (>10.0) are fine for the OpenACC port of VASP. You should be able to compile the current version (6.4.3).
The bad news is that your GPUs seem to be rather slow in double precision (FP64), which is what matters for VASP. According to a link I found, single-precision performance is pretty high for a ~10-year-old card at 4.8 TFLOPS, but double precision is only 1/32 of that, at 150 GFLOPS. Of course, FP64 performance is expected to be lower than FP32, but ideally only by a factor of 2, as for the Tesla P100, which is only 1 year younger than the M60.
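The FP64 estimate above is simple arithmetic and can be reproduced with a short Python sketch. The function name fp64_gflops is made up for illustration, the 4.8 TFLOPS and 1/32 figures are the ones quoted in this post, and the 1/2 case is a hypothetical card with the same FP32 rate, not a measured P100 number.

```python
def fp64_gflops(fp32_tflops, fp64_to_fp32_ratio):
    """Estimate peak FP64 throughput in GFLOPS from an FP32 peak in TFLOPS."""
    return fp32_tflops * 1000.0 * fp64_to_fp32_ratio


# Figures quoted in the post: 4.8 TFLOPS FP32, FP64 at 1/32 of that.
m60_fp64 = fp64_gflops(4.8, 1 / 32)

# Hypothetical card with the same FP32 peak but the "ideal" 1/2 ratio.
ideal_fp64 = fp64_gflops(4.8, 1 / 2)

print(f"FP64 at 1/32 ratio: {m60_fp64:.0f} GFLOPS")
print(f"FP64 at 1/2 ratio:  {ideal_fp64:.0f} GFLOPS")
```

With these inputs the 1/32 ratio costs a factor of 16 in peak FP64 throughput relative to a card with a 1/2 ratio.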
You should still be able to compile the code for the M60s however, and check if you get any speedup compared to running on CPU only.
Note that the compute capability of the M60 is only 5.2, so your makefile.include has to be adapted to generate code for that compute capability:
Code:
CC = mpicc -acc -gpu=cc52,cuda11.7
FC = mpif90 -acc -gpu=cc52,cuda11.7
FCL = mpif90 -acc -gpu=cc52,cuda11.7 -c++libs
Note that I based these lines on the makefile.include.nvhpc_acc of version 6.4.1, which I think you should be able to build. This is why, compared to your makefile.include, there is a CC line as well. Note that you can also build for more than one compute capability (cc) at a time, so you have the option of keeping cc60,cc70,cc80 as well, but the M60 definitely needs cc52.
If you are willing to spend time on something that will probably show very weak performance, try to compile 6.4.3 with the appropriately modified makefile.include taken from the arch folder and get back to me if you still have trouble.
Cheers, Michael
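Since the -gpu value in the makefile lines above is a comma-separated list, building for several compute capabilities amounts to listing several ccXY entries before the CUDA version. The following Python sketch only assembles such a string; the helper name gpu_flag is hypothetical, it does not invoke the compiler, and whether a particular ccXY value is accepted depends on the installed compiler release.

```python
def gpu_flag(compute_capabilities, cuda_version):
    """Assemble a -gpu=... option from integer compute capabilities (e.g. 52 for 5.2)."""
    if not compute_capabilities:
        raise ValueError("at least one compute capability is required")
    # Sort and de-duplicate so the resulting flag is stable and readable.
    targets = [f"cc{cc}" for cc in sorted(set(compute_capabilities))]
    return "-gpu=" + ",".join(targets + [f"cuda{cuda_version}"])


# Values taken from the post: cc52 for the M60 plus the cc60,cc70,cc80 targets.
flag = gpu_flag([52, 60, 70, 80], "11.7")
print(f"FC = mpif90 -acc {flag}")
```

The same string would go on the CC and FCL lines, with FCL keeping its trailing -c++libs.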
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
Re: vasp not detecting GPU
Thanks, Michael.
I'll get on to it today, and let you know.
Regards.
Gvido
-
- Global Moderator
- Posts: 109
- Joined: Tue Oct 17, 2023 10:17 am
Re: vasp not detecting GPU
Dear Gvido,
please also get back to me if it works with the new makefile.include setting, so I can close the thread.
Good luck, Michael
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
Re: vasp not detecting GPU
Hi Michael,
the compiler complains about cc:
Compile for compute capability X.Y; supported values: 35,50,60,61,62,70,72,75,80,86.
Which value do you suggest I use? 50?
-
- Global Moderator
- Posts: 109
- Joined: Tue Oct 17, 2023 10:17 am
Re: vasp not detecting GPU
Dear Gvido,
I really should have tried this out. Sorry. Yes, the maximum compute capability of the M60 is 5.2, so if the compiler supports 5.0 and 6.0 but not 5.2, you should use 5.0.
Thus:
Code:
CC = mpicc -acc -gpu=cc50,cuda11.7
FC = mpif90 -acc -gpu=cc50,cuda11.7
FCL = mpif90 -acc -gpu=cc50,cuda11.7 -c++libs
Let me know if this works,
Michael
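The choice of 5.0 follows a simple rule: take the highest value the compiler supports that does not exceed the card's own compute capability. A minimal Python sketch of that rule is given below, using the list of supported values from the compiler message quoted earlier in the thread. The function name pick_compute_capability is hypothetical, and the sketch assumes, as the reply does, that code built for the next lower supported capability runs on the device.

```python
def pick_compute_capability(device_cc, supported):
    """Return the highest supported value that does not exceed device_cc.

    Values are integers without the dot, e.g. 52 stands for 5.2.
    """
    candidates = [cc for cc in supported if cc <= device_cc]
    if not candidates:
        raise ValueError(f"no supported compute capability <= {device_cc}")
    return max(candidates)


# Supported values as listed in the compiler message from the thread.
supported = [35, 50, 60, 61, 62, 70, 72, 75, 80, 86]

chosen = pick_compute_capability(52, supported)  # Tesla M60 is 5.2
print(f"-gpu=cc{chosen},cuda11.7")  # prints "-gpu=cc50,cuda11.7"
```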
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
Re: vasp not detecting GPU
Please ignore this. I had a wrong compiler module loaded. Sorry.
Regards.
Gvido
-
- Global Moderator
- Posts: 109
- Joined: Tue Oct 17, 2023 10:17 am
Re: vasp not detecting GPU
Hi Gvido,
I have now successfully compiled VASP 6.4.3 with cc=50 and cuda12.3, using NVIDIA HPC SDK version 24.1 and /arch/makefile.include.nvhpc_ompi_mkl_omp_acc.
A quick run of a couple of tests from the fast test suite on a single A30 GPU was successful, so I think you should get this to work on your M60s.
However, the performance will be bad on this hardware, as I alluded to earlier.
Please let me know if I can delete this post. I am not sure what it refers to, since I also had the compiler problem with cc=52 and had to drop down to cc=50:
"Please ignore this. I had a wrong compiler module loaded. Sorry. Regards. Gvido"
Please let me know if you succeed or have any other issues,
Michael
-
- Newbie
- Posts: 13
- Joined: Tue Sep 13, 2022 12:30 pm
Re: vasp not detecting GPU
Hi Michael,
The compilation of 6.4.3 went fine using the attached makefile.include. The queue on the HPC facility is packed, so I can't yet update you on the speed of the calculation. I'll let you know as soon as it runs through, but you can close this ticket. Thank you so much for your help.
Regards.
Gvido