Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.
Moderators: Global Moderator, Moderator
-
Dankomaister
- Newbie
- Posts: 38
- Joined: Sat Feb 13, 2016 4:39 pm
- License Nr.: 20-0400 5-1605
#1
Post
by Dankomaister » Wed Feb 08, 2023 7:24 am
Hi,
I have a question regarding compiling VASP with vectorization support, in particular AVX2 or AVX512.
Are there any additional precompiler flags that need to be added to fully utilize all vectorization code paths?
For example, I noticed that the file src/simd.inc has the following commented-out section of code:
Code:
!!#if defined(__MIC__) || defined(__AVX512F__)
!!#define SIMD512
!!#undef SIMD256
!!#elif defined(__AVX__) || defined(__AVX2__)
!!#define SIMD256
!!#undef SIMD512
!!#endif
Does this mean that we manually have to set the SIMD256 or SIMD512 precompiler flags to fully utilize AVX2 or AVX512?
Any clarification on this would be helpful.
/Daniel
-
Dankomaister
#2
Post
by Dankomaister » Mon Feb 13, 2023 2:05 am
Anyone have any insights on this?
-
fabien_tran1
- Global Moderator
- Posts: 419
- Joined: Mon Sep 13, 2021 11:02 am
#3
Post
by fabien_tran1 » Mon Feb 13, 2023 1:25 pm
Hi,
Sorry for the late answer. Yes, -DSIMD256 or -DSIMD512 needs to be added to makefile.include as an additional option (CPP_OPTIONS). However, the implementation (https://doi.org/10.1002/qua.25851) is not supported and may be broken. Therefore, its use is not recommended.
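As a rough illustration of the makefile.include change described above, here is a minimal Python sketch that appends the flag to a makefile text. The helper name `add_cpp_option` is hypothetical, and the sketch assumes that a trailing `CPP_OPTIONS += ...` line is picked up by the build and that the file does not end in a line-continuation backslash; check your own makefile.include before relying on it.

```python
def add_cpp_option(makefile_text: str, flag: str) -> str:
    """Return the makefile text with `CPP_OPTIONS += <flag>` appended.

    Hypothetical helper: the text is returned unchanged if the flag
    already appears as a token on any non-comment part of a line.
    """
    for line in makefile_text.splitlines():
        code = line.split("#", 1)[0]  # naive: drop anything after '#'
        if flag in code.split():
            return makefile_text  # flag already set, nothing to do

    # Make sure the appended line starts on a fresh line.
    if makefile_text and not makefile_text.endswith("\n"):
        makefile_text += "\n"
    return makefile_text + f"CPP_OPTIONS += {flag}\n"


if __name__ == "__main__":
    example = "CPP_OPTIONS = -DMPI -Duse_collective\n"  # shortened, illustrative
    print(add_cpp_option(example, "-DSIMD256"), end="")
```

To apply it to a real file, read makefile.include, pass its contents through the function, write the result back, and rebuild VASP from scratch so that the new precompiler flag takes effect.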
-
Dankomaister
#4
Post
by Dankomaister » Tue Feb 14, 2023 2:21 am
Great, thanks for clearing this up!
I have a few further questions: are -DSIMD256 and -DSIMD512 only relevant when compiling VASP with OpenMP, or will they also benefit the pure MPI version? And are there any plans to support this vectorization in the future?
/Daniel
-
fabien_tran1
#5
Post
by fabien_tran1 » Tue Feb 14, 2023 8:38 am
Yes, SIMD works only in conjunction with OpenMP. At the moment, no decision has been made about the future of SIMD.
Concerning the use of SIMD, I should elaborate a bit more. -DSIMD256 and -DSIMD512 activate SIMD in xclib_grad.F for GGA functionals (91,AM,B3,B5,BO,MK,ML,OR,PE,PS,RE,RP) in the non-spin-polarized case only (SIMD is implemented neither for spin-polarized GGA nor for meta-GGA functionals). According to a (quick) test I have just made, -DSIMD256 seems to work (the results are correct) for all aforementioned GGA functionals. Thus, it may be OK to use the SIMD option, but preferably after prior test calculations to check the correctness of the results and the gain in speed.
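One way to carry out such a check is to run the same input with a reference build and a SIMD build and compare the two OUTCAR files. The Python sketch below is only an illustration: the function names are hypothetical, the tolerance of 1e-6 eV is an arbitrary example value, and the regular expressions assume that OUTCAR contains lines of the form `free  energy   TOTEN  = ... eV` and `Elapsed time (sec): ...`.

```python
import re

# Assumed OUTCAR line formats; adjust if your VASP version prints them differently.
TOTEN_RE = re.compile(r"free\s+energy\s+TOTEN\s*=\s*(-?\d+\.\d+)\s*eV")
ELAPSED_RE = re.compile(r"Elapsed time \(sec\):\s*(\d+\.\d+)")


def last_toten(outcar_text: str) -> float:
    """Return the last total energy (eV) found in the OUTCAR text."""
    matches = TOTEN_RE.findall(outcar_text)
    if not matches:
        raise ValueError("no TOTEN line found")
    return float(matches[-1])


def elapsed_seconds(outcar_text: str) -> float:
    """Return the elapsed wall time (s) reported in the OUTCAR text."""
    match = ELAPSED_RE.search(outcar_text)
    if match is None:
        raise ValueError("no elapsed-time line found")
    return float(match.group(1))


def compare_builds(ref_text: str, simd_text: str, tol_ev: float = 1e-6) -> dict:
    """Compare final energy and wall time of a reference run and a SIMD run."""
    diff = abs(last_toten(ref_text) - last_toten(simd_text))
    return {
        "energy_diff_ev": diff,
        "energies_agree": diff <= tol_ev,  # tolerance is an illustrative choice
        "speedup": elapsed_seconds(ref_text) / elapsed_seconds(simd_text),
    }
```

Typical use would be `compare_builds(open("ref/OUTCAR").read(), open("simd/OUTCAR").read())`, where the two directory names are placeholders for your own runs.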
-
Dankomaister
#6
Post
by Dankomaister » Tue Feb 14, 2023 12:51 pm
OK, so I need to compile with OpenMP. The reason I asked is that we have found that for almost all systems/calculations it is faster to run the pure MPI version of VASP, because using optimal values of NCORE is always faster than using NCORE=1 and MPI+OpenMP. I assume using more than one OpenMP thread per MPI task is required to unlock the SIMD optimizations? Or can I compile with OpenMP and then use only 1 thread per MPI task, so that it is possible to combine higher values of NCORE with the SIMD optimizations?
The paper you linked shows a 9x speedup when using SIMD, which is huge. If that is indeed true, it would be beneficial to use MPI+OpenMP with SIMD over the pure MPI version. Maybe that could warrant official support for SIMD and justify an implementation of SIMD for spin-polarized GGA? Up to 9x speedup would certainly be useful.
-
fabien_tran1
#7
Post
by fabien_tran1 » Tue Feb 14, 2023 2:54 pm
No, it is not necessary to set OMP_NUM_THREADS to a value larger than 1 to have SIMD activated (I have just checked it). Note that the speedup shown in the paper is for the time spent in the GGAALL_GRID subroutine, and not for the total time. I will test the SIMD implementation more carefully and write something in the VASP manual if we finally conclude that it is safe to use. Yes, sure, it would be good to have SIMD implemented in other parts of the code, and we will think about it.
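To see why a large speedup of one subroutine can translate into a small gain in total run time, Amdahl's law is a useful estimate. The Python sketch below uses the 9x figure mentioned in this thread; the 10% share of run time spent in the accelerated routine is a purely hypothetical value, not a measurement.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: total speedup when only `fraction` of the run time
    is accelerated by a factor `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical: 10% of the run time in the accelerated routine, sped up 9x.
    print(f"{overall_speedup(0.10, 9.0):.2f}")  # roughly 1.10
```

Under that assumption the whole calculation would run only about 10% faster, which is why profiling the actual time share of the routine in your own workload matters more than the headline factor.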
-
fabien_tran1
#8
Post
by fabien_tran1 » Wed Feb 22, 2023 11:13 am
Update: The implementation of SIMD for range-separated hybrid functionals like HSE has a bug (wrong results). The bug can be fixed by deleting or commenting out the following line in xclib_grad.F:
INIT_PRED = .FALSE.
For the GGA functionals (91,AM,B3,B5,BO,MK,ML,OR,PE,PS,RE,RP) there was no problem.
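For anyone who wants to script the workaround for the HSE bug, here is a minimal Python sketch that comments out the statement in a Fortran source text. The function name is hypothetical, and the sketch assumes the assignment stands alone on its line. It reports the affected line numbers so that the change can be reviewed by hand, since the thread does not say how many times the statement occurs in xclib_grad.F.

```python
import re

# Matches a line containing only the assignment, ignoring case and spacing.
INIT_PRED_RE = re.compile(r"^\s*INIT_PRED\s*=\s*\.FALSE\.\s*$", re.IGNORECASE)


def comment_out_init_pred(source_text: str):
    """Return (patched_text, line_numbers) with matching lines commented out.

    A leading '!' turns the statement into a Fortran comment. Lines that
    are already commented do not match and are left untouched.
    """
    patched = []
    hits = []
    for number, line in enumerate(source_text.splitlines(keepends=True), start=1):
        if INIT_PRED_RE.match(line.rstrip("\r\n")):
            patched.append("!" + line)
            hits.append(number)
        else:
            patched.append(line)
    return "".join(patched), hits
```

Inspect the returned line numbers before writing the patched text back to xclib_grad.F, and keep a backup copy of the original file.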