Guidance on Optimizing Precompiler Options for Performance (MPI_BLOCK and CACHE_SIZE).
Posted: Fri Apr 19, 2024 11:46 pm
Dear VASP Forum Members,
I would like your advice on setting the precompiler options for best performance in VASP. Specifically, I want to understand how -DMPI_BLOCK and -DCACHE_SIZE should be chosen in relation to system resources such as CPU cache size, the number of processor cores, and total memory.
Background Information: Our compute node runs Linux and has the following configuration:
- CPU: 2 × AMD EPYC 9554 (64 cores each, 128 cores in total)
- Memory: 24 × 32 GB (4800 MT/s, registered ECC), 768 GB in total
- Toolchain: Intel oneAPI 2023.2.0
Below are the current precompiler options, as suggested by the default makefile.include.intel:
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf
Points of Inquiry:
1. MPI_BLOCK: Given the hardware above, should the default -DMPI_BLOCK=8000 be changed? Is there a rule of thumb or a formula for choosing the block size based on the number of cores or the characteristics of the MPI communication?
2. CACHE_SIZE: How should -DCACHE_SIZE=4000 be chosen relative to the CPU's cache size? How does changing this parameter affect performance, and what should we keep in mind when trading off computational efficiency against memory usage?
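To make it easier to benchmark different values, one approach I am considering is to turn the two numbers into make variables, so that test builds can be configured without editing the option list each time. This is only a sketch: the variable names MPI_BLOCK_VAL and CACHE_SIZE_VAL are my own and are not part of the stock makefile.include.intel, and the defaults are simply the values shown above.

```make
# Hypothetical helper variables (NOT part of the VASP distribution).
# "?=" keeps these defaults unless a value is already set, e.g. on the
# make command line or in the environment.
MPI_BLOCK_VAL  ?= 8000
CACHE_SIZE_VAL ?= 4000

# Same option list as above, with the two numbers replaced by the variables.
CPP_OPTIONS = -DHOST=\"LinuxIFC\" \
              -DMPI -DMPI_BLOCK=$(MPI_BLOCK_VAL) -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=$(CACHE_SIZE_VAL) \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dfock_dblbuf
```

A test build could then be started with something like make MPI_BLOCK_VAL=16000 CACHE_SIZE_VAL=8000 followed by the usual target. The numbers here are arbitrary placeholders, not recommendations. Since these are preprocessor definitions, I assume each change needs a clean rebuild before timing anything.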
Thank you in advance for your time and help. I look forward to your valuable suggestions.
Best regards,
Zhao