Very simple query on hybrid parallelization
-
- Jr. Member
- Posts: 81
- Joined: Wed Sep 28, 2011 4:15 pm
- License Nr.: 5-1441
- Location: Germany
Very simple query on hybrid parallelization
Does the scGW method work with explicit OpenMP + MPI parallelization? Hoping for a quick response from the developers. This might help me overcome the massive memory requirements, at the cost of some performance.
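For context, a minimal INCAR sketch of the kind of run I mean (the values are placeholders for illustration, not my production settings):

ALGO      = scGW0      ! partially self-consistent GW
NBANDS    = 288
ENCUTGW   = 200        ! eV
NOMEGA    = 50         ! frequency points (placeholder)
LSPECTRAL = .TRUE.
NELM      = 4          ! number of GW iterations

The memory pressure in such runs is what makes a hybrid MPI + OpenMP (or shared-memory) setup attractive to me.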
-
- Administrator
- Posts: 2921
- Joined: Tue Aug 03, 2004 8:18 am
- License Nr.: 458
Re: Very simple query on hybrid parallelization
Hybrid parallelization (HP) of the GW code is under development.
In the current version the HP is not working.
Hint #1: Memory sharing can be tuned via the precompiler flag
-Duse_shmem together with NCSHMEM = integer (integer .le. cores-per-node, or integer .eq. cores-per-socket)
Hint #2: One can perform memory-demanding calculations at a supercomputing center.
Many of them in Germany have VASP installed.
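To illustrate hint #1 (the exact mechanism can differ between releases, so please check the documentation of your version): -Duse_shmem is a compile-time flag in the CPP_OPTIONS of makefile.include, while the integer is set at run time, e.g. in the INCAR:

# makefile.include (compile time)
CPP_OPTIONS = ... -Duse_shmem ...

# INCAR (run time), e.g. for a 24-core node with 2 sockets
NCSHMEM = 12    ! cores per socket (or 24 for cores per node)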
-
- Jr. Member
- Posts: 81
- Joined: Wed Sep 28, 2011 4:15 pm
- License Nr.: 5-1441
- Location: Germany
Re: Very simple query on hybrid parallelization
Thank you for the reply. I am already on the JURECA system at Juelich in Germany, on which I installed VASP with the help of the admins' support. The partially self-consistent GW calculations for the slab systems I want to treat (50 atoms, 108 occupied bands, 288 total bands, ENCUTGW = 200 eV) require > 50 GB per core when using 288 MPI tasks, no k-point parallelization, and LSPECTRAL = .TRUE. As you know already, LSPECTRAL = .FALSE. is very slow and does not get me anywhere within 24 hours for this system.
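To put rough numbers on it (assuming the standard JURECA compute nodes with 24 cores and 128 GB each, which is what I am allocated; please correct me if my arithmetic is off):

288 tasks x 50 GB/task      = 14400 GB in total
24 tasks/node x 50 GB/task  = 1200 GB per fully packed node (vs. ~128 GB available)
128 GB / 50 GB per task     ~ 2 tasks per node at most with the current footprint

so spreading the same number of ranks over ever more nodes quickly becomes wasteful, which is why I am looking at hybrid parallelization.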
Yesterday I nevertheless compiled a hybrid version of VASP with the flags FC = mpif90 -openmp -openmp-report2. With LSPECTRAL = .TRUE. it is already performing updates of chi_q(r,r') and seems to be running in hybrid mode. I will have to check whether the results match those of pure MPI jobs for some smaller systems that are still tractable with pure MPI.
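For reference, the SLURM script I use to launch the hybrid binary (a sketch assuming 24-core JURECA nodes; node count, time limit, and binary name are just my own choices):

#!/bin/bash
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=4     # MPI ranks per node
#SBATCH --cpus-per-task=6       # OpenMP threads per rank (4 x 6 = 24 cores)
#SBATCH --time=24:00:00
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./vasp_std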
-
- Jr. Member
- Posts: 81
- Joined: Wed Sep 28, 2011 4:15 pm
- License Nr.: 5-1441
- Location: Germany
Re: Very simple query on hybrid parallelization
1) So, there is no difference when running VASP in hybrid mode, because there is no line in the source code where OpenMP is active. I also tested the implicit OpenMP parallelization of a VASP binary that was compiled against the Intel MKL library. There was still no difference in performance between 4 pure MPI processes and 4 MPI processes with MKL_NUM_THREADS = 6 (over a total of 24 physical cores). The GW method performs a lot of BLAS operations during its run, but I still did not see any effect of using the following settings (as recommended on the Intel MKL page https://software.intel.com/en-us/articl ... lications/ ):
export MKL_NUM_THREADS=1
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=6"
export OMP_NUM_THREADS=1
export MKL_DYNAMIC="TRUE"
It would be very kind if the VASP developers could shed some light on this. Even if I do not compile the code for explicit hybrid parallelization, the threaded Intel MKL routines should still work, right? Any ideas on how I could modify my compilation to make this work?
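For completeness, this is the comparison I made (the launcher and binary name are simply what I use on our 24-core nodes; treat the commands as a sketch):

# run A: 4 MPI ranks, single-threaded MKL
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
srun -n 4 ./vasp_std

# run B: 4 MPI ranks, 6 MKL threads each (same 24 cores in total)
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=6
srun -n 4 -c 6 ./vasp_std

Both runs give essentially the same timings for the GW steps.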
2) In my compilation I use
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=65536 -DscaLAPACK -Duse_collective \
-DnoAugXCmeta -Duse_bse_te \
-Duse_shmem -Dtbdyn -DVASP2WANNIER90
As you can see, the -Duse_shmem flag is already there. Did you mean another way of specifying it? Since my system has 24 cores per node, I tried specifying the following:
-Duse_shmem 24
-Duse_shmem NCSHMEM = 24
However, I always get the following error during compilation:
--------------------------------------------------------------------------------------
fpp: fatal: Usage: fpp [-flags]... [filein [fileout]]
gmake[2]: *** [base.f90] Error 1
gmake[2]: Leaving directory `/homea/jhpc36/jhpc3601/software/VASP/vasp/vasp.5.4.1_wannier1.2_hybrid/build/std'
cp: cannot stat ‘vasp’: No such file or directory
gmake[1]: *** [all] Error 1
--------------------------------------------------------------------------------------
Maybe I am specifying it with the wrong syntax; my own reading of the error is sketched below. It would be very kind if you could answer these two points and give some hints.
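What I suspect happens (please correct me if I misread it): the extra token after -Duse_shmem is passed straight on to fpp, which takes it for an additional file-name argument and aborts with the usage message above. The standard preprocessor form for a valued macro has no spaces, i.e. hypothetically something like

CPP_OPTIONS = ... -Duse_shmem -DNCSHMEM=24 ...   # -DNAME=value, no spaces; NCSHMEM=24 is only my guess

or does the integer not belong in the makefile at all, but rather in the INCAR at run time?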
Thanks and Best Regards
-ask