Hello,
I am using the Athena GPU cluster, with the details attached to this message. Would you please help me prepare a good set of INCAR tags and a Slurm file? I need to do HSE calculations for a supercell of 160 atoms, including 3d atoms.
Best regards,
Asiyeh
help for run on GPU
Re: help for run on GPU
Dear Asiyeh,
On these AMD CPUs, the NUMA nodes are numbered in reverse order and only two NUMA nodes per socket have a direct GPU connection, so the best setup is always a trade-off between NUMA locality and GPU locality. Please find attached a script that is often used on DGX A100 systems and should (hopefully) work on Athena as well.
On a Slurm cluster, the script below (saved as cvd_set.sh) can be launched with the srun commands given after it:
Code: Select all
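#!/usr/bin/env bash
# Per-rank lookup tables, indexed by the node-local rank (0-7).
# this is the list of GPUs we have
GPUS=(0 1 2 3 4 5 6 7)
# network adapter used by each rank
NICS=(mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_6 mlx5_7 mlx5_8 mlx5_9)
# NUMA node used for memory binding (note the reversed numbering)
CPUS=(3 2 1 0 7 6 5 4)
# CPU core range each rank is pinned to
cores=(48-63 32-47 16-31 0-15 112-127 96-111 80-95 64-79)
# 16 OpenMP threads per rank, one rank per GPU
export OMP_NUM_THREADS=16
export OMP_DYNAMIC="FALSE"
export OMP_PLACES='sockets'
export OMP_PROC_BIND='close'
ulimit -s unlimited
ulimit -c unlimited
# EXE (and optionally ARGS) must be exported before calling this script
APP="$EXE $ARGS"
# with Open MPI's mpirun, use this line instead of SLURM_LOCALID:
#lrank=$OMPI_COMM_WORLD_LOCAL_RANK
lrank=$SLURM_LOCALID
# restrict this rank to its own network adapter and GPU
export UCX_NET_DEVICES=${NICS[$lrank]}:1
export OMPI_MCA_btl_openib_if_include=${NICS[$lrank]}
export CUDA_VISIBLE_DEVICES=${GPUS[$lrank]}
export LOCAL_RANK=$lrank
export GLOBAL_RANK=$SLURM_PROCID
# print the adapter selection once per rank on the first node
if [[ $GLOBAL_RANK -lt 8 ]]; then echo "local rank $lrank: using hca $UCX_NET_DEVICES openib using $OMPI_MCA_btl_openib_if_include"; fi
# pin the rank to its cores and NUMA node, then start the application
# ($APP is left unquoted on purpose so that it splits into command and arguments)
numactl -C "${cores[$lrank]}" --membind="${CPUS[$lrank]}" $APP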
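Before submitting a job, it may help to see which GPU, network adapter, NUMA node and core range each local rank would receive. The following is only a sketch: it copies the four arrays from the script above and prints the lookup for every local rank without starting anything. The helper name show_mapping is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Dry run: print the per-rank mapping used by cvd_set.sh; nothing is launched.
# The arrays are copied unchanged from the script above.
GPUS=(0 1 2 3 4 5 6 7)
NICS=(mlx5_0 mlx5_1 mlx5_2 mlx5_3 mlx5_6 mlx5_7 mlx5_8 mlx5_9)
CPUS=(3 2 1 0 7 6 5 4)
cores=(48-63 32-47 16-31 0-15 112-127 96-111 80-95 64-79)

# show_mapping is a hypothetical helper, introduced only for this illustration.
# It takes a node-local rank and prints the entries that rank would pick up.
show_mapping() {
  local lrank=$1
  printf 'rank %d: gpu=%s nic=%s numa=%s cores=%s\n' \
    "$lrank" "${GPUS[$lrank]}" "${NICS[$lrank]}" "${CPUS[$lrank]}" "${cores[$lrank]}"
}

# Loop over all local ranks (array indices 0-7).
for lrank in "${!GPUS[@]}"; do
  show_mapping "$lrank"
done
```

The printed table can then be compared with the output of `nvidia-smi topo -m` and `numactl --hardware` on a compute node. If the topology on Athena differs from a DGX A100, the arrays would need to be adjusted accordingly.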
Code: Select all
export EXE=/path/to/vasp_std
srun --cpu-bind=none ./cvd_set.sh
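Since the question also asked for a Slurm file, the two lines above could be wrapped in a batch script roughly as follows. This is a minimal sketch under stated assumptions, not a tested Athena configuration: the file name job.sh, the job name, the wall time and the `--gres=gpu:8` request are placeholders, and a site may additionally require partition or account options. The task and CPU counts are chosen to match the 8 GPUs and OMP_NUM_THREADS=16 in cvd_set.sh. The snippet only writes the file and checks its syntax; nothing is submitted.

```shell
#!/usr/bin/env bash
# Write a hypothetical batch file (job.sh) and syntax-check it; nothing is submitted.
cat > job.sh <<'EOF'
#!/usr/bin/env bash
# Placeholder job name and wall time: adjust to the actual calculation.
#SBATCH --job-name=vasp-hse
#SBATCH --time=24:00:00
# One node, one MPI rank per GPU (GPUS has 8 entries in cvd_set.sh).
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
# 16 cores per rank, matching OMP_NUM_THREADS=16 in cvd_set.sh.
#SBATCH --cpus-per-task=16
# Assumption: GPUs are requested through gres on this cluster.
#SBATCH --gres=gpu:8

# Path as given in the post; replace with the local VASP binary.
export EXE=/path/to/vasp_std
# CPU binding is left to numactl inside cvd_set.sh, hence --cpu-bind=none.
srun --cpu-bind=none ./cvd_set.sh
EOF

# Syntax check only.
bash -n job.sh && echo "job.sh: syntax OK"
```

The file would then be submitted with `sbatch job.sh` from the directory that contains cvd_set.sh and the VASP input files.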