Hi VASP support,
Unfortunately, my build of VASP (6.4.3), running on a SLURM cluster, stops in the middle of an aiMD job without producing any error messages.
I compiled VASP with oneAPI 2022.3 on Red Hat 8.1, and the build completes the test suite successfully.
The simulation simply stops, seemingly at random. The last lines of the OSZICAR are:
1786 T= 1036. E= -.23538280E+04 F= -.27677929E+04 E0= -.27677929E+04 EK= 0.11378E+02 SP= 0.40E+03 SK= 0.70E-01
1787 T= 1033. E= -.23538280E+04 F= -.27679653E+04 E0= -.27679653E+04 EK= 0.11354E+02 SP= 0.40E+03 SK= 0.76E-01
1788 T= 1030. E= -.23538280E+04 F= -.27681294E+04 E0= -.27681294E+04 EK= 0.11314E+02 SP= 0.40E+03 SK= 0.82E-01
1789 T= 1024. E= -.23538279E+04 F= -.27682767E+04 E0= -.27682767E+04 EK= 0.11251E+02 SP= 0.40E+03 SK= 0.87E-01
1790 T= 1016. E= -.23538276E+04 F= -.27683977E+04 E0= -.27683977E+04 EK= 0.11158E+02 SP= 0.40E+03 SK= 0.90E-01
N E dE d eps ncg rms rms(c)
DAV: 1 -0.273911763763E+04 0.12358E+01 -0.84342E+02 3104 0.434E+01
DAV: 2 -0.274133701322E+04 -0.22194E+01 -0.22171E+01 3784 0.556E+00
DAV: 3 -0.274139721292E+04 -0.60200E-01 -0.60192E-01 4608 0.812E-01
DAV: 4 -0.274139949566E+04 -0.22827E-02 -0.22827E-02 3976 0.148E-01 0.107E+01
The INCAR is:
PREC = normal #precision
ISIF = 2 #stress tensor and dof
ISYM = 0 #no symmetry is used
EDIFF = 1e-4 #tolerance of electronic sc loop
NELM = 60 #maximum electron sc steps
NELMIN = 4 #minimum e sc steps
ALGO = N #optimisation algo
MAXMIX = 40
NSIM = 4 #number of bands that are optimised in parallel
LPLANE = T
LSCALU = F #whether to use scalapack LU decomposition
NWRITE = 1 #amount of info in outcar
LREAL = Auto
NBLOCK = 1 #how many steps until DOS etc is calculated
KBLOCK = 20
APACO = 20.00 #cutoff for PC function
ISMEAR = -1 #smearing of partially occupied orbitals
IBRION = 0 #MD
SMASS = 0 #Nose thermostat
SIGMA = 0.2064 #width of smearing
TEBEG = 1000 #starting temp
TEEND = 1100 #ending temp
NSW = 80000 #no of steps
POTIM = 0.4 #timesteps in fs
BMIX = 0.63
NCORE = 10
ML_LMLFF = .TRUE.
ML_MODE = train
KPOINTS:
KPOINTS
0
Auto
20
And finally my slurm submission file is:
#!/bin/bash
#SBATCH --job-name='ML ZrC 1000K'
#SBATCH --partition=compute
#SBATCH --time=120:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=10
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --account=<my account here>
#SBATCH --error=error.log
export PATH="/home/pdwurzner/software/vasp.6.4.3/bin:$PATH"
srun vasp_std
The error.log file is empty. The only information I have is from SLURM itself, which says the process crashed with an exit code of 134.
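As far as I understand the usual shell convention, an exit status above 128 means the process was terminated by a signal numbered (status - 128), so 134 would correspond to signal 6 (SIGABRT). A minimal sketch of how I decoded it, assuming bash and that SLURM reports the status in this shell-style form:

```shell
#!/bin/bash
# Decode a shell-style exit status.
# Assumption: values above 128 mean "terminated by signal (status - 128)".
status=134   # exit code reported by SLURM for the failed job

if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    # kill -l <number> prints the signal name without the SIG prefix
    signame=$(kill -l "$signum")
    echo "exit status $status = 128 + $signum (SIG$signame)"
else
    echo "exit status $status is an ordinary return code"
fi
```

This only identifies the signal; it does not say which part of the code raised the abort.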
Also worth noting: this is an MLFF training run with roughly 85 atoms, and the job has about 160 GB of RAM in total.
I have recompiled VASP several times (including different versions), but the problem persists. How would you recommend I proceed?
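For reference, the 160 GB figure is the total across all nodes and follows from the resource request in the submission file. A small sketch of the arithmetic, with the values copied from the #SBATCH lines (the variable names are only for illustration):

```shell
#!/bin/bash
# Values copied from the #SBATCH lines of the submission script.
nodes=4              # --nodes
ntasks_per_node=10   # --ntasks-per-node
cpus_per_task=1      # --cpus-per-task
mem_per_cpu_gb=4     # --mem-per-cpu=4G

# Total MPI ranks and total requested memory across all nodes
total_tasks=$((nodes * ntasks_per_node))
total_mem_gb=$((total_tasks * cpus_per_task * mem_per_cpu_gb))
mem_per_node_gb=$((ntasks_per_node * cpus_per_task * mem_per_cpu_gb))

echo "MPI ranks: $total_tasks"
echo "memory per node: ${mem_per_node_gb} GB, total: ${total_mem_gb} GB"
```

So each node has 40 GB available to its 10 ranks, not 160 GB.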
Kind regards,
Philip