Mining an old OUTCAR for ML_AB data?

Posted: Fri Mar 17, 2023 10:18 pm
by victor_robinson
Dear all,

This is still not clear to me: can we take previous AIMD runs (OUTCAR files from MD without ML) and convert the data into an ML_AB file, or 'rerun' VASP over these OUTCAR files to create an ML_AB or even an ML_FF file?

From what I have read, I don't think VASP can do this, but perhaps there is a script that mines the data to create an ML_AB file?

Thanks and regards, Victor

Re: Mining an old OUTCAR for ML_AB data?

Posted: Wed Mar 22, 2023 12:55 pm
by andreas.singraber
Dear Victor,

unfortunately, we do not yet provide a tool to convert a single OUTCAR file or a series of them into an ML_AB training structure database. This is on our agenda, but I cannot give you a time horizon for delivery. However, it should be possible to write such a converter yourself without much effort. Take an existing ML_AB file as a reference; for example, in the VASP testsuite you could have a look at testsuite/tests/ML_LiF_CaO_ISTART1/ML_ABN.ref, which contains mixed-type structures with up to four elements. Here are a few important steps to consider in order to create a valid ML_AB file:
  1. The ML_AB file starts with a header providing general information, e.g., about the atom types and the maximum number of atoms. Either extract this information from the OUTCAR file (search for VRHFIN, ions per type, etc.) or set up this part manually.
  2. Afterwards the section starting with

    The numbers of basis sets per atom type
    usually contains the local reference configurations for each type which were selected during on-the-fly training. Because we cannot know from the data in the OUTCAR file which atoms should go there, you need to add a dummy section only listing a single atom, e.g. like this:

    ...
    **************************************************
         The numbers of basis sets per atom type
    --------------------------------------------------
            1    1    1
            1
    **************************************************
         Basis set for Li
    --------------------------------------------------
              1      1
    **************************************************
         Basis set for F
    --------------------------------------------------
              1     1
    **************************************************
         Basis set for Ca
    --------------------------------------------------
              1      1
    **************************************************
         Basis set for O
    --------------------------------------------------
              1     1
              ....
    
  3. Then follows the list of all configurations, always starting with

    **************************************************
         Configuration num.      ???
    ==================================================
    
    You can get the lattice, position, energy, force and stress data from the OUTCAR file if you look for these lines:

    direct lattice vectors                                                     ---> lattice
    POSITION                                       TOTAL-FORCE (eV/Angst)      ---> positions and forces
    free  energy   TOTEN  =                                                    ---> energy
    in kB                                                                      ---> stress
    
  4. In some ML_AB files each configuration contains a section like this:

    ==================================================
         CTIFOR
    --------------------------------------------------
       1.0000000000000001E-016
    ==================================================
    
    You can safely omit this section; it is not required for this purpose.
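
The data extraction described in step 3 could look roughly like the sketch below. It is only a minimal illustration, not an official tool: the function name `parse_outcar_frames` and the dictionary keys are made up for this example, it relies only on the marker lines listed above, and it assumes that the columns in the OUTCAR are separated by whitespace (fused columns, which can occur for large negative numbers, are not handled). The stress is kept in the order in which the "in kB" line lists it.

```python
def parse_outcar_frames(lines):
    """Yield one dict per ionic step from an iterable of OUTCAR lines.

    Hypothetical helper: the name and the returned keys are assumptions
    made for this sketch, not part of VASP.
    """
    it = iter(lines)
    lattice = stress = positions = forces = None
    for line in it:
        if "direct lattice vectors" in line:
            # The next three rows hold the direct vectors in the first
            # three columns (the reciprocal vectors follow on the same row).
            lattice = [[float(x) for x in next(it).split()[:3]]
                       for _ in range(3)]
        elif line.strip().startswith("in kB"):
            # Six stress components in kB, in the order printed by VASP.
            stress = [float(x) for x in line.split()[2:8]]
        elif "POSITION" in line and "TOTAL-FORCE" in line:
            next(it)  # skip the dashed separator below the header
            positions, forces = [], []
            for row in it:
                if row.lstrip().startswith("---"):
                    break  # closing separator ends the table
                values = [float(x) for x in row.split()]
                positions.append(values[:3])
                forces.append(values[3:6])
        elif "free  energy   TOTEN" in line:
            if positions is None:
                continue  # no position/force table seen for this step
            energy = float(line.split("=")[1].split()[0])
            yield {"lattice": lattice, "positions": positions,
                   "forces": forces, "energy": energy, "stress": stress}
            positions = forces = None  # wait for the next ionic step
```

With a file on disk this would be used as `frames = list(parse_outcar_frames(open("OUTCAR")))`. Each frame then still has to be written out in the layout of the reference ML_AB file, which this sketch does not attempt.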
Finally, after you have created an ML_AB file from your OUTCAR data, you need to run a special training mode in which the local reference configurations are selected. This can be done by setting ML_MODE=select (equivalent to ML_ISTART=3, NSW=1) in your INCAR file.

Hope this helps you if you attempt to write a script yourself.
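
As a possible starting point, the dummy section from step 2 could be generated along these lines. This is only a sketch: the function name `dummy_basis_section` is made up, and the column widths are guesses based on the example shown above, so the output should be compared against a reference ML_AB file before use. As in the example, the per-type counts are written three to a line.

```python
def dummy_basis_section(types):
    """Return a dummy basis-set section listing a single entry per type.

    Hypothetical helper; spacing is an assumption and should be checked
    against a reference ML_AB file.
    """
    star = "*" * 50
    dash = "-" * 50
    out = [star, "     The numbers of basis sets per atom type", dash]
    # One dummy basis set per type, three counts per line as in the example.
    for start in range(0, len(types), 3):
        chunk = types[start:start + 3]
        out.append("   " + "".join("    1" for _ in chunk))
    # One "Basis set for X" block per type, each with the dummy "1 1" entry.
    for name in types:
        out += [star, f"     Basis set for {name}", dash, "          1      1"]
    return "\n".join(out)


if __name__ == "__main__":
    print(dummy_basis_section(["Li", "F", "Ca", "O"]))
```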

All the best,
Andreas Singraber

Re: Mining an old OUTCAR for ML_AB data?

Posted: Fri Mar 24, 2023 9:16 pm
by victor_robinson
Thanks for the informative response. I agree, it would be good to be able to loop over old OUTCAR data once that is available. I may give this a go until then.
Victor

Re: Mining an old OUTCAR for ML_AB data?

Posted: Thu Sep 12, 2024 1:35 pm
by jianxiang_lian

Hello all,

I am reviving this thread because my problem is closely related.
I am using VASP to perform AIMD simulations. I have a collection of AIMD trajectories and I want to mine them in order to train a force field (MLFF).

I followed the instructions given in the previous discussion (post from Andreas Singraber).
I wrote a Python script that gathers the required information from the OUTCAR files (atomic species, number of atoms, positions, energy, forces, stress, etc.) and creates a valid ML_AB file (with and without the CTIFOR section). I compared my 'homemade' ML_AB file with one from an actual MLFF simulation, and they look identical (apart from the atomic basis sets).

After creating the ML_AB file from my OUTCAR data, I performed an MLFF calculation "from scratch" by setting ML_MODE=select in my INCAR file and providing the generated ML_AB file.
However, the calculation seems to consider only the very first ionic step rather than the whole trajectory. In addition, the total energy in the new OUTCAR file is zero.
I cannot verify the validity of the generated ML_FFN file, but its size differs from that of the ML_FFN file generated by the actual MLFF simulation.

I am not sure what other parameter must be set in order to take into consideration the data of the whole AIMD simulation.
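
One quick sanity check on a generated ML_AB file is to count the "Configuration num." headers and compare the result with the number of ionic steps that were mined. The sketch below does only that; the function name `configuration_numbers` is made up for illustration, and it assumes the header line looks like the one quoted earlier in this thread.

```python
import re

# Matches header lines such as "     Configuration num.      12".
CONFIG_HEADER = re.compile(r"^\s*Configuration num\.\s+(\d+)")


def configuration_numbers(lines):
    """Return the configuration numbers found in an iterable of ML_AB lines."""
    return [int(m.group(1)) for line in lines
            if (m := CONFIG_HEADER.match(line))]
```

For a file on disk, `len(configuration_numbers(open("ML_AB")))` gives the number of configurations actually written, and the returned list shows whether the numbering is consecutive.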

Here are my INCAR parameters for the MLFF training.
#Basic parameters
ISMEAR = 0
SIGMA = 0.1
LREAL = Auto
ISYM = -1
NELM = 100
EDIFF = 1E-4
LWAVE = .FALSE.
LCHARG = .FALSE.

#Parallelization of ab initio calculations
NCORE = 8

#MD
IBRION = 0
MDALGO = 2
ISIF = 2
SMASS = 1.0
TEBEG = 300
NSW = 100
POTIM = 3.0
RANDOM_SEED = 88951986 0 0

#Machine learning parameters
ML_LMLFF = .TRUE.
ML_ISTART = 3
ML_MODE = select

If you need more information on my simulation (generated ML_AB file, etc.), please feel free to ask!
Your guidance and help will be highly appreciated!

Best regards,
JX Lian