Automated DFT¶
This lesson focuses on key concepts necessary to run automated Density Functional Theory (DFT) workflows using our atomate code. We begin by using pymatgen to build inputs and parse outputs of DFT calculations. Then we will go through a demonstration of using atomate to run DFT calculations.
Core Concepts¶
- Writing Input Files - There is an entire module in pymatgen devoted to reading and writing the input and output files of other codes to assist in calculation automation.
- Parsing Output Files - Similar to handling input files, pymatgen also supports parsing output files. This machinery is also used by atomate to parse entire output directories.
- Database Automation - Atomate combines everything needed to automate running DFT calculations and storing them in a database.
For this lesson to work, we'll need to set up pymatgen with pseudopotential files, which describe the charge density around each element and are typically only available to registered VASP users. We've made some fake files with the correct structure so that pymatgen will still function.
import mp_workshop
Lesson 1: Writing Input Files¶
For the first lesson we'll focus on using pymatgen to simplify interfacing with DFT codes. Pymatgen has an IO (input/output) module which has methods to parse and write files that are compatible with a number of external codes. These external codes are written in languages other than python (e.g. Fortran, C, C++, etc.), but pymatgen's IO module allows us to interface with them all through python. These external codes include:
- AbInit
- EXCITING
- FEFF
- LAMMPS
- Lobster
- QChem
- VASP
- ATAT
- Gaussian
- NWChem
- ShengBTE
- Wannier90
- Zeo++
For today's lesson, we will focus on one of these codes, VASP (Vienna Ab initio Simulation Package), which is a DFT code for atomic scale materials modeling. VASP is the DFT code used by the Materials Project for inorganic crystals. We will also use a second DFT code, Q-Chem, in the exercises which is used by the Materials Project for molecules.
POSCAR Demonstration¶
Let's begin by obtaining a silicon structure to use in today's lesson. We will import this structure from a CIF (Crystallographic Information File) file, a format that most DFT codes cannot read directly.
from pymatgen.core import Structure
struct = Structure.from_file(filename="./example_files/Si.CIF")
print(struct)
VASP has 4 types of input files that must be provided by the user to run a calculation:
- INCAR: specifies modeling parameters (how to run the calculation)
- POSCAR: provides the atomic structure
- KPOINTS: specifies the k-points for sampling the Brillouin zone of the simulation cell
- POTCAR: describes how to simulate the electrons via pseudopotentials
Pymatgen has python objects for representing each type of input file. For this lesson, we will first focus on the POSCAR to demonstrate the capabilities of these python objects.
from pymatgen.io.vasp.inputs import Poscar
Let's obtain a Poscar python object using the silicon structure object from before.
poscar = Poscar(structure=struct, comment="silicon")
print(poscar)
Since we can construct POSCARs from any structure object, we can make modifications to structure objects (make supercells, remove atoms to form defects, etc.) and construct complex structures (e.g. heterostructures, adsorbates on surfaces, etc.) all using pymatgen before turning the structure into a POSCAR file.
In this Poscar object there is a full pymatgen structure that we can manipulate pythonically. To demonstrate, let's introduce a defect by removing the first site. To check our work, we look at how the Poscar has changed and print out the number of sites in our original struct (8) and the Poscar structure (7).
poscar.structure.remove_sites([0])
print(poscar)
print(struct.num_sites, poscar.structure.num_sites)
When the structure is ready, the Poscar object has a method for writing it out as the POSCAR file needed to run VASP.
poscar.write_file(filename="POSCAR")
Another useful method constructs a Poscar object from a POSCAR file that has already been written. Note that you can read a compressed file without unzipping it first.
poscar_from_file = Poscar.from_file(filename="./example_files/POSCAR.gz")
print(poscar_from_file)
All the python objects for VASP inputs (INCAR, POSCAR, KPOINTS, POTCAR) have similar methods for initiating the object and writing out files. See the pymatgen documentation for more details: https://pymatgen.org/_modules/pymatgen/io/vasp/inputs.html
Input Sets¶
It can still take quite some effort to explicitly generate each input file needed for a calculation. To further the automation, pymatgen builds upon these pieces to provide InputSets. An InputSet is an object that provides default parameters for performing a specific kind of calculation. Pymatgen ships with several InputSets, including ones with the default parameters used for MP calculations. It is also possible to define your own InputSet that lets you build new calculations using the parameters you want.
Let's continue with our silicon structure from before. To generate the remaining input files needed for a VASP calculation, let's use an InputSet object. To perform a structure relaxation, we can import the parameters used in the Materials Project from MPRelaxSet.
from pymatgen.io.vasp.sets import MPRelaxSet
Note that all we need to instantiate the MPRelaxSet InputSet is a structure object. All the information needed to generate the four VASP input files can be determined from the provided structure.
relax_set = MPRelaxSet(structure=struct)
We can access the pythonic object for each of the VASP Input files (INCAR, POSCAR, KPOINTS, POTCAR) from our InputSet object.
relax_set.incar
relax_set.poscar
relax_set.kpoints
relax_set.potcar.as_dict()
The InputSet has a .write_input() method which will write out all the VASP input files we need to run a structure optimization for our silicon structure. Let's write our input files to a directory as if we were going to launch VASP from that directory.
Note we will also set the potcar_spec=True flag to avoid writing an actual POTCAR file (which is composed from POTCAR files that can only be distributed under a VASP license). By default, POTCARs will be written based on the pseudopotentials specified by the environment variable PMG_VASP_PSP_DIR.
relax_set.write_input(output_dir="./Si_MPRelaxSet", potcar_spec=True)
Lesson 2: Parsing Output Files¶
After VASP has been run, you'll need to parse the VASP outputs to get the data you want. VASP makes a number of output files:
- WAVECAR
- CHGCAR
- OUTCAR
- vasprun.xml
- PROCAR
- And more ...
Please refer to VASP's documentation for descriptions of what these all are.
Vasprun Demonstration¶
For this portion of the lesson, we'll focus on vasprun.xml.gz in the example_VASP_Al16Cr10 directory. Similar to how pymatgen has objects corresponding to VASP input files, pymatgen has a Vasprun python object for parsing the outputs contained in vasprun.xml files.
from pymatgen.io.vasp.outputs import Vasprun
vrun = Vasprun(filename="./example_VASP_Al16Cr10/vasprun.xml.gz")
Since Vasprun objects have most of the information that VASP can provide, this one object is often enough to parse a full VASP calculation. Use vrun. followed by Tab completion to explore the properties of the Vasprun object.
For example, we can check whether the calculation converged, as well as the final structure and energy after the structure optimization finished.
vrun.converged
vrun.final_structure
vrun.final_energy
Another useful set of data is stored under .ionic_steps, which contains information on how the calculation progressed during each ionic step of the structure optimization.
We can check how many ionic steps were taken and what properties are stored for each ionic step.
len(vrun.ionic_steps)
vrun.ionic_steps[0].keys()
By looping through we can see the energy at the end of each ionic step and how many electronic steps were required during each ionic step.
for step in vrun.ionic_steps:
    print(step["e_fr_energy"], len(step["electronic_steps"]))
Parsing Directories with Atomate Drones¶
To take parsing outputs from external codes one step further, we will introduce the concept of drones from the python package atomate. Pymatgen provides methods for parsing individual output files; atomate drones combine these capabilities to parse entire calculation output directories and have an .assimilate() method that produces a dictionary summarizing the results. This dictionary representation is helpful because it can be stored in a database, as is done for the Materials Project.
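To make the idea concrete, here is a heavily simplified, hypothetical sketch of what such a task document might look like and why its plain-dict form suits database storage. The keys and values below are invented for illustration; the real atomate schema has many more fields with different names.

```python
import json

# A *hypothetical* sketch of a drone-style task document;
# all keys and values here are illustrative, not the real atomate schema.
task_doc = {
    "formula_pretty": "Al16Cr10",   # hypothetical key
    "state": "successful",          # hypothetical key
    "output": {                     # hypothetical nesting
        "energy": -234.56,          # placeholder value, not a real result
    },
}

# Because it is plain nested dicts and lists, it serializes cleanly,
# which is what makes MongoDB storage straightforward.
doc_json = json.dumps(task_doc)
print(doc_json)
```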
from atomate.vasp.drones import VaspDrone
drone = VaspDrone()
task_doc = drone.assimilate(path="./example_VASP_Al16Cr10")
print(task_doc.keys())
Lesson 3: Database Automation¶
The final section of this lesson focuses on automating DFT using our atomate code. atomate is a set of recipes for computing properties for both molecules and structures. The workflows in atomate run on fireworks, our workflow management software. fireworks stores workflow information and calculation summaries in MongoDB. Using this infrastructure, MP routinely manages 10,000 simultaneous calculations on supercomputers such as Cori at NERSC.
Getting Workflows¶
Let's begin by importing a basic silicon structure
from pymatgen.core import Structure
si = Structure.from_file("./example_files/Si.CIF")
print(si)
In the past two lessons we've gone over some of the machinery used in this automation. Atomate workflows (with the support of fireworks) compile everything needed to:
- Store workflow inputs from the user
- Create new calculation directories and write input files
- Parse output directories and store results in a MongoDB database
We can import workflows from atomate, which are essentially the recipes that outline the steps for automating calculations from start to finish. There are many pre-built workflows in atomate for common types of calculations. The best way to explore all available workflows is by checking the atomate source code linked below.
- VASP Workflows: https://github.com/hackingmaterials/atomate/tree/main/atomate/vasp/workflows/base
- Q-Chem Workflows: https://github.com/hackingmaterials/atomate/tree/main/atomate/qchem/workflows/base
For this lesson, we will focus on two types of workflows for VASP:
- Structure Optimization Workflow (simpler)
- Bandstructure Workflow (more complex)
Let's get started by importing these workflows and then creating workflow objects from our silicon structure.
from atomate.vasp.workflows.presets.core import wf_structure_optimization, wf_bandstructure
so_wf = wf_structure_optimization(structure=si)
print(so_wf)
bs_wf = wf_bandstructure(structure=si)
print(bs_wf)
The structure optimization workflow is simpler and we can see that it only contains one firework. The bandstructure workflow has four fireworks so we can assume it is more complex. Let's see a graphical representation of each of these workflows.
from mp_workshop.atomate import wf_to_graph
wf_to_graph(so_wf)
wf_to_graph(bs_wf)
The exact steps in each firework can vary but for these cases, each firework roughly represents one VASP calculation which consists of the following steps:
1. Write VASP input files
2. Run VASP
3. Parse VASP output files
In the more complex bandstructure workflow, we can see the fireworks are connected. That means the outputs from one firework are passed to the next firework so the VASP calculation can start from the relaxed structure of the previous calculation.
Before we move forward with running our workflows, we will make one modification called use_fake_vasp_workshop. This will change the fireworks in the workflows so they will not actually run VASP (which requires greater computer resources and a VASP license), but instead will simulate running VASP by copying over pre-existing VASP output files.
from mp_workshop.atomate import use_fake_vasp_workshop
so_wf = use_fake_vasp_workshop(so_wf)
bs_wf = use_fake_vasp_workshop(bs_wf)
Launching Workflows from Your LaunchPad¶
Now we will introduce the LaunchPad from fireworks, which is a python object that lets us interface with MongoDB and manage our queue of calculations. The LaunchPad allows you to submit and query workflows from anywhere you have database access. We need to get ourselves a LaunchPad object and set it up so we can submit our workflows.
from fireworks import LaunchPad
lp = LaunchPad.auto_load()
lp.reset(password=None, require_password=False)
Note lp.reset() should only be executed once, when you are first initializing your database set-up. If you reset your LaunchPad at a later time, you will erase the record of past calculations you have run.
Adding our workflows to our Launchpad will encode and store our workflow objects into MongoDB so they can be accessed later. Let's start by adding one workflow, the simpler structure optimization workflow.
lp.add_wf(so_wf)
Now that a workflow is in our LaunchPad, we can check on its status using .get_wf_summary_dict().
lp.get_wf_summary_dict(fw_id=1)
Now that we have a workflow in our LaunchPad, we are ready to run it. Normally we run fireworks through the command-line interface on supercomputers because external codes, such as VASP, require more compute resources. However, for today's demonstration, we will run our workflows locally in this notebook to illustrate how it works.
Fireworks has a command called rlaunch for "launching" (running) a firework in a workflow from our LaunchPad. Let's look at its options and then use "rapidfire" mode.
!rlaunch --help
!rlaunch rapidfire
Let's check on our workflow in our LaunchPad to see how it has changed after launching one firework.
lp.get_wf_summary_dict(fw_id=1)
Great! We can see the state of the one firework in our workflow has changed to COMPLETED and now the workflow is done.
Now let's try running the more complicated bandstructure workflow which has four fireworks. First let's add the workflow to our LaunchPad and look at its summary dictionary.
lp.add_wf(bs_wf)
lp.get_wf_summary_dict(fw_id=2)
Now let's move forward with running the workflow. Note that if the --nlaunches flag is not used, rlaunch rapidfire will continue to launch fireworks from the LaunchPad until no "READY" fireworks remain. This is convenient for our case because we have 4 fireworks in our workflow that need to run.
!rlaunch rapidfire
Once rlaunch rapidfire has finished running, we can check whether this workflow completed successfully.
lp.get_wf_summary_dict(fw_id=2)
Fireworks also has a command called qlaunch that automatically submits jobs to the supercomputer queue. When a job starts on the supercomputer, rlaunch is called, running fireworks with the supercomputer's resources. qlaunch is the primary command users run from the command-line interface on supercomputers.