Skip to content

Materials Project Workshop – July 28th to July 30th, 2019, Berkeley, California

pymatgen logo

What is pymatgen?

Pymatgen (Python Materials Genomics) is the code that powers all of the scientific analysis behind the Materials Project. It includes a robust and efficient libraries for the handling of crystallographic structures and molecules, in addition to various mathematical and scientific tools for the handling and generation of materials data. For more details, see the pymatgen website.

How do I install pymatgen?

For the workshop, pymatgen has been pre-installed for use in your Jupyter notebooks.

Otherwise, pymatgen can be installed via pip:

pip install pymatgen

or conda:

conda install --channel matsci pymatgen

We recommend using Python 3.6 or above. Until 2018, pymatgen was developed simultaneously for Python 2.x and 3.x, but following the rest of the Python community we have phased out support for Python 2.x, and since version 2019.1.1 we are developing exclusively for Python 3.x.

Where can I find help and how do I get involved?

  • For general help: pymatgen discourse is a place to ask questions.

  • To report bugs: The Github Issues page is a good place to report bugs.

  • For Materials Project data and website discussions: The Materials Project has its community Materials Project Discussion forum.

  • For more example notebooks: matgenb is a new resource of Jupyter notebooks demonstrating various pymatgen functionality.

If you want specific new features, you're welcome to ask! We try to respond to community needs. If you're a developer and can add the feature yourself, we actively encourage you to do so by creating a Pull Request on Github with your additional functionality. To date, pymatgen has seen over 19,000 commits and nearly 150 contributors, and we try to have an inclusive and welcoming development community. All contributors are also individually acknowledged on materialsproject.org/about.

Verify we have pymatgen installed

First, let's verify we have pymatgen installed. The following command should produce no error or warning:

import pymatgen

We can show the specific version of pymatgen installed:

print(pymatgen.__version__)
2020.7.16

For a list of new features, bug fixes and other changes, consult the changelog on pymatgen.org.

You can also see where pymatgen is installed on your computer:

print(pymatgen.__file__)
/Users/shyamd/Dropbox/Codes/pymatgen/pymatgen/__init__.py

We can also see which version of the Python programming language we are using:

import sys
print(sys.version)
3.8.3 (default, Jul  2 2020, 11:26:31) 
[Clang 10.0.0 ]

If you have problems or need to report bugs when using pymatgen after the workshop, the above information is often very useful to help us identify the problem.

Structures and Molecules

Most of the fundamentals of pymatgen are expressed in terms of Molecule and Structure objects.

While we will mostly be using Structure, Stucture and Molecule are very similar conceptually. The main difference is that Structure supports full periodicity required to describe crystallographic structures.

Creating a Structure can be done in one line, even for complicated crystallographic structures. However, we'll start by introducing the somewhat simpler Molecule object, and then use this understanding of Molecule to introduce Structure.

Creating a Molecule

Start by importing Molecule:

from pymatgen import Molecule

In a Jupyter notebook, you can show help for any Python object by clicking on the object and pressing Shift+Tab. This will give you a list of arguments and keyword arguments necessary to construct the object, as well as the documentation ('docstring') which gives more information on what each argument means.

Molecule
pymatgen.core.structure.Molecule

Molecule takes input arguments species and coords, and input keyword arguments charge, spin_multiplicity, validate_proximity and site_properties.

Keyword arguments come with a default value (the value after the equals sign), and so keyword arguments are optional.

Arguments (without default values) are mandatory.

c_monox = Molecule(["C","O"], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]])
print(c_monox)
Full Formula (C1 O1)
Reduced Formula: CO
Charge = 0.0, Spin Mult = 1
Sites (2)
0 C     0.000000     0.000000     0.000000
1 O     0.000000     0.000000     1.200000

Alright, now let's use a keyword variable to change a default. How about we make an anion?

oh_minus = Molecule(["O", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], charge=-1)
print(oh_minus)
Full Formula (H1 O1)
Reduced Formula: H2O2
Charge = -1, Spin Mult = 1
Sites (2)
0 O     0.000000     0.000000     0.000000
1 H     0.000000     0.000000     1.000000

You can also create Molecule objects from files. Let's say you have an *.xyz file called "water.xyz". You can import that into pymatgen with Molecule.from_file, like:

water = Molecule.from_file("water.xyz")
print(water)
Full Formula (H2 O1)
Reduced Formula: H2O
Charge = 0, Spin Mult = 1
Sites (3)
0 O    -0.070000    -0.026960    -0.095240
1 H     0.919330    -0.015310    -0.054070
2 H    -0.359290     0.231000     0.816010

Exercise: Making Molecules

Try it yourself! Create molecules however you like!

In this folder are several example molecules (methane.xyz, furan.xyz, and benzene.xyz). Try loading these files with Molecule.from_file. You can also try making a Molecule from a list of species and coordinates. Try changing the default parameters - see what you can and cannot do (for instance, look at varying the charge and the spin multiplicity).

What's in a Molecule? Introducing Sites, Elements and Species

You can access properties of the molecule, such as the Cartesian coordinates of its sites:

print(c_monox.cart_coords)
[[0.  0.  0. ]
 [0.  0.  1.2]]

or properties that are computed on-the-fly, such as its center of mass:

print(c_monox.center_of_mass)
[0.         0.         0.68544132]

To see the full list of available properties and methods, press Tab after typing my_molecule. in your Jupyter notebook. There are methods used to modify the molecule and these take additional argument(s). For example, to add a charge to the molecule:

c_monox.set_charge_and_spin(charge=1)
print(c_monox)
Full Formula (C1 O1)
Reduced Formula: CO
Charge = 1, Spin Mult = 2
Sites (2)
0 C     0.000000     0.000000     0.000000
1 O     0.000000     0.000000     1.200000

A molecule is essentially a list of Site objects. We can access these sites like we would a list in Python. For example, to obtain the total number of sites in the molecule:

len(c_monox)
2

Or to access the first site (note that Python is a 0-indexed programming language, so the first site is site 0):

print(c_monox[0])
[0. 0. 0.] C

And just like a list, I can even change the elements of a molecule.

c_monox[0] = "O"
c_monox[1] = "C"
print(c_monox)
Full Formula (C1 O1)
Reduced Formula: CO
Charge = 1, Spin Mult = 2
Sites (2)
0 O     0.000000     0.000000     0.000000
1 C     0.000000     0.000000     1.200000

A site object contains information on the site's identity and position in space.

site0 = c_monox[0]
site0.coords
array([0., 0., 0.])
site0.specie
Element O

Here, because we switched the elements, the site holds the element O. In general, a site can hold an Element, a Specie or a Composition. Let's look at each of these in turn.

from pymatgen import Element, Specie, Composition

An Element is simply an element from the Periodic Table.

carbon = Element('C')

Elements have properties such as atomic mass, average ionic radius and more:

carbon.average_ionic_radius
0.3

A Specie can contain additional information, such as oxidation state:

o_ion = Specie('O', oxidation_state=-2)
o_ion
Specie O2-

Again, we can access both these Specie-specific properties and more general properties of Elements.

o_ion.oxi_state
-2
o_ion.atomic_mass
15.9994

Or, for convenience, we can use strings, which will be interpreted as elements with oxidation states:

Specie.from_string('O2-')
Specie O2-

Finally, a Composition is an object that can hold certain amounts of different elements or species. This is most useful in a disordered Structure, and would rarely be used in a Molecule. For example, a site that holds 50% Au and 50% Cu would be set as follows:

comp = Composition({'Au': 0.5, 'Cu': 0.5})

A Composition contains more information than either an Element or a Specie. Because it can contain multiple Elements/Species, you can obtain the formula, or the chemical system.

print("formula", comp.alphabetical_formula)
print("chemical system", comp.chemical_system)
formula Au0.5 Cu0.5
chemical system Au-Cu

When we construct a Molecule, the input argument will automatically be converted into one of Element, Specie or Composition. Thus, in the previous example, when we first defined carbon monoxide, the input ['C', 'O'] was converted to [Element C, Element O].

Exercise: Hydrogen Cyanide

Construct the linear HCN molecule where each bond distance is the sum of the two atomic radii.

HINT: To do this, you'll probably want to use some Element properties!

H_rad = Element('H').atomic_radius
C_rad = Element('C').atomic_radius
N_rad = Element('N').atomic_radius
HC_bond_dist = H_rad + C_rad
CN_bond_dist = C_rad + N_rad
H_pos = 0
C_pos = H_pos + HC_bond_dist
N_pos = C_pos + CN_bond_dist
hcn = Molecule(['H','C','N'], [[H_pos, 0, 0], [C_pos, 0, 0],[N_pos, 0, 0]])
print(hcn)
Full Formula (H1 C1 N1)
Reduced Formula: HCN
Charge = 0.0, Spin Mult = 1
Sites (3)
0 H     0.000000     0.000000     0.000000
1 C     0.950000     0.000000     0.000000
2 N     2.300000     0.000000     0.000000

Creating a Structure and Lattice

Creating a Structure is very similar to creating a Molecule, except we now also have to specify a Lattice.

from pymatgen import Lattice, Structure

A Lattice can be created in one of several ways, such as by inputting a 3x3 matrix describing the individual lattice vectors. For example, a cubic lattice of length 5 Ångstrom:

my_lattice = Lattice([[5, 0, 0], [0, 5, 0], [0, 0, 5]])
print(my_lattice)
5.000000 0.000000 0.000000
0.000000 5.000000 0.000000
0.000000 0.000000 5.000000

Equivalently, we can create it from its lattice parameters:

my_lattice_2 = Lattice.from_parameters(5, 5, 5, 90, 90, 90)

Or, since we know in this case that we have a cubic lattice, a == b == c and alpha == beta == gamma == 90 degrees, so we can simply put:

my_lattice_3 = Lattice.cubic(5)

We can confirm that these lattices are the same:

my_lattice == my_lattice_2 == my_lattice_3
True

Now, we can create a simple crystal structure. Let's start with body-centered-cubic iron:

bcc_fe = Structure(Lattice.cubic(2.8), ["Fe", "Fe"], [[0, 0, 0], [0.5, 0.5, 0.5]])
print(bcc_fe)
Full Formula (Fe2)
Reduced Formula: Fe
abc   :   2.800000   2.800000   2.800000
angles:  90.000000  90.000000  90.000000
Sites (2)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Fe    0    0    0
  1  Fe    0.5  0.5  0.5

print(bcc_fe)
Full Formula (Fe2)
Reduced Formula: Fe
abc   :   2.800000   2.800000   2.800000
angles:  90.000000  90.000000  90.000000
Sites (2)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Fe    0    0    0
  1  Fe    0.5  0.5  0.5

Creating this Structure was similar to creating a Molecule in that we provided a list of elements and a list of positions. However, there are two key differences: we had to include our Lattice object when creating the Structure and since we have a lattice, we can define the positions of our sites in fractional coordinates with respect to that lattice instead of Cartesian coordinates.

It's also possible to create an equivalent Structure using Cartesian coordinates:

bcc_fe_from_cart = Structure(Lattice.cubic(2.8), ["Fe", "Fe"], [[0, 0, 0], [1.4, 1.4, 1.4]],
                             coords_are_cartesian=True)
print(bcc_fe_from_cart)
Full Formula (Fe2)
Reduced Formula: Fe
abc   :   2.800000   2.800000   2.800000
angles:  90.000000  90.000000  90.000000
Sites (2)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Fe    0    0    0
  1  Fe    0.5  0.5  0.5

We can check that both structures are equivalent:

bcc_fe == bcc_fe_from_cart
True

As in a Molecule, we can access properties of the structure, such as its volume:

bcc_fe.volume
21.951999999999995

Creating Structures from Spacegroups

Structures can also be created directly from their spacegroup:

bcc_fe = Structure.from_spacegroup("Im-3m", Lattice.cubic(2.8), ["Fe"], [[0, 0, 0]])
print(bcc_fe)
Full Formula (Fe2)
Reduced Formula: Fe
abc   :   2.800000   2.800000   2.800000
angles:  90.000000  90.000000  90.000000
Sites (2)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Fe    0    0    0
  1  Fe    0.5  0.5  0.5

nacl = Structure.from_spacegroup("Fm-3m", Lattice.cubic(5.692), ["Na+", "Cl-"],
                                 [[0, 0, 0], [0.5, 0.5, 0.5]])
print(nacl)
Full Formula (Na4 Cl4)
Reduced Formula: NaCl
abc   :   5.692000   5.692000   5.692000
angles:  90.000000  90.000000  90.000000
Sites (8)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Na+   0    0    0
  1  Na+   0    0.5  0.5
  2  Na+   0.5  0    0.5
  3  Na+   0.5  0.5  0
  4  Cl-   0.5  0.5  0.5
  5  Cl-   0.5  0    0
  6  Cl-   0    0.5  0
  7  Cl-   0    0    0.5

And spacegroups can be obtained from a structure:

nacl.get_space_group_info()
('Fm-3m', 225)

Where 225 is the spacegroup number.

Supercells

Alright, we are now well-versed in the art of creating singular structures. But in some cases, you really don't just want one unit cell; you want a supercell. Pymatgen provides a very simple interface to create superstructures. Let's start with the simplest structure that we can imagine: Polonium, a simple cubic metal.

polonium = Structure(Lattice.cubic(3.4), ["Po"], [[0.0, 0.0, 0.0]])

print(polonium)
Full Formula (Po1)
Reduced Formula: Po
abc   :   3.400000   3.400000   3.400000
angles:  90.000000  90.000000  90.000000
Sites (1)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Po      0    0    0

To make a supercell, we can just multiply the structure by a tuple!

supercell = polonium * (2, 2, 2)
print(supercell)
Full Formula (Po8)
Reduced Formula: Po
abc   :   6.800000   6.800000   6.800000
angles:  90.000000  90.000000  90.000000
Sites (8)
  #  SP      a    b    c
---  ----  ---  ---  ---
  0  Po    0    0    0
  1  Po    0    0    0.5
  2  Po    0    0.5  0
  3  Po    0    0.5  0.5
  4  Po    0.5  0    0
  5  Po    0.5  0    0.5
  6  Po    0.5  0.5  0
  7  Po    0.5  0.5  0.5

Exercise: Barium Titanate

Load BaTiO3 from the CIF file provided (BaTiO3.cif). Replace the barium with a new element with atomic number equal to the the day of the month you were born (e.g. 1st = hydrogen, 31st = gallium). Then, make a supercell of N unit cells in one cartesian direction of your choice where N is the integer of your birth month (e.g. January = 1, December = 12).

BaTiO3=Structure.from_file("BaTiO3.cif")
print(BaTiO3.get_space_group_info())
BaTiO3.replace(0,'Mg')
BaTiO3=BaTiO3*(1,1,4)
print(BaTiO3)
print(BaTiO3.get_space_group_info())
('R3m', 160)
Full Formula (Mg4 Ti4 O12)
Reduced Formula: MgTiO3
abc   :   4.077159   4.077159  16.308637
angles:  89.699022  89.699022  89.699022
Sites (20)
  #  SP           a         b         c
---  ----  --------  --------  --------
  0  Mg    0.497155  0.497155  0.124289
  1  Mg    0.497155  0.497155  0.374289
  2  Mg    0.497155  0.497155  0.624289
  3  Mg    0.497155  0.497155  0.874289
  4  Ti    0.982209  0.982209  0.245552
  5  Ti    0.982209  0.982209  0.495552
  6  Ti    0.982209  0.982209  0.745552
  7  Ti    0.982209  0.982209  0.995552
  8  O     0.524065  0.012736  0.003184
  9  O     0.524065  0.012736  0.253184
 10  O     0.524065  0.012736  0.503184
 11  O     0.524065  0.012736  0.753184
 12  O     0.012736  0.012736  0.131016
 13  O     0.012736  0.012736  0.381016
 14  O     0.012736  0.012736  0.631016
 15  O     0.012736  0.012736  0.881016
 16  O     0.012736  0.524065  0.003184
 17  O     0.012736  0.524065  0.253184
 18  O     0.012736  0.524065  0.503184
 19  O     0.012736  0.524065  0.753184
('R3m', 160)

Summary

This notebook is intended to provide a short introduction to some of the functionality of pymatgen. We've examined the building blocks of pymatgen: the Structure and Molecule objects, as well as the Lattice, Element, Specie and Composition objects that they're made up of.